Problem 1: For this problem, we will work with the BA_degrees dataset. It contains the proportions of Bachelor’s degrees awarded in the US between 1970 and 2015.
library(tidyverse)

BA_degrees <- read_csv("https://wilkelab.org/SDS375/datasets/BA_degrees.csv")
BA_degrees
## # A tibble: 594 × 4
##    field                                               year  count     perc
##    <chr>                                              <dbl>  <dbl>    <dbl>
##  1 Agriculture and natural resources                   1971  12672 0.0151
##  2 Architecture and related services                   1971   5570 0.00663
##  3 Area, ethnic, cultural, gender, and group studies   1971   2579 0.00307
##  4 Biological and biomedical sciences                  1971  35705 0.0425
##  5 Business                                            1971 115396 0.137
##  6 Communication, journalism, and related programs     1971  10324 0.0123
##  7 Communications technologies                         1971    478 0.000569
##  8 Computer and information sciences                   1971   2388 0.00284
##  9 Education                                           1971 176307 0.210
## 10 Engineering                                         1971  45034 0.0536
## # … with 584 more rows
From the entire dataset, select a subset of 6 fields of study, using arbitrary criteria. Plot a time series of the proportion of degrees (column perc) in each field over time, using facets to show each field. Also plot a straight-line fit to the data for each field. You should modify the order of facets to maximize figure appearance and memorability. What do you observe?
Hint: To get started, see slides 34 to 44 in the class on getting things into the right order:
https://wilkelab.org/DSC385/slides/getting-things-in-order.html#34
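BA_degrees %>%
  filter(field %in% c("Business", "Engineering", "Education", "Psychology", "Visual and performing arts")) %>%
  # order facets by min(perc) - max(perc), so the field with the widest range comes first
  mutate(field = fct_reorder(field, perc, function(x) { min(x) - max(x) })) %>%
  ggplot(aes(year, perc)) +
  geom_line(color = "blue") +
  # dotted red line: straight-line fit per field
  # (linewidth needs ggplot2 >= 3.4; older versions use size instead)
  geom_smooth(method = "lm", se = FALSE, linetype = "dotted", linewidth = 0.5, color = "red") +
  facet_wrap(~field)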
Fields with rapid growth: Business
Fields in rapid decline: Education
Fields that have stayed roughly constant: Engineering, Psychology, Visual and performing arts.
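The observations above are read off the plot by eye. One way to back them with numbers is to fit the same straight line per field and compare the slopes. The following is a minimal sketch, not part of the original solution: the helper name field_slopes and the toy data are hypothetical, and the toy numbers are made up purely to show the mechanics.

```r
library(dplyr)

# Hypothetical toy data with the same columns as BA_degrees (field, year, perc).
# The values are invented: field "A" rises by 0.02 per year, "B" falls by 0.03.
toy_degrees <- tibble(
  field = rep(c("A", "B"), each = 3),
  year  = rep(c(1971, 1972, 1973), times = 2),
  perc  = c(0.10, 0.12, 0.14, 0.20, 0.17, 0.14)
)

# Fit perc ~ year separately for each field and keep only the slope,
# sorted from fastest growth to fastest decline.
field_slopes <- function(df) {
  df %>%
    group_by(field) %>%
    summarize(slope = coef(lm(perc ~ year))[["year"]], .groups = "drop") %>%
    arrange(desc(slope))
}

slopes <- field_slopes(toy_degrees)
slopes

# On the real data this would be called as: field_slopes(BA_degrees)
```

A clearly positive slope corresponds to growth, a clearly negative one to decline, and a slope near zero to a field whose share has stayed roughly constant.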
Problem 2: We will work with the txhousing dataset provided by ggplot2. See here for details:
If you wanted to visualize the relative proportion of sales in these different cities, which plot would be most appropriate? A pie chart, a stacked bar chart, or side-by-side bars? Please explain your reasoning. You do not have to make the chart.
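Answer: Side-by-side bars, because the dataset contains a large number of cities. A pie chart or a stacked bar chart with that many slices or segments would be hard to read, whereas side-by-side bars share a common baseline and can be compared directly.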
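The problem does not require a chart, but the reasoning can be checked with a short sketch. It assumes that "these different cities" refers to the January 2015 slice of txhousing (the name used in Problem 3); the filtering step and the variable names jan_2015 and n_cities are assumptions, since the assignment's own preparation code is not shown here.

```r
library(dplyr)
library(ggplot2)  # also provides the txhousing dataset

# Assumed preparation: keep only January 2015 and drop cities without a sales value.
jan_2015 <- txhousing %>%
  filter(year == 2015, month == 1)

# Number of cities that would each need their own slice or bar.
n_cities <- n_distinct(jan_2015$city)
n_cities

# Side-by-side bars, drawn horizontally and sorted so the city names stay readable.
p_bars <- jan_2015 %>%
  filter(!is.na(sales)) %>%
  ggplot(aes(x = sales, y = reorder(city, sales))) +
  geom_col() +
  labs(x = "sales", y = NULL)
p_bars
```

With dozens of categories, sorted bars keep every city legible, which a pie chart would not.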
Problem 3: Now make a pie chart of the txhousing_jan_2015 dataset, but show only the four cities with the most sales, plus all others lumped together into “Other”. (The code to prepare this lumped dataset has been provided for your convenience.) Make sure the pie slices are arranged in a reasonable order. Choose a reasonable color scale and a clean theme that avoids distracting visual elements.
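One possible approach to this problem is sketched below. The preparation code provided with the assignment is not reproduced here, so both the construction of txhousing_jan_2015 and the lumping step are assumptions; txhousing_lumped is a hypothetical name. The sketch lumps with fct_lump_n() from forcats, which may differ from the provided code.

```r
library(dplyr)
library(forcats)
library(ggplot2)  # also provides the txhousing dataset

# Assumed construction of the January 2015 subset.
txhousing_jan_2015 <- txhousing %>%
  filter(year == 2015, month == 1) %>%
  select(city, sales)

# Keep the four cities with the most sales and lump the rest into "Other".
txhousing_lumped <- txhousing_jan_2015 %>%
  filter(!is.na(sales)) %>%
  mutate(city = fct_lump_n(city, n = 4, w = sales, other_level = "Other")) %>%
  group_by(city) %>%
  summarize(sales = sum(sales), .groups = "drop") %>%
  # Largest city first, "Other" always last.
  mutate(city = fct_relevel(fct_reorder(city, -sales), "Other", after = Inf))

# A pie chart is a stacked bar in polar coordinates.
# reverse = TRUE stacks the first factor level first, so the slices follow
# the factor order starting from the top of the pie.
p_pie <- ggplot(txhousing_lumped, aes(x = "", y = sales, fill = city)) +
  geom_col(color = "white", position = position_stack(reverse = TRUE)) +
  coord_polar(theta = "y") +
  scale_fill_brewer(palette = "Set2", name = NULL) +
  theme_void()
p_pie
```

Ordering the slices by size with "Other" last, using a qualitative palette, and choosing theme_void() address the three requirements: reasonable slice order, reasonable color scale, and a theme without distracting elements.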