These exercises are taken from the grammar of graphics chapter of Modern Data Science with R (http://mdsr-book.github.io). Other materials for instructors on this chapter (sample activities, an overview video) can be found there.
Using the famous Galton data set from the mosaicData package:
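1. Create a scatterplot of each person's height against their father's height.
2. Separate your plot into facets by sex.
3. Add regression lines to all of your facets.
Hint: recall that you can find out more about the data set by running the command ?Galton.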
SOLUTION:
library(mdsr)
glimpse(Galton)
## Observations: 898
## Variables: 6
## $ family <fct> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5...
## $ father <dbl> 78.5, 78.5, 78.5, 78.5, 75.5, 75.5, 75.5, 75.5, 75.0, 7...
## $ mother <dbl> 67.0, 67.0, 67.0, 67.0, 66.5, 66.5, 66.5, 66.5, 64.0, 6...
## $ sex <fct> M, F, F, F, M, M, F, F, M, F, M, M, F, F, F, M, M, M, F...
## $ height <dbl> 73.2, 69.2, 69.0, 69.0, 73.5, 72.5, 65.5, 65.5, 71.0, 6...
## $ nkids <int> 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6...
# solution goes here
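One possible approach to the Galton exercise is sketched below. It is not the book's official solution: it assumes the ggplot2 and mosaicData packages are installed, loads the data explicitly rather than through mdsr, and the object name galton_plot is made up for illustration. The variables father, height, and sex are the ones shown in the glimpse() output above.

```r
library(ggplot2)

# Load Galton explicitly from mosaicData (assumes the package is installed).
data(Galton, package = "mosaicData")

# Scatterplot of each child's height against the father's height,
# one facet per sex, with a least-squares line fitted within each facet.
galton_plot <- ggplot(Galton, aes(x = father, y = height)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~sex) +
  labs(x = "Father's height (inches)", y = "Child's height (inches)")

galton_plot
```

Because geom_smooth() is added to a faceted plot, the line is fitted separately within each panel, which is what gives one regression line per sex.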
Using the RailTrail data set from the mosaicData package:
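1. Create a scatterplot of the number of crossings per day (volume) against the high temperature that day.
2. Separate your plot into facets by weekday.
3. Add regression lines to the two facets.
SOLUTION: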
library(mdsr)
glimpse(RailTrail)
## Observations: 90
## Variables: 11
## $ hightemp <int> 83, 73, 74, 95, 44, 69, 66, 66, 80, 79, 78, 65, 41,...
## $ lowtemp <int> 50, 49, 52, 61, 52, 54, 39, 38, 55, 45, 55, 48, 49,...
## $ avgtemp <dbl> 66.5, 61.0, 63.0, 78.0, 48.0, 61.5, 52.5, 52.0, 67....
## $ spring <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, ...
## $ summer <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, ...
## $ fall <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, ...
## $ cloudcover <dbl> 7.6, 6.3, 7.5, 2.6, 10.0, 6.6, 2.4, 0.0, 3.8, 4.1, ...
## $ precip <dbl> 0.00, 0.29, 0.32, 0.00, 0.14, 0.02, 0.00, 0.00, 0.0...
## $ volume <int> 501, 419, 397, 385, 200, 375, 417, 629, 533, 547, 4...
## $ weekday <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, F...
## $ dayType <chr> "weekday", "weekday", "weekday", "weekend", "weekda...
# solution goes here
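A minimal sketch for the RailTrail exercise follows, under the same assumptions as before (ggplot2 and mosaicData installed; the name railtrail_plot is hypothetical). It uses the hightemp, volume, and weekday columns listed in the glimpse() output above.

```r
library(ggplot2)

# Load RailTrail explicitly from mosaicData (assumes the package is installed).
data(RailTrail, package = "mosaicData")

# Daily trail volume against the day's high temperature,
# faceted by the weekday indicator, with a linear fit in each facet.
railtrail_plot <- ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~weekday, labeller = label_both) +
  labs(x = "High temperature (degrees F)", y = "Number of crossings per day")

railtrail_plot
```

The label_both labeller prints the variable name next to its value in each strip, which helps when the facet variable is a bare TRUE/FALSE.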
Angelica Schuyler Church (https://en.wikipedia.org/wiki/Angelica_Schuyler_Church, 1756–1814) was the daughter of New York Governor Philip Schuyler and sister of Elizabeth Schuyler Hamilton. Angelica, New York was named after her. Generate a plot of the reported proportion of babies born with the name Angelica over time and interpret the figure.
SOLUTION:
library(mdsr)
library(babynames)
glimpse(babynames)
## Observations: 1,858,689
## Variables: 5
## $ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 188...
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F...
## $ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret"...
## $ n <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 128...
## $ prop <dbl> 0.072384329, 0.026679234, 0.020521700, 0.019865989, 0.017...
# solution goes here
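The plot itself might be produced along the following lines. This is only a sketch: it assumes ggplot2 and babynames are installed, keeps both sexes and distinguishes them by color (a choice made here, not stated in the exercise), and uses the illustrative names angelica and angelica_plot. The interpretation is left to the reader.

```r
library(ggplot2)
library(babynames)

# Keep only the rows for the name of interest.
angelica <- subset(babynames, name == "Angelica")

# Proportion of births with this name over time, one line per recorded sex.
angelica_plot <- ggplot(angelica, aes(x = year, y = prop, color = sex)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of babies named Angelica")

angelica_plot
```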
The following questions use the Marriage data set from the mosaicData package.
library(mdsr)
glimpse(Marriage)
## Observations: 98
## Variables: 15
## $ bookpageID <fct> B230p539, B230p677, B230p766, B230p892, B230p994...
## $ appdate <fct> 10/29/96, 11/12/96, 11/19/96, 12/2/96, 12/9/96, ...
## $ ceremonydate <fct> 11/9/96, 11/12/96, 11/27/96, 12/7/96, 12/14/96, ...
## $ delay <int> 11, 0, 8, 5, 5, 0, 16, 0, 28, 10, 8, 0, 4, 4, 0,...
## $ officialTitle <fct> CIRCUIT JUDGE , MARRIAGE OFFICIAL, MARRIAGE OFFI...
## $ person <fct> Groom, Groom, Groom, Groom, Groom, Groom, Groom,...
## $ dob <fct> 4/11/64, 8/6/64, 2/20/62, 5/20/56, 12/14/66, 2/2...
## $ age <dbl> 32.60274, 32.29041, 34.79178, 40.57808, 30.02192...
## $ race <fct> White, White, Hispanic, Black, White, White, Whi...
## $ prevcount <int> 0, 1, 1, 1, 0, 1, 1, 1, 0, 3, 1, 1, 0, 0, 1, 0, ...
## $ prevconc <fct> NA, Divorce, Divorce, Divorce, NA, NA, Divorce, ...
## $ hs <int> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
## $ college <int> 7, 0, 3, 4, 0, 0, 0, 0, 0, 6, 2, 1, 1, 0, 0, 4, ...
## $ dayOfBirth <dbl> 102.00, 219.00, 51.50, 141.00, 348.50, 52.50, 28...
## $ sign <fct> Aries, Leo, Pisces, Gemini, Saggitarius, Pisces,...
# solution goes here
The MLB_teams data set in the mdsr package contains information about Major League Baseball teams in the past four seasons. There are several quantitative and a few categorical variables present. See how many variables you can illustrate on a single plot in R. The current record is 7. (Note: This is not good graphical practice—it is merely an exercise to help you understand how to use visual cues and aesthetics!)
library(mdsr)
glimpse(MLB_teams)
## Observations: 210
## Variables: 11
## $ yearID <int> 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 200...
## $ teamID <chr> "ARI", "ATL", "BAL", "BOS", "CHA", "CHN", "CIN", "C...
## $ lgID <fct> NL, NL, AL, AL, AL, NL, NL, AL, NL, AL, NL, NL, AL,...
## $ W <int> 82, 72, 68, 95, 89, 97, 74, 81, 74, 74, 84, 86, 75,...
## $ L <int> 80, 90, 93, 67, 74, 64, 88, 81, 88, 88, 77, 75, 87,...
## $ WPct <dbl> 0.5061728, 0.4444444, 0.4223602, 0.5864198, 0.54601...
## $ attendance <int> 2509924, 2532834, 1950075, 3048250, 2500648, 330020...
## $ normAttend <dbl> 0.5838859, 0.5892155, 0.4536477, 0.7091172, 0.58172...
## $ payroll <int> 66202712, 102365683, 67196246, 133390035, 121189332...
## $ metroPop <dbl> 4489109, 5614323, 2785874, 4732161, 9554598, 955459...
## $ name <chr> "Arizona Diamondbacks", "Atlanta Braves", "Baltimor...
# solution goes here
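As a starting point, the sketch below encodes six variables by combining position, color, size, transparency, and facets. It makes no attempt to match the record, and it assumes ggplot2 and mdsr are installed and that MLB_teams has the columns shown in the glimpse() output above; the name many_vars_plot is hypothetical.

```r
library(ggplot2)

# Load MLB_teams explicitly from mdsr (assumes the package is installed).
data(MLB_teams, package = "mdsr")

# Six variables on one plot:
#   x = payroll, y = WPct, color = lgID,
#   size = attendance, alpha = metroPop, facet = yearID.
many_vars_plot <- ggplot(
  MLB_teams,
  aes(x = payroll, y = WPct, color = lgID, size = attendance, alpha = metroPop)
) +
  geom_point() +
  facet_wrap(~yearID)

many_vars_plot
```

Further variables could be added through other aesthetics such as shape or text labels, at the cost of an even busier figure.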
Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
SOLUTION:
library(mdsr)
# solution goes here
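One way to approach this exercise is sketched below. It is an illustration rather than a reference solution: it assumes ggplot2 and mdsr are installed, treats payroll as being recorded in dollars (an assumption, since the glimpse() output does not state units), and uses the hypothetical name payroll_plot. Faceting by season is one way of supplying context, since payroll levels differ from year to year.

```r
library(ggplot2)

# Load MLB_teams explicitly from mdsr (assumes the package is installed).
data(MLB_teams, package = "mdsr")

# Winning percentage against payroll (rescaled to millions, assuming dollars),
# colored by league, with one panel and one linear fit per season.
payroll_plot <- ggplot(MLB_teams, aes(x = payroll / 1e6, y = WPct)) +
  geom_point(aes(color = lgID)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  facet_wrap(~yearID) +
  labs(x = "Payroll (millions)", y = "Winning percentage", color = "League")

payroll_plot
```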
Use the make_babynames_dist function in the mdsr package to recreate the “Deadest Names” graphic from FiveThirtyEight (http://tinyurl.com/zcbcl9o).
SOLUTION:
library(mdsr)
babynames_dist <- make_babynames_dist()
glimpse(babynames_dist)
## Observations: 1,639,490
## Variables: 9
## $ year <dbl> 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900...
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "...
## $ name <chr> "Mary", "Helen", "Anna", "Margaret", "Ruth", "...
## $ n <int> 16707, 6343, 6114, 5304, 4765, 4096, 3920, 389...
## $ prop <dbl> 0.052574935, 0.019960664, 0.019240028, 0.01669...
## $ alive_prob <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ count_thousands <dbl> 16.707, 6.343, 6.114, 5.304, 4.765, 4.096, 3.9...
## $ age_today <dbl> 114, 114, 114, 114, 114, 114, 114, 114, 114, 1...
## $ est_alive_today <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
The macleish package contains weather data collected every ten minutes in 2015 from two weather stations in Whately, MA. Using ggplot2, create a data graphic that displays the average temperature over each 10-minute interval (temperature) as a function of time (when).
SOLUTION:
library(mdsr)
library(macleish)
## Loading required package: etl
glimpse(whately_2015)
## Observations: 52,560
## Variables: 8
## $ when <dttm> 2015-01-01 00:00:00, 2015-01-01 00:10:00, 201...
## $ temperature <dbl> -9.32, -9.46, -9.44, -9.30, -9.32, -9.34, -9.3...
## $ wind_speed <dbl> 1.399, 1.506, 1.620, 1.141, 1.223, 1.090, 1.16...
## $ wind_dir <dbl> 225.4, 248.2, 258.3, 243.8, 238.4, 241.7, 242....
## $ rel_humidity <dbl> 54.55, 55.38, 56.18, 56.41, 56.87, 57.25, 57.7...
## $ pressure <int> 985, 985, 985, 985, 984, 984, 984, 984, 984, 9...
## $ solar_radiation <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ rainfall <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
# solution goes here
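A minimal sketch of such a graphic follows. It assumes ggplot2 and macleish are installed and uses the when and temperature columns from the glimpse() output above; the name whately_plot is made up for illustration.

```r
library(ggplot2)

# Load whately_2015 explicitly from macleish (assumes the package is installed).
data(whately_2015, package = "macleish")

# Temperature for each 10-minute interval as a function of time.
whately_plot <- ggplot(whately_2015, aes(x = when, y = temperature)) +
  geom_line(color = "darkgray") +
  labs(x = NULL, y = "Temperature")

whately_plot
```

With more than fifty thousand readings, a line in a muted color tends to read better than individual points; a smoother could be layered on top to show the seasonal trend.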
Using data from the nasaweather package, create a scatterplot between wind and pressure with color being used to distinguish the type of storm.
SOLUTION:
library(mdsr)
library(nasaweather)
glimpse(storms)
## Observations: 2,747
## Variables: 11
## $ name <chr> "Allison", "Allison", "Allison", "Allison", "Allison"...
## $ year <int> 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,...
## $ month <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,...
## $ day <int> 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,...
## $ hour <int> 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 1...
## $ lat <dbl> 17.4, 18.3, 19.3, 20.6, 22.0, 23.3, 24.7, 26.2, 27.6,...
## $ long <dbl> -84.3, -84.9, -85.7, -85.8, -86.0, -86.3, -86.2, -86....
## $ pressure <int> 1005, 1004, 1003, 1001, 997, 995, 987, 988, 988, 990,...
## $ wind <int> 30, 30, 35, 40, 50, 60, 65, 65, 65, 60, 60, 45, 30, 3...
## $ type <chr> "Tropical Depression", "Tropical Depression", "Tropic...
## $ seasday <int> 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,...
# solution goes here
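The scatterplot could be sketched as follows, assuming ggplot2 and nasaweather are installed. The data are referenced as nasaweather::storms because dplyr also exports a data set called storms, and which one a bare storms resolves to depends on package load order. The names nasa_storms and storm_scatter are hypothetical.

```r
library(ggplot2)

# Refer to the nasaweather version explicitly; it has the `type` column
# shown in the glimpse() output above.
nasa_storms <- nasaweather::storms

# Wind speed against pressure, with color distinguishing the type of storm.
storm_scatter <- ggplot(nasa_storms, aes(x = pressure, y = wind, color = type)) +
  geom_point(alpha = 0.5) +
  labs(x = "Pressure", y = "Wind speed", color = "Storm type")

storm_scatter
```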
Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
SOLUTION:
library(mdsr)
library(nasaweather)
# solution goes here
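A sketch of one approach is shown below. It assumes ggplot2 and nasaweather are installed and that the rows for each storm are already in chronological order, since geom_path connects observations in the order they appear in the data. The names nasa_storms and storm_paths are hypothetical, and hiding the legend is a choice made here because there are many storm names.

```r
library(ggplot2)

# Refer to the nasaweather version explicitly to avoid picking up
# the data set of the same name exported by dplyr.
nasa_storms <- nasaweather::storms

# One path per storm (longitude against latitude), colored by storm name,
# with each year drawn in its own panel.
storm_paths <- ggplot(nasa_storms, aes(x = long, y = lat, color = name)) +
  geom_path() +
  facet_wrap(~year) +
  labs(x = "Longitude", y = "Latitude") +
  theme(legend.position = "none")

storm_paths
```

Mapping color to name also groups the observations by storm within each panel, so each storm is drawn as its own connected path.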