These exercises are taken from the grammar of graphics chapter from Modern Data Science with R: http://mdsr-book.github.io. Other materials relevant for instructors (sample activities, overview video) for this chapter can be found there.
Using the famous Galton data set from the mosaicData package, plot each person's height against their father's height, and distinguish the observations by sex.
Hint: recall that you can find out more about the data set by running the command ?Galton.
SOLUTION:
library(mdsr)
glimpse(Galton)
## Observations: 898
## Variables: 6
## $ family <fct> 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5...
## $ father <dbl> 78.5, 78.5, 78.5, 78.5, 75.5, 75.5, 75.5, 75.5, 75.0, 7...
## $ mother <dbl> 67.0, 67.0, 67.0, 67.0, 66.5, 66.5, 66.5, 66.5, 64.0, 6...
## $ sex <fct> M, F, F, F, M, M, F, F, M, F, M, M, F, F, F, M, M, M, F...
## $ height <dbl> 73.2, 69.2, 69.0, 69.0, 73.5, 72.5, 65.5, 65.5, 71.0, 6...
## $ nkids <int> 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 5, 5, 5, 5, 5, 6, 6, 6, 6...
# solution goes here
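One possible sketch, offered as an illustration rather than the official solution. It assumes that ggplot2 and mosaicData are installed, and it makes two assumptions about the task: that sex is shown with one facet per level, and that a least-squares line is drawn in each facet. The object name galton_plot is made up for this example.

```r
library(ggplot2)
library(mosaicData)  # provides the Galton data frame

# Child's height against father's height, one panel per sex.
# Faceting by sex and the regression lines are assumptions about the task.
galton_plot <- ggplot(Galton, aes(x = father, y = height)) +
  geom_point(alpha = 0.5) +                  # soften overplotted points
  geom_smooth(method = "lm", se = FALSE) +   # least-squares line per panel
  facet_wrap(~sex) +
  labs(x = "Father's height (inches)", y = "Child's height (inches)")

galton_plot
```

Mapping sex to color instead of faceting (aes(color = sex)) would put both groups in a single panel, which makes the vertical offset between them easier to compare.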
Using the RailTrail data set from the mosaicData package, plot volume against the high temperature that day, and distinguish the observations by weekday.
SOLUTION:
library(mdsr)
glimpse(RailTrail)
## Observations: 90
## Variables: 11
## $ hightemp <int> 83, 73, 74, 95, 44, 69, 66, 66, 80, 79, 78, 65, 41,...
## $ lowtemp <int> 50, 49, 52, 61, 52, 54, 39, 38, 55, 45, 55, 48, 49,...
## $ avgtemp <dbl> 66.5, 61.0, 63.0, 78.0, 48.0, 61.5, 52.5, 52.0, 67....
## $ spring <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, ...
## $ summer <int> 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, ...
## $ fall <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, ...
## $ cloudcover <dbl> 7.6, 6.3, 7.5, 2.6, 10.0, 6.6, 2.4, 0.0, 3.8, 4.1, ...
## $ precip <dbl> 0.00, 0.29, 0.32, 0.00, 0.14, 0.02, 0.00, 0.00, 0.0...
## $ volume <int> 501, 419, 397, 385, 200, 375, 417, 629, 533, 547, 4...
## $ weekday <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, F...
## $ dayType <chr> "weekday", "weekday", "weekday", "weekend", "weekda...
# solution goes here
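A minimal sketch of one way to approach this, assuming ggplot2 and mosaicData are installed. Showing weekday as facets and adding a least-squares line to each facet are assumptions about the task, and rail_plot is a name chosen only for this example.

```r
library(ggplot2)
library(mosaicData)  # provides the RailTrail data frame

# Trail volume against the day's high temperature, split by weekday status.
rail_plot <- ggplot(RailTrail, aes(x = hightemp, y = volume)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +        # one fitted line per panel
  facet_wrap(~weekday, labeller = label_both) +   # strips read "weekday: TRUE"
  labs(x = "High temperature (degrees F)", y = "Trail volume")

rail_plot
```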
Angelica Schuyler Church (https://en.wikipedia.org/wiki/Angelica_Schuyler_Church, 1756–1814) was the daughter of New York Governor Philip Schuyler and sister of Elizabeth Schuyler Hamilton. Angelica, New York was named after her. Generate a plot of the reported proportion of babies born with the name Angelica over time and interpret the figure.
SOLUTION:
library(mdsr)
library(babynames)
glimpse(babynames)
## Observations: 1,858,689
## Variables: 5
## $ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 188...
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F...
## $ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret"...
## $ n <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 128...
## $ prop <dbl> 0.072384329, 0.026679234, 0.020521700, 0.019865989, 0.017...
# solution goes here
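A sketch of the plotting step, assuming dplyr, ggplot2, and babynames are installed. The names angelica and angelica_plot are made up for this example, and the interpretation is left to the reader. Mapping sex to color is a precaution in case the name is recorded for more than one sex.

```r
library(dplyr)
library(ggplot2)
library(babynames)

# Keep only the rows for the name of interest.
angelica <- babynames %>%
  filter(name == "Angelica")

# Proportion of births with this name, by year, one line per recorded sex.
angelica_plot <- ggplot(angelica, aes(x = year, y = prop, color = sex)) +
  geom_line() +
  labs(x = "Year", y = "Proportion of births named Angelica")

angelica_plot
```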
The following questions use the Marriage data set from the mosaicData package.
library(mdsr)
glimpse(Marriage)
## Observations: 98
## Variables: 15
## $ bookpageID <fct> B230p539, B230p677, B230p766, B230p892, B230p994...
## $ appdate <fct> 10/29/96, 11/12/96, 11/19/96, 12/2/96, 12/9/96, ...
## $ ceremonydate <fct> 11/9/96, 11/12/96, 11/27/96, 12/7/96, 12/14/96, ...
## $ delay <int> 11, 0, 8, 5, 5, 0, 16, 0, 28, 10, 8, 0, 4, 4, 0,...
## $ officialTitle <fct> CIRCUIT JUDGE , MARRIAGE OFFICIAL, MARRIAGE OFFI...
## $ person <fct> Groom, Groom, Groom, Groom, Groom, Groom, Groom,...
## $ dob <fct> 4/11/64, 8/6/64, 2/20/62, 5/20/56, 12/14/66, 2/2...
## $ age <dbl> 32.60274, 32.29041, 34.79178, 40.57808, 30.02192...
## $ race <fct> White, White, Hispanic, Black, White, White, Whi...
## $ prevcount <int> 0, 1, 1, 1, 0, 1, 1, 1, 0, 3, 1, 1, 0, 0, 1, 0, ...
## $ prevconc <fct> NA, Divorce, Divorce, Divorce, NA, NA, Divorce, ...
## $ hs <int> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, ...
## $ college <int> 7, 0, 3, 4, 0, 0, 0, 0, 0, 6, 2, 1, 1, 0, 0, 4, ...
## $ dayOfBirth <dbl> 102.00, 219.00, 51.50, 141.00, 348.50, 52.50, 28...
## $ sign <fct> Aries, Leo, Pisces, Gemini, Saggitarius, Pisces,...
# solution goes here
The MLB_teams data set in the mdsr package contains information about Major League Baseball teams in the past four seasons. There are several quantitative and a few categorical variables present. See how many variables you can illustrate on a single plot in R. The current record is 7. (Note: This is not good graphical practice; it is merely an exercise to help you understand how to use visual cues and aesthetics!)
library(mdsr)
glimpse(MLB_teams)
## Observations: 210
## Variables: 11
## $ yearID <int> 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 200...
## $ teamID <chr> "ARI", "ATL", "BAL", "BOS", "CHA", "CHN", "CIN", "C...
## $ lgID <fct> NL, NL, AL, AL, AL, NL, NL, AL, NL, AL, NL, NL, AL,...
## $ W <int> 82, 72, 68, 95, 89, 97, 74, 81, 74, 74, 84, 86, 75,...
## $ L <int> 80, 90, 93, 67, 74, 64, 88, 81, 88, 88, 77, 75, 87,...
## $ WPct <dbl> 0.5061728, 0.4444444, 0.4223602, 0.5864198, 0.54601...
## $ attendance <int> 2509924, 2532834, 1950075, 3048250, 2500648, 330020...
## $ normAttend <dbl> 0.5838859, 0.5892155, 0.4536477, 0.7091172, 0.58172...
## $ payroll <int> 66202712, 102365683, 67196246, 133390035, 121189332...
## $ metroPop <dbl> 4489109, 5614323, 2785874, 4732161, 9554598, 955459...
## $ name <chr> "Arizona Diamondbacks", "Atlanta Braves", "Baltimor...
# solution goes here
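As an illustration only, the following sketch encodes seven variables at once, assuming ggplot2 and mdsr are installed. The choice of which variable goes to which aesthetic is arbitrary, and seven_vars is a name made up for this example.

```r
library(ggplot2)
library(mdsr)  # provides the MLB_teams data frame

# Seven variables on one plot:
#   x = payroll, y = WPct, label = teamID, color = lgID,
#   size = attendance, alpha = metroPop, facet = yearID
seven_vars <- ggplot(
  MLB_teams,
  aes(x = payroll, y = WPct, label = teamID, color = lgID,
      size = attendance, alpha = metroPop)
) +
  geom_text() +          # team abbreviations drawn in place of points
  facet_wrap(~yearID)    # one panel per season

seven_vars
```

The result is hard to read, which is the point of the exercise: each added aesthetic competes for the viewer's attention.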
Use the MLB_teams data in the mdsr package to create an informative data graphic that illustrates the relationship between winning percentage and payroll in context.
SOLUTION:
library(mdsr)
# solution goes here
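One way this might look, assuming ggplot2 and mdsr are installed. Here "in context" is interpreted, as an assumption, to mean one panel per season, a dashed reference line at a .500 record, and payroll rescaled to millions of dollars. The name payroll_plot is made up for this example.

```r
library(ggplot2)
library(mdsr)  # provides the MLB_teams data frame

# Winning percentage against payroll (in millions), by season.
payroll_plot <- ggplot(MLB_teams, aes(x = payroll / 1e6, y = WPct)) +
  geom_hline(yintercept = 0.5, linetype = "dashed") +  # break-even record
  geom_point(aes(color = lgID)) +
  geom_smooth(method = "lm", se = FALSE) +             # linear trend per season
  facet_wrap(~yearID) +
  labs(x = "Payroll (millions of dollars)", y = "Winning percentage",
       color = "League")

payroll_plot
```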
Use the make_babynames_dist function in the mdsr package to recreate the “Deadest Names” graphic from FiveThirtyEight (http://tinyurl.com/zcbcl9o).
SOLUTION:
library(mdsr)
babynames_dist <- make_babynames_dist()
glimpse(babynames_dist)
## Observations: 1,639,490
## Variables: 9
## $ year <dbl> 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900...
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "...
## $ name <chr> "Mary", "Helen", "Anna", "Margaret", "Ruth", "...
## $ n <int> 16707, 6343, 6114, 5304, 4765, 4096, 3920, 389...
## $ prop <dbl> 0.052574935, 0.019960664, 0.019240028, 0.01669...
## $ alive_prob <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ count_thousands <dbl> 16.707, 6.343, 6.114, 5.304, 4.765, 4.096, 3.9...
## $ age_today <dbl> 114, 114, 114, 114, 114, 114, 114, 114, 114, 1...
## $ est_alive_today <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
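A rough sketch of how the columns shown above could be combined, assuming dplyr (version 1.0 or later), ggplot2, scales, and mdsr are installed. The 100,000-birth cutoff and the top-10 cut are assumptions chosen for illustration, and the names deadest and deadest_plot are made up for this example. It is not a faithful reproduction of the published graphic.

```r
library(dplyr)
library(ggplot2)
library(mdsr)

babynames_dist <- make_babynames_dist()

# For each name and sex: total births and estimated number still alive.
deadest <- babynames_dist %>%
  group_by(name, sex) %>%
  summarize(total_born = sum(n),
            est_alive = sum(est_alive_today),
            .groups = "drop") %>%
  filter(total_born >= 100000) %>%                  # assumed popularity cutoff
  mutate(pct_dead = 1 - est_alive / total_born) %>%
  arrange(desc(pct_dead)) %>%
  head(10)                                          # assumed: show the top 10

# Horizontal bars, most "dead" name at the top.
deadest_plot <- ggplot(deadest,
                       aes(x = reorder(name, pct_dead), y = pct_dead, fill = sex)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = "Estimated share of name holders who are deceased")

deadest_plot
```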
The macleish package contains weather data collected every ten minutes in 2015 from two weather stations in Whately, MA. Using ggplot2, create a data graphic that displays the average temperature over each 10-minute interval (temperature) as a function of time (when).
SOLUTION:
library(mdsr)
library(macleish)
## Loading required package: etl
glimpse(whately_2015)
## Observations: 52,560
## Variables: 8
## $ when <dttm> 2015-01-01 00:00:00, 2015-01-01 00:10:00, 201...
## $ temperature <dbl> -9.32, -9.46, -9.44, -9.30, -9.32, -9.34, -9.3...
## $ wind_speed <dbl> 1.399, 1.506, 1.620, 1.141, 1.223, 1.090, 1.16...
## $ wind_dir <dbl> 225.4, 248.2, 258.3, 243.8, 238.4, 241.7, 242....
## $ rel_humidity <dbl> 54.55, 55.38, 56.18, 56.41, 56.87, 57.25, 57.7...
## $ pressure <int> 985, 985, 985, 985, 984, 984, 984, 984, 984, 9...
## $ solar_radiation <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ rainfall <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
# solution goes here
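A minimal sketch, assuming ggplot2 and macleish are installed. The name temp_plot is made up for this example. A line geometry is used because the readings form a time series.

```r
library(ggplot2)
library(macleish)  # provides the whately_2015 data frame

# Temperature readings over the course of 2015, one value per 10 minutes.
temp_plot <- ggplot(whately_2015, aes(x = when, y = temperature)) +
  geom_line(color = "darkgray") +
  labs(x = NULL, y = "Temperature (degrees Celsius)")

temp_plot
```

Adding geom_smooth() on top of the line would show the seasonal trend more clearly through the daily fluctuations.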
Using data from the nasaweather package, create a scatterplot between wind and pressure, with color being used to distinguish the type of storm.
SOLUTION:
library(mdsr)
library(nasaweather)
glimpse(storms)
## Observations: 2,747
## Variables: 11
## $ name <chr> "Allison", "Allison", "Allison", "Allison", "Allison"...
## $ year <int> 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,...
## $ month <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,...
## $ day <int> 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,...
## $ hour <int> 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 18, 0, 6, 12, 1...
## $ lat <dbl> 17.4, 18.3, 19.3, 20.6, 22.0, 23.3, 24.7, 26.2, 27.6,...
## $ long <dbl> -84.3, -84.9, -85.7, -85.8, -86.0, -86.3, -86.2, -86....
## $ pressure <int> 1005, 1004, 1003, 1001, 997, 995, 987, 988, 988, 990,...
## $ wind <int> 30, 30, 35, 40, 50, 60, 65, 65, 65, 60, 60, 45, 30, 3...
## $ type <chr> "Tropical Depression", "Tropical Depression", "Tropic...
## $ seasday <int> 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,...
# solution goes here
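One possible sketch, assuming ggplot2 and nasaweather are installed. The data are referenced as nasaweather::storms because dplyr ships a different data set that is also called storms. The name wind_pressure is made up for this example.

```r
library(ggplot2)

# Wind speed against pressure, colored by storm type.
# nasaweather::storms is named explicitly to avoid dplyr's own `storms`.
wind_pressure <- ggplot(nasaweather::storms,
                        aes(x = pressure, y = wind, color = type)) +
  geom_point(alpha = 0.5) +
  labs(x = "Pressure", y = "Wind speed", color = "Storm type")

wind_pressure
```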
Using data from the nasaweather package, use the geom_path function to plot the path of each tropical storm in the storms data table. Use color to distinguish the storms from one another, and use faceting to plot each year in its own panel.
SOLUTION:
library(mdsr)
library(nasaweather)
# solution goes here
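A sketch under stated assumptions: ggplot2 and nasaweather are installed, every row of the storms table is plotted regardless of its type, and the legend is hidden because one entry per storm would crowd out the panels. The name storm_paths is made up for this example.

```r
library(ggplot2)

# One path per storm (grouped and colored by name), one panel per year.
# geom_path connects points in row order, which follows each storm in time.
storm_paths <- ggplot(nasaweather::storms,
                      aes(x = long, y = lat, group = name, color = name)) +
  geom_path() +
  facet_wrap(~year) +
  theme(legend.position = "none") +   # assumed: legend omitted for readability
  labs(x = "Longitude", y = "Latitude")

storm_paths
```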