Appendix A — Packages used in the book
A.1 The mdsr package
The mdsr package contains many of the small data sets used in this book that are not available in other packages. To install it from CRAN, use install.packages(). To get the latest development version instead, use the install_github() function from the remotes package. (See Section B.4.1 for more comprehensive information about R package maintenance.)
# this command only needs to be run once
install.packages("mdsr")
# if you want the development version
remotes::install_github("mdsr-book/mdsr")
The list of data sets provided can be retrieved using the data() function.
library(mdsr)
data(package = "mdsr")
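The listing returned by data() can also be captured as a data frame, which makes it easier to search or filter. The following is a minimal sketch: list_datasets() is a hypothetical helper name (it is not part of mdsr), and the example runs on the base datasets package so that it works even before mdsr is installed.

```r
# Hypothetical helper (not part of mdsr): return the data sets shipped
# with a package as a data frame of item names and titles.
list_datasets <- function(pkg) {
  # data() returns an object whose $results element is a character matrix
  # with columns "Package", "LibPath", "Item", and "Title"
  res <- data(package = pkg)$results
  data.frame(
    item = res[, "Item"],
    title = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Demonstrated on the base 'datasets' package so that it runs anywhere
head(list_datasets("datasets"))
```

Once mdsr is installed, list_datasets("mdsr") should return the same kind of table for the data sets described above.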
The mdsr package includes some functions that simplify a number of tasks. In particular, the dbConnect_scidb() function provides a shorthand for connecting to the public SQL server hosted by Amazon Web Services. We use this function extensively in Chapter 15 and in our classes and projects.
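As a rough illustration of how dbConnect_scidb() might be used, the sketch below opens a connection, lists the available tables, and closes the connection again. It rests on several assumptions: that the mdsr, DBI, and RMariaDB packages are installed, that the server is reachable from your network, and that a database named "airlines" is available on it. The wrapper list_scidb_tables() is a hypothetical name introduced only for this example.

```r
# Hypothetical wrapper (not part of mdsr): list the tables in one of the
# databases on the public server.
# Assumes mdsr, DBI, and RMariaDB are installed and the server is reachable.
list_scidb_tables <- function(dbname = "airlines") {
  con <- mdsr::dbConnect_scidb(dbname)
  # Close the connection when the function exits, even if an error occurs
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbListTables(con)
}

# Only attempt the network connection in an interactive session
if (interactive()) {
  list_scidb_tables("airlines")
}
```

Wrapping the connection in a function with on.exit() is one way to avoid leaving idle connections open on a shared server; in the book's chapters the connection object is typically kept around and reused instead.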
In keeping with best practices, mdsr no longer loads any other packages. In every chapter in this book, a call to library(tidyverse) precedes a call to library(mdsr). These two steps will set up an R session to replicate the code in the book.
A.2 Other packages
As we discuss in Chapters 1 and 21, this book is not explicitly about “big data”—it is about mastering data science techniques for small and medium data with an eye towards big data. To that end, we need medium-sized data sets to work with. We have introduced several such data sets in this book, namely airlines, fec12, and fec16.
The airlines package, which was inspired by the nycflights13 package, gives R users the ability to download the full 33 years (and counting) of flight data from the United States Bureau of Transportation Statistics and bring it seamlessly into SQL without actually having to write any SQL code. Like airlines, the macleish package is built on the etl framework, which it uses to retrieve hourly-updated weather data from the MacLeish field station.
The full list of packages used in this book appears below in Tables A.1 and A.2.
Package | Citation | Title |
---|---|---|
DBI | R Special Interest Group on Databases (R-SIG-DB), Wickham, and Müller (2024) | R Database Interface |
DT | Xie, Cheng, and Tan (2024) | A Wrapper of the JavaScript Library 'DataTables' |
Hmisc | Harrell (2024) | Harrell Miscellaneous |
Lahman | Friendly et al. (2023) | Sean 'Lahman' Baseball Database |
NHANES | Pruim (2015) | Data from the US National Health and Nutrition Examination Study |
NeuralNetTools | Beck (2022) | Visualization and Analysis Tools for Neural Networks |
RColorBrewer | Neuwirth (2022) | ColorBrewer Palettes |
RCurl | Temple Lang (2024) | General Network (HTTP/FTP/...) Client Interface for R |
RMariaDB | Müller et al. (2024) | Database Interface and MariaDB Driver |
Rcpp | Eddelbuettel et al. (2024) | Seamless R and C++ Integration |
aRxiv | Ram and Broman (2024) | Interface to the arXiv API |
alr4 | Weisberg (2018) | Data to Accompany Applied Linear Regression 4th Edition |
ape | Paradis et al. (2024) | Analyses of Phylogenetics and Evolution |
assertthat | Wickham (2019) | Easy Pre and Post Assertions |
available | Ganz et al. (2022) | Check if the Title of a Package is Available, Appropriate and Interesting |
babynames | Wickham (2021a) | US Baby Names 1880-2017 |
bench | Hester and Vaughan (2023) | High Precision Timing of R Expressions |
biglm | Lumley (2024) | Bounded Memory Linear and Generalized Linear Models |
bigrquery | Wickham and Bryan (2024) | An Interface to Google's 'BigQuery' 'API' |
broom | Robinson, Hayes, and Couch (2024) | Convert Statistical Objects into Tidy Tibbles |
dbplyr | Wickham, Girlich, and Ruiz (2024) | A 'dplyr' Back End for Databases |
discrim | Hvitfeldt and Kuhn (2023) | Model Wrappers for Discriminant Analysis |
dplyr | Wickham et al. (2023) | A Grammar of Data Manipulation |
dygraphs | Vanderkam et al. (2018) | Interface to 'Dygraphs' Interactive Time Series Charting Library |
etl | Baumer (2023) | Extract-Transform-Load Framework for Medium Data |
extrafont | Chang (2023) | Tools for Using Fonts |
forcats | Wickham (2023a) | Tools for Working with Categorical Variables (Factors) |
fs | Hester, Wickham, and Csárdi (2024) | Cross-Platform File System Operations Based on 'libuv' |
furrr | Vaughan and Dancho (2022) | Apply Mapping Functions in Parallel using Futures |
future | Bengtsson (2024) | Unified Parallel and Distributed Processing in R for Everyone |
gganimate | Pedersen and Robinson (2024) | A Grammar of Animated Graphics |
ggmosaic | Jeppson, Hofmann, and Cook (2021) | Mosaic Plots in the 'ggplot2' Framework |
ggplot2 | Wickham, Chang, et al. (2024) | Create Elegant Data Visualisations Using the Grammar of Graphics |
ggraph | Pedersen (2024a) | An Implementation of Grammar of Graphics for Graphs and Networks |
ggrepel | Slowikowski (2024) | Automatically Position Non-Overlapping Text Labels with 'ggplot2' |
ggspatial | Dunnington (2023) | Spatial Data Framework for ggplot2 |
ggthemes | Arnold (2024) | Extra Themes, Scales and Geoms for 'ggplot2' |
glmnet | Friedman et al. (2023) | Lasso and Elastic-Net Regularized Generalized Linear Models |
googlesheets4 | Bryan (2023) | Access Google Sheets using the Sheets API V4 |
haven | Wickham, Miller, and Smith (2023) | Import and Export 'SPSS', 'Stata' and 'SAS' Files |
here | Müller (2020) | A Simpler Way to Find Your Files |
htmlwidgets | Vaidyanathan et al. (2023) | HTML Widgets for R |
igraph | Csárdi, Nepusz, et al. (2024) | Network Analysis and Visualization |
janitor | Firke (2023) | Simple Tools for Examining and Cleaning Dirty Data |
jsonlite | Ooms (2023) | A Simple and Robust JSON Parser and Generator for R |
kableExtra | Zhu (2024) | Construct Complex Table with 'kable' and Pipe Syntax |
kknn | Schliep and Hechenbichler (2016) | Weighted k-Nearest Neighbors |
knitr | Xie (2024a) | A General-Purpose Package for Dynamic Report Generation in R |
lattice | Sarkar (2024) | Trellis Graphics for R |
leaflet | Cheng et al. (2024) | Create Interactive Web Maps with the JavaScript 'Leaflet' Library |
lubridate | Spinu, Grolemund, and Wickham (2023) | Make Dealing with Dates a Little Easier |
macleish | Baumer et al. (2022) | Retrieve Data from MacLeish Field Station |
magrittr | Bache and Wickham (2022) | A Forward-Pipe Operator for R |
mapproj | McIlroy et al. (2023) | Map Projections |
maps | Brownrigg (2023) | Draw Geographical Maps |
mclust | Fraley, Raftery, and Scrucca (2024) | Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation |
mdsr | Baumer, Horton, and Kaplan (2023) | Complement to 'Modern Data Science with R' |
modelr | Wickham (2023b) | Modelling Functions that Work with the Pipe |
mosaic | Pruim, Kaplan, and Horton (2024) | Project MOSAIC Statistics and Mathematics Teaching Utilities |
mosaicData | Pruim, Kaplan, and Horton (2023) | Project MOSAIC Data Sets |
nycflights13 | Wickham (2021b) | Flights that Departed NYC in 2013 |
parsnip | Kuhn and Vaughan (2024) | A Common API to Modeling and Analysis Functions |
partykit | Hothorn and Zeileis (2024) | A Toolkit for Recursive Partytioning |
patchwork | Pedersen (2024b) | The Composer of Plots |
plotly | Sievert et al. (2024) | Create Interactive Web Graphics via 'plotly.js' |
purrr | Wickham and Henry (2023) | Functional Programming Tools |
randomForest | Breiman et al. (2022) | Breiman and Cutler's Random Forests for Classification and Regression |
readr | Wickham, Hester, and Bryan (2024) | Read Rectangular Text Data |
readxl | Wickham and Bryan (2023) | Read Excel Files |
remotes | Csárdi, Hester, et al. (2024) | R Package Installation from Remote Repositories, Including 'GitHub' |
renv | Ushey and Wickham (2024) | Project Environments |
reticulate | Ushey, Allaire, and Tang (2024) | Interface to 'Python' |
rlang | Henry and Wickham (2024) | Functions for Base Types and Core R and 'Tidyverse' Features |
rmarkdown | Allaire et al. (2024) | Dynamic Documents for R |
rpart | Therneau and Atkinson (2023) | Recursive Partitioning and Regression Trees |
rvest | Wickham (2024) | Easily Harvest (Scrape) Web Pages |
scales | Wickham, Pedersen, and Seidel (2023) | Scale Functions for Visualization |
sessioninfo | Wickham et al. (2021) | R Session Information |
sf | Pebesma (2024) | Simple Features for R |
shiny | Chang et al. (2024) | Web Application Framework for R |
sp | Pebesma and Bivand (2024) | Classes and Methods for Spatial Data |
sparklyr | Luraschi et al. (2024) | R Interface to Apache Spark |
stopwords | Benoit, Muhr, and Watanabe (2021) | Multilingual Stopword Lists |
stringr | Wickham (2023c) | Simple, Consistent Wrappers for Common String Operations |
styler | Müller and Walthert (2024) | Non-Invasive Pretty Printing of R Code |
textdata | Hvitfeldt (2024) | Download and Load Various Text Datasets |
tidycensus | Walker and Herman (2024) | Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames |
tidygeocoder | Cambon et al. (2021) | Geocoding Made Easy |
tidygraph | Pedersen (2024c) | A Tidy API for Graph Manipulation |
tidymodels | Kuhn and Wickham (2024) | Easily Install and Load the 'Tidymodels' Packages |
tidyr | Wickham, Vaughan, and Girlich (2024) | Tidy Messy Data |
tidytext | Robinson and Silge (2024) | Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools |
tidyverse | Wickham (2023d) | Easily Install and Load the 'Tidyverse' |
tigris | Walker (2024) | Load Census TIGER/Line Shapefiles |
tm | Feinerer and Hornik (2024) | Text Mining Package |
transformr | Pedersen (2024d) | Polygon and Path Transformations |
units | Pebesma et al. (2023) | Measurement Units for R Vectors |
usethis | Wickham, Bryan, et al. (2024) | Automate Package and Project Setup |
viridis | Garnier (2024) | Colorblind-Friendly Color Maps for R |
viridisLite | Garnier (2023) | Colorblind-Friendly Color Maps (Lite Version) |
wordcloud | Fellows (2018) | Word Clouds |
wru | Khanna et al. (2024) | Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and Geolocation |
xaringanthemer | Aden-Buie (2022) | Custom 'xaringan' CSS Themes |
xfun | Xie (2024b) | Supporting Functions for Packages Maintained by 'Yihui Xie' |
xkcd | Torres-Manzanera (2018) | Plotting ggplot2 Graphics in an XKCD Style |
yardstick | Kuhn, Vaughan, and Hvitfeldt (2024) | Tidy Characterizations of Model Performance |
Package | GitHub User | Citation | Title |
---|---|---|---|
etude | dtkaplan | @R-etude | Utilities for Handling Textbook Exercises with Knitr |
fec12 | baumer-lab | @R-fec12 | Data Package for 2012 Federal Elections |
openrouteservice | GIScience | @R-openrouteservice | Openrouteservice API Client |
streamgraph | hrbrmstr | @R-streamgraph | Build Streamgraph Visualizations |
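To check which of these packages are still missing from a local library, something like the following sketch could be used. The helper missing_packages() is a hypothetical name, the package vectors are only a small subset of the tables above, and the GitHub repository paths assume that each repository is named after its package (user/package), which the tables do not state explicitly.

```r
# A small subset of Table A.1 (CRAN packages)
cran_pkgs <- c("tidyverse", "mdsr", "remotes")

# A subset of Table A.2; the "user/repo" paths are an assumption built
# from the GitHub User and Package columns
github_pkgs <- c(
  fec12 = "baumer-lab/fec12",
  etude = "dtkaplan/etude"
)

# Hypothetical helper: which of the named packages are not installed?
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_cran <- missing_packages(cran_pkgs)
missing_github <- github_pkgs[missing_packages(names(github_pkgs))]

# Report what would be installed
print(missing_cran)
print(missing_github)

# Uncomment to install the missing packages
# install.packages(missing_cran)
# remotes::install_github(missing_github)
```

The install calls are left commented out so that running the sketch only reports what is missing rather than changing the library.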
A.3 Further resources
More information on the mdsr package can be found at http://www.github.com/mdsr-book/mdsr.