B Introduction to R and RStudio
This chapter provides a (brief) introduction to R and RStudio. The R language is a free, open-source software environment for statistical computing and graphics (Ihaka and Gentleman 1996; R Core Team 2020). RStudio is an open-source integrated development environment (IDE) for R that adds many features and productivity tools for R (RStudio 2020). This chapter includes a short history, installation information, a sample session, background on fundamental structures and actions, information about help and documentation, and other important topics.
The R Foundation for Statistical Computing holds and administers the copyright of the R software and documentation. R is available under the terms of the Free Software Foundation’s GNU General Public License in source code form.
RStudio facilitates use of R by integrating R help and documentation, providing a workspace browser and data viewer, and supporting syntax highlighting, code completion, and smart indentation. Support for reproducible analysis is made available with the knitr package and R Markdown (see Appendix D). RStudio also facilitates the creation of dynamic web applications using Shiny (see Chapter 14.4). It also provides support for multiple projects as well as an interface to source code control systems such as Git (with hosting services such as GitHub). It has become the default interface for many R users, and is our recommended environment for analysis.
RStudio is available as a client (standalone) for Windows, Mac OS X, and Linux. There is also a server version. Commercial products and support are available in addition to the open-source offerings (see http://www.rstudio.com/ide for details).
The first versions of R were written by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, while current development is coordinated by the R Development Core Team, a group of international volunteers.
The R language is quite similar to the S language, a flexible and extensible statistical environment originally developed in the 1980s at AT&T Bell Labs (now Alcatel–Lucent).
B.1 Installation
New users are encouraged to download and install R from the Comprehensive R Archive Network (CRAN, http://www.r-project.org) and install RStudio from http://www.rstudio.com/download. The sample session in the appendix of the Introduction to R documentation, also available from CRAN, is recommended reading.
The home page for the R project, located at http://r-project.org, is the best starting place for information about the software.
It includes links to CRAN, which features pre-compiled binaries as well as source code for R, add-on packages, documentation (including manuals,
frequently asked questions, and the R newsletter) as well as general background information.
Mirrored CRAN sites with identical copies of these files exist all around the world.
Updates to R and packages are regularly posted on CRAN.
B.1.1 RStudio
RStudio for Mac OS X, Windows, or Linux can be downloaded from https://rstudio.com/products/rstudio. RStudio requires R to be installed on the local machine. A server version (accessible from Web browsers) is also available for download. Documentation of the advanced features is available on the RStudio website.
B.2 Learning R
The R environment features extensive online documentation, though it can sometimes be challenging to comprehend. Each command has an associated help file that describes usage, lists arguments, provides details of actions, gives references, lists other related functions, and includes examples of its use. The help system is invoked using either the ? or help() commands.
?function
help(function)
where function is the name of the function of interest. (Alternatively, the Help tab in RStudio can be used to access the help system.)
Some commands (e.g., if) are reserved, so ?if will not generate the desired documentation. Running ?"if" will work (see also ?Reserved and ?Control). Other reserved words include else, repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, and NA.
The RSiteSearch() function will search for key words or phrases in many places (including the search engine at http://search.r-project.org). The RSeek.org site can also be helpful in finding more information and examples. Examples of many functions are available using the example() function.
example(mean)
Other useful resources are help.start(), which provides a set of online manuals, and help.search(), which can be used to look up entries by description. The apropos() command returns any functions in the current search list that match a given pattern (which facilitates searching for a function based on what it does, as opposed to its name).
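As a small illustration of pattern-based lookup, the sketch below uses apropos() with a regular expression. The pattern is only an example, and the result assumes the utils package is attached (as it is in a default R session).

```r
# apropos() matches a regular expression against the names of objects
# on the current search path and returns the matching names.
matches <- apropos("^read\\.csv")
print(matches)

# help.search() searches the documentation by description instead.
# It is left commented out here because it opens the help viewer.
# help.search("standard deviation")
```

Anchoring the pattern with ^ keeps the list short; dropping the anchor returns every object whose name contains the text.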
Other resources for help available from CRAN include the R help mailing list. The StackOverflow site for R provides a series of questions and answers for common questions that are tagged as being related to R. New users are also encouraged to read the R FAQ (frequently asked questions) list. RStudio provides a curated guide to resources for learning R and its extensions.
B.3 Fundamental structures and objects
Here we provide a brief introduction to R data structures.
B.3.1 Objects and vectors
Almost everything in R is an object, which may be initially confusing to a new user.
An object is simply something stored in R’s memory. Common objects include vectors, matrices, arrays, factors, data frames (akin to data sets in other systems), lists, and functions.
The basic variable structure is a vector.
Vectors (and other objects) are created using the <- or = assignment operators (which assign the evaluated expression on the right-hand side of the operator to the object name on the left-hand side).
x <- c(5, 7, 9, 13, -4, 8) # preferred
x = c(5, 7, 9, 13, -4, 8) # equivalent
The above code creates a vector of length 6 using the c() function to concatenate scalars. The = operator is used in other contexts for the specification of arguments to functions. Other assignment operators exist, as well as the assign() function (see help("<-") for more information). The exists() function conveys whether an object exists in the workspace, and the rm() command removes it. In RStudio, the “Environment” tab shows the names (and values) of all objects that exist in the current workspace.
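A minimal sketch of exists() and rm() follows. The object name scratch_vec is hypothetical, chosen so that it does not collide with anything on the search path.

```r
# Create an object, confirm that it exists, then remove it.
scratch_vec <- c(1, 2, 3)
before_rm <- exists("scratch_vec")   # TRUE: the object is in the workspace
rm(scratch_vec)
after_rm <- exists("scratch_vec")    # FALSE: the object has been removed
print(c(before = before_rm, after = after_rm))
```

Note that exists() takes the object name as a quoted string, whereas rm() accepts the bare name.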
Since vector operations are so fundamental in R, it is important to be able to access (or index) elements within these vectors.
Many different ways of indexing vectors are available. Here, we introduce several of these using the vector x created above. The command x[2] returns the second element of x (the scalar 7), and x[c(2, 4)] returns the vector \((7, 13)\). The expressions x[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)], x[1:5], and x[-6] all return a vector consisting of the first 5 elements in x (the last specifies all elements except the 6th).
x[2]
[1] 7
x[c(2, 4)]
[1] 7 13
x[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)]
[1] 5 7 9 13 -4
x[1:5]
[1] 5 7 9 13 -4
x[-6]
[1] 5 7 9 13 -4
Vectors are recycled if needed; for example, when comparing each of the elements of a vector to a scalar.
x > 8
[1] FALSE FALSE TRUE TRUE FALSE FALSE
The above expression demonstrates the use of comparison operators (see ?Comparison). Only the third and fourth elements of x are greater than 8. Each comparison returns a logical value of either TRUE or FALSE (see ?Logic). A count of elements meeting the condition can be generated using the sum() function. Other comparison operators include == (equal), >= (greater than or equal), <= (less than or equal), and != (not equal). Care needs to be taken in the comparison using == if noninteger values are present (see all.equal()).
sum(x > 8)
[1] 2
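The caution about == with noninteger values can be seen in a short sketch. The variable names here are illustrative only.

```r
# Floating-point arithmetic is not exact, so == can be misleading.
a <- 0.1 + 0.2
b <- 0.3
exact_equal <- (a == b)                   # FALSE: the two doubles differ slightly
approx_equal <- isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance
print(c(exact = exact_equal, approximate = approx_equal))
```

Wrapping all.equal() in isTRUE() is useful because all.equal() returns a character description of the difference, rather than FALSE, when the values are not nearly equal.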
B.3.2 Operators
There are many operators defined in R to carry out a variety of tasks. Many of these were demonstrated in the sample session (assignment, arithmetic) and previous examples (comparison). Arithmetic operations include +, -, *, /, ^ (exponentiation), %% (modulus), and %/% (integer division). More information about operators can be found using the help system (e.g., ?"+"). Background information on other operators and precedence rules can be found using help(Syntax).
Boolean operations (OR, AND, NOT, and XOR) are supported using the |, ||, &, &&, and ! operators and the xor() function. The | is an “or” operator that operates on each element of a vector, while the || is another “or” operator that stops evaluation the first time that the result is true (see ?Logic).
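The difference between the elementwise and short-circuit forms can be sketched as follows. The vector x is re-created so that the example stands alone.

```r
x <- c(5, 7, 9, 13, -4, 8)

# | works element by element and returns a logical vector.
elementwise <- x < 0 | x > 8
print(elementwise)   # FALSE FALSE TRUE TRUE TRUE FALSE

# || evaluates left to right and stops as soon as the result is known,
# so the call to stop() on the right-hand side is never evaluated.
short_circuit <- TRUE || stop("never evaluated")
print(short_circuit)

# xor() is TRUE when exactly one of its arguments is TRUE.
exclusive <- xor(TRUE, FALSE)
print(exclusive)
```

In practice, | and & are the usual choice for filtering vectors, while || and && are intended for single conditions such as those in if statements.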
B.3.3 Lists
Lists in R are very general objects that can contain other objects of arbitrary types. List members can be named, or referenced using numeric indices (using the [[ operator).
newlist <- list(first = "hello", second = 42, Bob = TRUE)
is.list(newlist)
[1] TRUE
newlist
$first
[1] "hello"
$second
[1] 42
$Bob
[1] TRUE
newlist[[2]]
[1] 42
newlist$Bob
[1] TRUE
The unlist() function flattens (makes a vector out of) the elements in a list (see also relist()). Note that unlisted objects are coerced to a common type (in this case character).
unlisted <- unlist(newlist)
unlisted
first second Bob
"hello" "42" "TRUE"
B.3.4 Matrices
Matrices are like two-dimensional vectors: rectangular objects where all entries have the same type. We can create a \(2 \times 3\) matrix, display it, and test for its type.
A <- matrix(x, 2, 3)
A
[,1] [,2] [,3]
[1,] 5 9 -4
[2,] 7 13 8
is.matrix(A) # is A a matrix?
[1] TRUE
is.vector(A)
[1] FALSE
is.matrix(x)
[1] FALSE
Note that comments are supported within R (any input given after a # character is ignored). Indexing for matrices is done in a similar fashion as for vectors, albeit with a second dimension (denoted by a comma).
A[2, 3]
[1] 8
A[, 1]
[1] 5 7
A[1, ]
[1] 5 9 -4
B.3.5 Dataframes and tibbles
Data sets are often stored in a data.frame, which is a special type of list that is more general than a matrix. This rectangular object, similar to a data table in other systems, can be thought of as a two-dimensional array with columns of vectors of the same length, but of possibly different types (as opposed to a matrix, which consists of vectors of the same type; or a list, whose elements needn’t be of the same length). The function read_csv() in the readr package returns a data.frame object (more precisely, a tibble, described later in this section). A simple data.frame can be created using the data.frame() command. Variables can be accessed using the $ operator, as shown below (see also help(Extract)). In addition, operations can be performed by column (e.g., calculation of sample statistics). We can check to see if an object is a data.frame with is.data.frame().
y <- rep(11, length(x))
y
[1] 11 11 11 11 11 11
ds <- data.frame(x, y)
ds
x y
1 5 11
2 7 11
3 9 11
4 13 11
5 -4 11
6 8 11
ds$x[3]
[1] 9
is.data.frame(ds)
[1] TRUE
Tibbles are a form of simple data frames (a modern interpretation) that are described as “lazy and surly” (https://tibble.tidyverse.org). They support multiple data technologies (e.g., SQL databases), make more explicit their assumptions, and have an enhanced print method (so that output doesn’t scroll so much). Many packages in the tidyverse create tibbles by default.
tbl <- as_tibble(ds)
is.data.frame(tbl)
[1] TRUE
is_tibble(ds)
[1] FALSE
is_tibble(tbl)
[1] TRUE
The use of data.frame() differs from the use of cbind(), which yields a matrix object (unless it is given data frames as inputs).
newmat <- cbind(x, y)
newmat
x y
[1,] 5 11
[2,] 7 11
[3,] 9 11
[4,] 13 11
[5,] -4 11
[6,] 8 11
is.data.frame(newmat)
[1] FALSE
is.matrix(newmat)
[1] TRUE
Data frames are created from matrices using as.data.frame(), while matrices are constructed from data frames using as.matrix().
Although we strongly discourage its use, data frames can be attached to the workspace using the attach() command. The Tidyverse R Style guide (https://style.tidyverse.org) provides similar advice. Name conflicts are a common problem with attach() (see conflicts(), which reports on objects that exist with the same name in two or more places on the search path). The search() function lists attached packages and objects. To avoid cluttering and confusing the name-space, the command detach() should be used once a data frame or package is no longer needed.
A number of R functions include a data argument to specify a data frame as a local environment. For functions without a data option, the with() and within() commands can be used to simplify reference to an object within a data frame without attaching.
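A brief sketch of with() and within() follows, using a data frame built like the ds object above. The column name z is hypothetical and introduced only for illustration.

```r
ds <- data.frame(x = c(5, 7, 9, 13, -4, 8), y = rep(11, 6))

# with() evaluates an expression using the columns of ds as variables,
# so there is no need for attach() or repeated use of ds$.
mean_x <- with(ds, mean(x))
print(mean_x)

# within() does the same but returns a modified copy of the data frame.
ds2 <- within(ds, {
  z <- x + y   # new column computed from existing columns
})
print(names(ds2))
```

Neither function changes the search path, which avoids the name conflicts that attach() can cause.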
B.3.6 Attributes and classes
Many objects have a set of associated attributes (such as names of variables, dimensions, or classes) that can be displayed or sometimes changed. For example, we can find the dimension of the matrix defined earlier.
attributes(A)
$dim
[1] 2 3
Other types of objects within R include lists (ordered objects that are not necessarily rectangular), regression models (objects of class lm), and formulae (e.g., y ~ x1 + x2). R supports object-oriented programming (see help(UseMethod)). As a result, objects in R have an associated class attribute, which changes the default behavior for some operations on that object. Many functions (called generics) have special capabilities when applied to objects of a particular class. For example, when summary() is applied to an lm object, the summary.lm() function is called. Conversely, summary.aov() is called when an aov object is given as argument. These class-specific implementations of generic functions are called methods.
The class() function returns the classes to which an object belongs, while the methods() function displays all of the classes supported by a generic function.
head(methods(summary))
[1] "summary,ANY-method" "summary,DBIObject-method"
[3] "summary,MySQLConnection-method" "summary,MySQLDriver-method"
[5] "summary,MySQLResult-method" "summary.aov"
Objects in R can belong to multiple classes, although those classes need not be nested. As noted above, generic functions are dispatched according to the class attribute of each object. Thus, in the example below we create the tbl object, which belongs to multiple classes. When the print() function is called on tbl, R looks for a method called print.tbl_df(). If no such method is found, R looks for a method called print.tbl(). If no such method is found, R looks for a method called print.data.frame(). This process continues until a suitable method is found. If there is none, then print.default() is called.
tbl <- as_tibble(ds)
class(tbl)
[1] "tbl_df" "tbl" "data.frame"
print(tbl)
# A tibble: 6 × 2
x y
<dbl> <dbl>
1 5 11
2 7 11
3 9 11
4 13 11
5 -4 11
6 8 11
print.data.frame(tbl)
x y
1 5 11
2 7 11
3 9 11
4 13 11
5 -4 11
6 8 11
print.default(tbl)
$x
[1] 5 7 9 13 -4 8
$y
[1] 11 11 11 11 11 11
attr(,"class")
[1] "tbl_df" "tbl" "data.frame"
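The search order used in method dispatch can also be reproduced with a toy example. In this sketch the generic describe() and the classes child and parent are hypothetical names invented for illustration. They are not part of base R or any package.

```r
# A hypothetical generic with a default method and a method for "parent".
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "describe.default"
describe.parent <- function(obj, ...) "describe.parent"

# An object whose class vector is searched from left to right.
obj <- structure(list(value = 42), class = c("child", "parent"))

# There is no describe.child() yet, so dispatch falls through to "parent".
first_try <- describe(obj)

# Once a method for "child" exists, it takes priority.
describe.child <- function(obj, ...) "describe.child"
second_try <- describe(obj)

# Objects with no matching class use the default method.
plain <- describe(1:3)

print(c(first_try, second_try, plain))
```

This mirrors the way print() works through "tbl_df", "tbl", and "data.frame" in turn before falling back to print.default().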
There are a number of functions that assist with learning about an object in R. The attributes() command displays the attributes associated with an object. The typeof() function provides information about the underlying data structure of objects (e.g., logical, integer, double, complex, character, and list). The str() function displays the structure of an object, and the mode() function displays its mode (see also storage.mode()). For data frames, the glimpse() function provides a useful summary of each variable.
A few quick notes on specific types of objects are worth relating here:
- A vector is a one-dimensional array of items of the same data type. There are six basic data types that a vector can contain: logical, character, integer, double, complex, and raw. Vectors have a length() but not a dim(). Vectors can have (but needn’t have) names().
- A factor is a special type of vector for categorical data. A factor has levels(). We change the reference level of a factor with relevel(). Factors are stored internally as integers that correspond to the id’s of the factor levels.
Factors can be problematic and their use is discouraged since they can complicate some aspects of data wrangling. A number of R developers have encouraged the use of the stringsAsFactors = FALSE option.
- A matrix is a two-dimensional array of items of the same data type. A matrix has a length() that is equal to nrow() times ncol(), or the product of dim().
- A data.frame is a list of vectors of the same length. This is like a matrix, except that columns can be of different data types. Data frames always have names() and often have row.names().
Do not confuse a factor with a character vector.
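The internal integer representation of a factor, and the effect of relevel(), can be seen in a short sketch. The vector sizes and its values are made up for illustration.

```r
# A factor stores integer codes plus a set of levels.
sizes <- factor(c("small", "large", "medium", "small"))
lv <- levels(sizes)        # levels are sorted alphabetically by default
codes <- as.integer(sizes) # integer codes index into the levels
print(lv)
print(codes)

# relevel() moves the chosen reference level to the front.
sizes2 <- relevel(sizes, ref = "small")
lv2 <- levels(sizes2)
print(lv2)

# A character vector carries no levels, which is one way to tell the two apart.
print(is.factor(sizes))
print(is.factor(as.character(sizes)))
```

Because the codes depend on the ordering of the levels, converting a factor directly with as.integer() rarely gives a meaningful number, which is one of the ways factors complicate data wrangling.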
Note that data sets typically have class data.frame but are of type list. This is because, as noted above, R stores data frames as special types of lists: a list of several vectors having the same length, but possibly having different types.
class(mtcars)
[1] "data.frame"
typeof(mtcars)
[1] "list"
If you ever get confused when working with data frames and matrices, remember that a data.frame is a list (that can accommodate multiple types of objects), whereas a matrix is more like a vector (in that it can only support one type of object).
B.3.7 Options
The options() function in R can be used to change various default behaviors. For example, the digits argument controls the number of digits to display in output. When options() is called to set a value, the previous values of those options are returned (invisibly), which allows them to be restored later. The command help(options) lists all of the settable options.
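A minimal sketch of saving and restoring an option follows. The choice of three digits is arbitrary.

```r
# Setting an option returns the previous value (invisibly) as a list.
old <- options(digits = 3)
short_pi <- format(pi)       # formatted using the new setting
print(short_pi)

# Passing the saved list back to options() restores the earlier setting.
options(old)
restored <- getOption("digits")
print(restored)
```

Saving the returned list is a convenient way to make a temporary change without needing to know the original value.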
B.3.8 Functions
Fundamental actions within R are carried out by calling functions (either built-in or user defined—see Appendix C for guidance on the latter). Multiple arguments may be given, separated by commas. The function carries out operations using the provided arguments and returns values (an object such as a vector or list) that are displayed (by default) or which can be saved by assignment to an object.
It’s a good idea to name arguments to functions. This practice minimizes errors assigning unnamed arguments to options and makes code more readable.
As an example, the quantile() function takes a numeric vector and returns the minimum, 25th percentile, median, 75th percentile, and maximum of the values in that vector. However, if an optional vector of quantiles is given, those quantiles are calculated instead.
vals <- rnorm(1000) # generate 1000 standard normal random variables
quantile(vals)
0% 25% 50% 75% 100%
-3.520 -0.675 0.012 0.737 3.352
quantile(vals, c(.025, .975))
2.5% 97.5%
-2.00 1.98
# Return values can be saved for later use.
res <- quantile(vals, c(.025, .975))
res[1]
2.5%
-2
Arguments (options) are available for most functions. The documentation specifies the default action if named arguments are not specified. If not named, the arguments are provided to the function in the order specified in the function call. For the quantile() function, there is a type argument that allows specification of one of nine algorithms for calculating quantiles.
res <- quantile(vals, probs = c(.025, .975), type = 3)
res
2.5% 97.5%
-2.02 1.98
Some functions allow a variable number of arguments. An example is the paste() function. The calling sequence is described in the documentation as follows.
paste(..., sep = " ", collapse = NULL)
To override the default behavior of a space being added between elements output by paste(), the user can specify a different value for sep.
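The effect of sep (and of the related collapse argument) can be sketched with a few calls. The strings used are arbitrary.

```r
# Default: elements are joined with a single space.
default_sep <- paste("a", "b", "c")
print(default_sep)    # "a b c"

# A different separator is supplied through the named argument sep.
dash_sep <- paste("a", "b", "c", sep = "-")
print(dash_sep)       # "a-b-c"

# With vector inputs, sep joins element-wise and collapse joins the results.
collapsed <- paste(c("x", "y"), 1:2, sep = "_", collapse = "+")
print(collapsed)      # "x_1+y_2"
```

Because paste() accepts any number of arguments through ..., sep and collapse must always be given by name.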
B.4 Add-ons: Packages
B.4.1 Introduction to packages
Additional functionality in R is added through packages, which consist of functions, data sets, examples, vignettes, and help files that can be downloaded from CRAN. The function install.packages() can be used to download and install packages. Alternatively, RStudio provides an easy-to-use Packages tab to install and load packages. Throughout the book, we assume that the tidyverse and mdsr packages are loaded. In many cases, additional add-on packages (see Appendix A) need to be installed prior to running the examples in this book. Packages that are not on CRAN can be installed using the install_github() function in the remotes package.
install.packages("mdsr") # CRAN version
remotes::install_github("mdsr-book/mdsr") # development version
The library() function will load an installed package. For example, to install and load Frank Harrell’s Hmisc package, two commands are needed:
install.packages("Hmisc")
library(Hmisc)
If a package is not installed, running the library() command will yield an error. Here we try to load the xaringanthemer package (which has not been installed):
> library(xaringanthemer)
Error in library(xaringanthemer) : there is no package called 'xaringanthemer'
To rectify the problem, we install the package from CRAN.
> install.packages("xaringanthemer")
trying URL 'https://cloud.r-project.org/src/contrib/xaringanthemer_0.3.0.tar.gz'
Content type 'application/x-gzip' length 1362643 bytes (1.3 MB)
==================================================
downloaded 1.3 Mb
library(xaringanthemer)
The require() function will test whether a package is available: it will load the package if it is installed, and generate a warning message if it is not (as opposed to library(), which will return an error).
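The contrast between the two functions can be sketched with a package name that is assumed not to be installed. The name notARealPackage123 is hypothetical and used only for illustration.

```r
pkg <- "notARealPackage123"   # hypothetical name, assumed not to be installed

# require() returns FALSE (with a warning, suppressed here) if loading fails.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)

# library() signals an error instead, which is caught here with tryCatch().
res <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)
print(res)
```

The logical return value makes require() convenient inside conditional code, while library() is usually preferred at the top of a script so that a missing package stops execution immediately.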
The names of all variables within a given data set (or more generally for sub-objects within an object) are provided by the names() command. The names of all objects defined within an R session can be generated using the objects() and ls() commands, which return a vector of character strings. RStudio includes an Environment tab that lists all the objects in the current environment.
The print() and summary() functions return the object or summaries of that object, respectively. Running print(object) at the command line is equivalent to just entering the name of the object, i.e., object.
B.4.2 Packages and name conflicts
Different package authors may choose the same name for functions that exist within base R (or within other packages). This will cause the other function or object to be masked. This can sometimes lead to confusion, when the expected version of a function is not the one that is called. The find() function can be used to determine where in the environment (workspace) a given object can be found.
find("mean")
[1] "package:base"
Sometimes it is desirable to remove a package from the workspace. For example, a package might define a function with the same name as an existing function. Packages can be detached using the syntax detach(package:PKGNAME), where PKGNAME is the name of the package. Objects with the same name that appear in multiple places in the environment can be accessed using the location::objectname syntax. As an example, to access the mean() function from the base package, the user would specify base::mean() instead of mean(). It is sometimes preferable to reference a function or object in this way rather than loading the package.
As an example where this might be useful, there are functions in the base and Hmisc packages called units(). The find command would display both (in the order in which they would be accessed).
library(Hmisc)
find("units")
[1] "package:Hmisc" "package:base"
When the Hmisc package is loaded, the units() function from the base package is masked and would not be used by default. To specify that the version of the function from the base package should be used, prefix the function with the package name followed by two colons: base::units(). The conflicts() function reports on objects that exist with the same name in two or more places on the search path.
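Masking can be demonstrated without installing any package by defining a function in the workspace that shares a name with a base function. This is a sketch for illustration only. The replacement mean() below is deliberately unhelpful and is removed at the end.

```r
# A user-defined function that masks base::mean() for the rest of the session.
mean <- function(x, ...) "masked!"

masked_result <- mean(c(1, 2, 3))        # calls the user-defined version
base_result <- base::mean(c(1, 2, 3))    # explicitly calls the base version
print(masked_result)
print(base_result)

# When run at top level, find() lists each place where the name is defined,
# in the order in which R would search them.
print(find("mean"))

# Removing the workspace copy restores the usual behavior.
rm(mean)
unmasked_result <- mean(c(1, 2, 3))
print(unmasked_result)
```

The same package::name syntax works for any installed package, whether or not it has been loaded with library().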
Running the command library(help = "PKGNAME") will display information about an installed package. Alternatively, the Packages tab in RStudio can be used to list, install, and update packages.
The session_info() function from the sessioninfo package provides improved reporting of version information about R as well as details of loaded packages.
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────
setting value
version R version 4.1.0 (2021-05-18)
os Ubuntu 20.04.2 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-07-28
─ Packages ───────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.1.0)
bookdown 0.22 2021-04-22 [1] CRAN (R 4.1.0)
broom 0.7.8 2021-06-24 [1] CRAN (R 4.1.0)
bslib 0.2.5.1 2021-05-18 [1] CRAN (R 4.1.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.1.0)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
cluster 2.1.2 2021-04-17 [4] CRAN (R 4.0.5)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
data.table 1.14.0 2021-02-21 [1] CRAN (R 4.1.0)
DBI * 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0)
foreign 0.8-81 2020-12-22 [4] CRAN (R 4.0.3)
Formula * 1.2-4 2020-10-16 [1] CRAN (R 4.1.0)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
haven 2.4.1 2021-04-23 [1] CRAN (R 4.1.0)
Hmisc 4.5-0 2021-02-28 [1] CRAN (R 4.1.0)
hms 1.1.0 2021-05-17 [1] CRAN (R 4.1.0)
htmlTable 2.2.1 2021-05-18 [1] CRAN (R 4.1.0)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.1.0)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
jpeg 0.1-9 2021-07-24 [1] CRAN (R 4.1.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
knitr 1.33 2021-04-24 [1] CRAN (R 4.1.0)
lattice * 0.20-44 2021-05-02 [4] CRAN (R 4.1.0)
latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 4.1.0)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.1.0)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
Matrix 1.3-4 2021-06-01 [4] CRAN (R 4.1.0)
mdsr * 0.2.5 2021-03-29 [1] CRAN (R 4.1.0)
modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0)
mosaicData * 0.20.2 2021-01-16 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
nnet 7.3-16 2021-05-03 [4] CRAN (R 4.0.5)
pillar 1.6.1 2021-05-16 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.1.0)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
readr * 2.0.0 2021-07-20 [1] CRAN (R 4.1.0)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0)
repr 1.1.3 2021-01-21 [1] CRAN (R 4.1.0)
reprex 2.0.0 2021-04-02 [1] CRAN (R 4.1.0)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
rmarkdown 2.9 2021-06-15 [1] CRAN (R 4.1.0)
RMySQL 0.10.22 2021-06-22 [1] CRAN (R 4.1.0)
rpart 4.1-15 2019-04-12 [4] CRAN (R 4.0.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rvest 1.0.1 2021-07-26 [1] CRAN (R 4.1.0)
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0)
skimr 2.1.3 2021-03-07 [1] CRAN (R 4.1.0)
stringi 1.7.3 2021-07-16 [1] CRAN (R 4.1.0)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
survival * 3.2-11 2021-04-26 [4] CRAN (R 4.0.5)
tibble * 3.1.3 2021-07-23 [1] CRAN (R 4.1.0)
tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0)
tzdb 0.1.2 2021-07-20 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
xaringanthemer * 0.4.0 2021-06-24 [1] CRAN (R 4.1.0)
xfun 0.24 2021-06-15 [1] CRAN (R 4.1.0)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
[1] /home/bbaumer/R/x86_64-pc-linux-gnu-library/4.1
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
The update.packages() function should be run periodically to ensure that packages are up-to-date.
As of December 2020, there were more than 16,800 packages available from CRAN. This represents a tremendous investment of time and code by many developers (Fox 2009). While each of these has met a minimal standard for inclusion, it is important to keep in mind that packages in R are created by individuals or small groups, and not endorsed by the R core group. As a result, they do not necessarily undergo the same level of testing and quality assurance that the core R system does.
B.4.3 CRAN task views
The “Task Views” on CRAN are a very useful resource for finding packages. These are curated listings of relevant packages within a particular application area (such as multivariate statistics, psychometrics, or survival analysis). Table B.1 displays the task views available as of July 2021.
Task View | Subject |
---|---|
Bayesian | Bayesian Inference |
ChemPhys | Chemometrics and Computational Physics |
ClinicalTrials | Clinical Trial Design, Monitoring, and Analysis |
Cluster | Cluster Analysis and Finite Mixture Models |
Databases | Databases with R |
DifferentialEquations | Differential Equations |
Distributions | Probability Distributions |
Econometrics | Econometrics |
Environmetrics | Analysis of Ecological and Environmental Data |
ExperimentalDesign | Design of Experiments (DoE) and Analysis of Experimental Data |
ExtremeValue | Extreme Value Analysis |
Finance | Empirical Finance |
FunctionalData | Functional Data Analysis |
Genetics | Statistical Genetics |
gR | gRaphical Models in R |
Graphics | Graphic Displays and Dynamic Graphics and Graphic Devices and Visualization |
HighPerformanceComputing | High-Performance and Parallel Computing with R |
Hydrology | Hydrological Data and Modeling |
MachineLearning | Machine Learning and Statistical Learning |
MedicalImaging | Medical Image Analysis |
MetaAnalysis | Meta-Analysis |
MissingData | Missing Data |
ModelDeployment | Model Deployment with R |
Multivariate | Multivariate Statistics |
NaturalLanguageProcessing | Natural Language Processing |
NumericalMathematics | Numerical Mathematics |
OfficialStatistics | Official Statistics and Survey Methodology |
Optimization | Optimization and Mathematical Programming |
Pharmacokinetics | Analysis of Pharmacokinetic Data |
Phylogenetics | Phylogenetics, Especially Comparative Methods |
Psychometrics | Psychometric Models and Methods |
ReproducibleResearch | Reproducible Research |
Robust | Robust Statistical Methods |
SocialSciences | Statistics for the Social Sciences |
Spatial | Analysis of Spatial Data |
SpatioTemporal | Handling and Analyzing Spatio-Temporal Data |
Survival | Survival Analysis |
TeachingStatistics | Teaching Statistics |
TimeSeries | Time Series Analysis |
Tracking | Processing and Analysis of Tracking Data |
WebTechnologies | Web Technologies and Services |
B.5 Further resources
Advanced R is an excellent source for learning more about how R works (H. Wickham 2019). Extensive resources and documentation about R can be found at the Comprehensive R Archive Network (CRAN).
The forcats package, included in the tidyverse, is designed to facilitate data wrangling with factors.
More information regarding tibbles can be found at https://tibble.tidyverse.org.
JupyterLab and JupyterHub are alternative environments that support analysis via sophisticated notebooks for multiple languages including Julia, Python, and R.
B.6 Exercises
Problem 1 (Easy): The following code chunk throws an error.
mtcars %>%
  select(mpg, cyl)
mpg cyl
Mazda RX4 21.0 6
Mazda RX4 Wag 21.0 6
Datsun 710 22.8 4
Hornet 4 Drive 21.4 6
Hornet Sportabout 18.7 8
Valiant 18.1 6
Duster 360 14.3 8
Merc 240D 24.4 4
Merc 230 22.8 4
Merc 280 19.2 6
Merc 280C 17.8 6
Merc 450SE 16.4 8
Merc 450SL 17.3 8
Merc 450SLC 15.2 8
Cadillac Fleetwood 10.4 8
Lincoln Continental 10.4 8
Chrysler Imperial 14.7 8
Fiat 128 32.4 4
Honda Civic 30.4 4
Toyota Corolla 33.9 4
Toyota Corona 21.5 4
Dodge Challenger 15.5 8
AMC Javelin 15.2 8
Camaro Z28 13.3 8
Pontiac Firebird 19.2 8
Fiat X1-9 27.3 4
Porsche 914-2 26.0 4
Lotus Europa 30.4 4
Ford Pantera L 15.8 8
Ferrari Dino 19.7 6
Maserati Bora 15.0 8
Volvo 142E 21.4 4
What is the problem?
Problem 2 (Easy): Which of these kinds of names should be wrapped with quotation marks when used in R?
- function name
- file name
- the name of an argument in a named argument
- object name
Problem 3 (Easy): A user has typed the following commands into the RStudio console.
obj1 <- 2:10
obj2 <- c(2, 5)
obj3 <- c(TRUE, FALSE)
obj4 <- 42
What values are returned by the following commands?
obj1 * 10
obj1[2:4]
obj1[-3]
obj1 + obj2
obj1 * obj3
obj1 + obj4
obj2 + obj3
sum(obj2)
sum(obj3)
Problem 4 (Easy): A user has typed the following commands into the RStudio console:
mylist <- list(x1 = "sally", x2 = 42, x3 = FALSE, x4 = 1:5)
What values do each of the following commands return?
is.list(mylist)
names(mylist)
length(mylist)
mylist[[2]]
mylist[["x1"]]
mylist$x2
length(mylist[["x4"]])
class(mylist)
typeof(mylist)
class(mylist[[4]])
typeof(mylist[[3]])
Problem 5 (Easy): What’s wrong with this statement?
help(NHANES, package <- "NHANES")
Problem 6 (Easy): Consult the documentation for CPS85 in the mosaicData package to determine the meaning of CPS.
Problem 7 (Easy): The following code chunk throws an error. Why?
library(tidyverse)
mtcars %>%
  filter(cylinders == 4)
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `cylinders == 4`.
x object 'cylinders' not found
What is the problem?
Problem 8 (Easy): The date function returns an indication of the current time and date. What arguments does date take? What kind of object is the result from date? What kind of object is the result from Sys.time?
Problem 9 (Easy): A user has typed the following commands into the RStudio console.
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")
What do each of the following commands return? Describe the class of the object as well as its value.
data.frame(a, b, c)
cbind(a, b)
rbind(a, b)
cbind(a, b, c)
list(a, b, c)[[2]]
Problem 10 (Easy): For each of the following assignment statements, describe the error (or note why it does not generate an error).
result1 <- sqrt 10
result2 <-- "Hello to you!"
3result <- "Hello to you"
result4 <- "Hello to you
result5 <- date()
Problem 11 (Easy): The following code chunk throws an error.
library(tidyverse)
mtcars %>%
  filter(cyl = 4)
Error: Problem with `filter()` input `..1`.
x Input `..1` is named.
ℹ This usually means that you've used `=` instead of `==`.
ℹ Did you mean `cyl == 4`?
The error suggests that you need to use == inside of filter(). Why?
Problem 12 (Medium): The following code undertakes some data analysis using the HELP (Health Evaluation and Linkage to Primary Care) trial.
library(mosaic)
ds <-
  read.csv("http://nhorton.people.amherst.edu/r2/datasets/helpmiss.csv")
summarise(group_by(
  select(filter(mutate(ds,
    sex = ifelse(female == 1, "F", "M")
  ), !is.na(pcs)), age, pcs, sex),
  sex
), meanage = mean(age), meanpcs = mean(pcs), n = n())
Describe in words what computations are being done. Using the pipe notation, translate this code into a more readable version.
Problem 13 (Medium): The following concepts should have some meaning to you: package, function, command, argument, assignment, object, object name, data frame, named argument, quoted character string.
Construct an example of R commands that make use of at least four of these. Label which part of your example R command corresponds to each.
B.7 Supplementary exercises
Available at https://mdsr-book.github.io/mdsr2e/appR.html#datavizI-online-exercises