B Introduction to R and RStudio
This chapter provides a (brief) introduction to R and RStudio. The R language is a free, open-source software environment for statistical computing and graphics (Ihaka and Gentleman 1996; R Core Team 2020). RStudio is an open-source integrated development environment (IDE) for R that adds many features and productivity tools for R (RStudio 2020). This chapter includes a short history, installation information, a sample session, background on fundamental structures and actions, information about help and documentation, and other important topics.
The R Foundation for Statistical Computing holds and administers the copyright of the R software and documentation. R is available under the terms of the Free Software Foundation’s GNU General Public License in source code form.
RStudio facilitates use of R by integrating R help and documentation, providing a workspace browser and data viewer, and supporting syntax highlighting, code completion, and smart indentation. Support for reproducible analysis is made available with the knitr package and R Markdown (see Appendix D). It facilitates the creation of dynamic web applications using Shiny (see Chapter 14.4). It also provides support for multiple projects as well as an interface to source code control systems such as GitHub. It has become the default interface for many R users, and is our recommended environment for analysis.
RStudio is available as a client (standalone) for Windows, Mac OS X, and Linux. There is also a server version. Commercial products and support are available in addition to the open-source offerings (see http://www.rstudio.com/ide for details).
The first versions of R were written by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, while current development is coordinated by the R Development Core Team, a group of international volunteers.
The R language is quite similar to the S language, a flexible and extensible statistical environment originally developed in the 1980s at AT&T Bell Labs (now Alcatel–Lucent).
B.1 Installation
New users are encouraged to download and install R from the Comprehensive R Archive Network (CRAN, http://www.r-project.org) and install RStudio from http://www.rstudio.com/download. The sample session in the appendix of the Introduction to R documentation, also available from CRAN, is recommended reading.
The home page for the R project, located at http://r-project.org, is the best starting place for information about the software.
It includes links to CRAN, which features pre-compiled binaries as well as source code for R, add-on packages, documentation (including manuals, frequently asked questions, and the R newsletter), and general background information.
Mirrored CRAN sites with identical copies of these files exist all around the world.
Updates to R and packages are regularly posted on CRAN.
RStudio for Mac OS X, Windows, or Linux can be downloaded from https://rstudio.com/products/rstudio. RStudio requires R to be installed on the local machine. A server version (accessible from Web browsers) is also available for download. Documentation of the advanced features is available on the RStudio website.
B.2 Learning R
The R environment features extensive online documentation, though it can sometimes be challenging to comprehend. Each command has an associated help file that describes usage, lists arguments, provides details of actions, gives references, lists other related functions, and includes examples of its use. The help system is invoked using either the ? or help() commands:
?function
help(function)
where function is the name of the function of interest. (Alternatively, the Help tab in RStudio can be used to access the help system.)
Some commands (e.g., if) are reserved, so ?if will not generate the desired documentation. Running ?"if" will work (see also ?Control). Other reserved words include else, repeat, while, function, for, next, and break.
The RSiteSearch() function will search for key words or phrases in many places (including the search engine at http://search.r-project.org).
The RSeek.org site can also be helpful in finding more information and examples.
Examples of many functions are available using the example() function.
Other useful resources are help.start(), which provides a set of online manuals, and help.search(), which can be used to look up entries by description. The apropos() command returns any functions in the current search list that match a given pattern (which facilitates searching for a function based on what it does, as opposed to its name).
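As a small illustration of pattern-based searching, the following sketch uses apropos() from base R. The pattern is an arbitrary choice for illustration, and the exact set of matches depends on which packages are attached in your session.

```r
# Find objects on the search path whose names start with "read.csv".
# The pattern is an arbitrary example; results vary with attached packages.
matches <- apropos("^read\\.csv")
matches

# help() and ? open the documentation viewer, so they are left commented out:
# ?read.csv
# ?"if"        # reserved words must be quoted
```

Because apropos() accepts a regular expression, anchoring the pattern with ^ restricts the matches to names that begin with the given text.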
Other resources for help available from CRAN include the R help mailing list. The StackOverflow site for R provides a series of questions and answers for common questions that are tagged as being related to R. New users are also encouraged to read the R FAQ (frequently asked questions) list. RStudio provides a curated guide to resources for learning R and its extensions.
B.3 Fundamental structures and objects
Here we provide a brief introduction to R data structures.
B.3.1 Objects and vectors
Almost everything in R is an object, which may be initially confusing to a new user.
An object is simply something stored in R’s memory. Common objects include vectors, matrices, arrays, factors, data frames (akin to data sets in other systems), lists, and functions.
The basic variable structure is a vector.
Vectors (and other objects) are created using the <- or = assignment operators (which assign the evaluated expression on the right-hand side of the operator to the object name on the left-hand side).
x <- c(5, 7, 9, 13, -4, 8) # preferred
x = c(5, 7, 9, 13, -4, 8) # equivalent
The above code creates a vector of length 6 using the c() function to concatenate scalars. The = operator is also used in other contexts for the specification of arguments to functions. Other assignment operators exist, as well as the assign() function (see help("<-") for more information). The exists() function conveys whether an object exists in the workspace, and the rm() command removes it.
In RStudio, the “Environment” tab shows the names (and values) of all objects that exist in the current workspace.
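The following is a minimal sketch of the workspace functions just described, using only base R. The object name tmp_vec is hypothetical and chosen only to avoid clashing with existing objects.

```r
x <- c(5, 7, 9, 13, -4, 8)       # assignment with <-
assign("tmp_vec", x * 2)         # assign() takes the object name as a string
had_it <- exists("tmp_vec")      # TRUE: tmp_vec is now in the workspace
rm(tmp_vec)                      # remove it again
still_there <- exists("tmp_vec") # FALSE after rm()
```

Note that exists() takes the name as a quoted string, whereas rm() accepts the bare object name.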
Since vector operations are so fundamental in R, it is important to be able to access (or index) elements within these vectors.
Many different ways of indexing vectors are available. Here, we introduce several of these using the vector x as created above. The command x[2] returns the second element of x (the scalar 7), and x[c(2, 4)] returns the vector \((7, 13)\). The expressions x[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)], x[1:5], and x[-6] all return a vector consisting of the first 5 elements in x (the last specifies all elements except the 6th).
x[c(2, 4)]
[1]  7 13
x[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)]
[1]  5  7  9 13 -4
x[1:5]
[1]  5  7  9 13 -4
x[-6]
[1]  5  7  9 13 -4
Vectors are recycled if needed; for example, when comparing each of the elements of a vector to a scalar.
x > 8
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE
The above expression demonstrates the use of comparison operators (see ?Comparison). Only the third and fourth elements of x are greater than 8. The comparison returns a logical value of either TRUE or FALSE for each element.
A count of elements meeting the condition can be generated using the sum() function. Other comparison operators include == (equal), >= (greater than or equal), <= (less than or equal), and != (not equal). Care needs to be taken in the comparison using == if noninteger values are present (see all.equal() and isTRUE()).
sum(x > 8)
[1] 2
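To see why care is needed when comparing noninteger values, consider the following short sketch. It relies only on base R; the particular numbers are a standard illustration rather than anything specific to this book.

```r
a <- 0.1 + 0.2
exact <- (a == 0.3)                   # FALSE: the stored binary values differ slightly
approx <- isTRUE(all.equal(a, 0.3))   # TRUE: equal within a small numerical tolerance
```

Wrapping all.equal() in isTRUE() is the usual idiom, because all.equal() returns a character description of the difference (rather than FALSE) when the values are not nearly equal.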
B.3.2 Operators
There are many operators defined in R to carry out a variety of tasks. Many of these were demonstrated in the sample session (assignment, arithmetic) and previous examples (comparison). Arithmetic operations include +, -, *, /, ^ (exponentiation), %% (modulus), and %/% (integer division). More information about operators can be found using the help system (e.g., ?"+"). Background information on other operators and precedence rules can be found using help(Syntax).
Boolean operations (OR, AND, NOT, and XOR) are supported using the |, ||, &, and ! operators and the xor() function. The | is an “or” operator that operates on each element of a vector, while the || is another “or” operator that stops evaluation the first time that the result is true (see ?Logic).
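The difference between the element-wise and short-circuit operators is easiest to see in a small example. This sketch uses only base R, and the vectors u and v are made up for illustration.

```r
remainder <- 13 %% 5      # modulus: 3
quotient  <- 13 %/% 5     # integer division: 2

u <- c(TRUE, FALSE, TRUE)
v <- c(FALSE, FALSE, TRUE)
elementwise_or  <- u | v      # one result per element
elementwise_and <- u & v
exclusive_or    <- xor(u, v)

# || short-circuits: the right-hand side is never evaluated here,
# so the call to stop() does not raise an error.
short_circuit <- TRUE || stop("this is never run")
```

The short-circuit forms are intended for single logical values, such as the conditions of if statements, while | and & are the right choice for filtering vectors.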
B.3.3 Lists
Lists in R are very general objects that can contain other objects of arbitrary types. List members can be named, or referenced using numeric indices (using the [[ operator).
newlist <- list(first = "hello", second = 42, Bob = TRUE)
is.list(newlist)
[1] TRUE
newlist
$first
[1] "hello"

$second
[1] 42

$Bob
[1] TRUE
The unlist() function flattens (makes a vector out of) the elements in a list (see also relist()). Note that unlisted objects are coerced to a common type (in this case character).
unlisted <- unlist(newlist)
unlisted
  first  second     Bob 
"hello"    "42"  "TRUE" 
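The various ways of extracting list members behave slightly differently, as this sketch shows. It recreates the list from the text so that it can be run on its own.

```r
newlist <- list(first = "hello", second = 42, Bob = TRUE)

by_position <- newlist[[2]]        # the element itself: 42
by_name     <- newlist[["first"]]  # "hello"
by_dollar   <- newlist$Bob         # TRUE
sublist     <- newlist[2]          # single brackets return a list of length 1
```

In short, [[ and $ return the stored object, while [ returns a (shorter) list.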
B.3.4 Matrices
Matrices are like two-dimensional vectors: rectangular objects where all entries have the same type. We can create a \(2 \times 3\) matrix, display it, and test for its type.
A <- matrix(x, 2, 3)
A
     [,1] [,2] [,3]
[1,]    5    9   -4
[2,]    7   13    8
is.matrix(A) # is A a matrix?
[1] TRUE
Note that comments are supported within R (any input given after a # character is ignored).
Indexing for matrices is done in a similar fashion as for vectors, albeit with a second dimension (denoted by a comma).
A[, 1]
[1] 5 7
A[1, ]
[1]  5  9 -4
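As the printed matrix suggests, matrix() fills its entries column by column. The sketch below contrasts this with the byrow argument and shows the drop argument to the indexing operator; both are standard base R features, but they are not discussed in the text above.

```r
x <- c(5, 7, 9, 13, -4, 8)
A <- matrix(x, nrow = 2, ncol = 3)                 # filled column by column
B <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row

one_element   <- A[2, 3]                # row 2, column 3
first_row     <- A[1, ]                 # a plain vector
first_row_mat <- A[1, , drop = FALSE]   # keeps the 1 x 3 matrix shape
```

Using drop = FALSE can be helpful in code that expects a matrix regardless of how many rows are selected.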
B.3.5 Dataframes and tibbles
Data sets are often stored in a data.frame, which is a special type of list that is more general than a matrix. This rectangular object, similar to a data table in other systems, can be thought of as a two-dimensional array with columns of vectors of the same length, but of possibly different types (as opposed to a matrix, which consists of vectors of the same type; or a list, whose elements needn’t be of the same length). The function read_csv() in the readr package returns a data.frame object (specifically, a tibble). A simple data.frame can be created using the data.frame() command. Variables can be accessed using the $ operator, as shown below (see also help(Extract)). In addition, operations can be performed by column (e.g., calculation of sample statistics). We can check to see if an object is a data.frame with is.data.frame().
y <- rep(11, length(x))
y
[1] 11 11 11 11 11 11
ds <- data.frame(x, y)
ds
   x  y
1  5 11
2  7 11
3  9 11
4 13 11
5 -4 11
6  8 11
ds$x[3]
[1] 9
is.data.frame(ds)
[1] TRUE
Tibbles are a form of simple data frames (a modern interpretation) that are described as “lazy and surly” (https://tibble.tidyverse.org). They support multiple data technologies (e.g., SQL databases), make their assumptions more explicit, and have an enhanced print method (so that output doesn’t scroll so much). Many packages in the tidyverse create tibbles by default.
tbl <- as_tibble(ds)
is.data.frame(tbl)
[1] TRUE
The use of data.frame() differs from the use of cbind(), which yields a matrix object (unless it is given data frames as inputs).
newmat <- cbind(x, y)
newmat
      x  y
[1,]  5 11
[2,]  7 11
[3,]  9 11
[4,] 13 11
[5,] -4 11
[6,]  8 11
Data frames are created from matrices using as.data.frame(), while matrices are constructed from data frames using as.matrix().
Although we strongly discourage its use, data frames can be attached to the workspace using the attach() command. The Tidyverse R Style guide (https://style.tidyverse.org) provides similar advice. Name conflicts are a common problem with attach() (see conflicts(), which reports on objects that exist with the same name in two or more places on the search path).
The search() function lists attached packages and objects. To avoid cluttering and confusing the name-space, the command detach() should be used once a data frame or package is no longer needed.
A number of R functions include a data argument to specify a data frame as a local environment. For functions without a data option, the with() and within() commands can be used to simplify reference to an object within a data frame without attaching.
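A brief sketch of with() and within() follows. The data frame is built directly here so that the example is self-contained, and the new column name total is hypothetical.

```r
ds <- data.frame(x = c(5, 7, 9, 13, -4, 8), y = 11)

# with() evaluates an expression using the columns of ds as variables
mean_x <- with(ds, mean(x))

# within() returns a modified copy of ds; the original is unchanged
ds2 <- within(ds, total <- x + y)
names(ds2)
```

Neither call alters the search path, which is why these functions avoid the name conflicts associated with attach().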
B.3.6 Attributes and classes
Many objects have a set of associated attributes (such as names of variables, dimensions, or classes) that can be displayed or sometimes changed. For example, we can find the dimension of the matrix defined earlier.
attributes(A)
$dim
[1] 2 3
Other types of objects within R include lists (ordered objects that are not necessarily rectangular), regression models (objects of class lm), and formulae (e.g., y ~ x1 + x2). R supports object-oriented programming (see help(UseMethod)). As a result, objects in R have an associated class attribute, which changes the default behavior for some operations on that object. Many functions (called generics) have special capabilities when applied to objects of a particular class. For example, when summary() is applied to an lm object, the summary.lm() function is called. Similarly, summary.aov() is called when an aov object is given as argument. These class-specific implementations of generic functions are called methods. The class() function returns the classes to which an object belongs, while the methods() function displays all of the classes supported by a generic function.
head(methods(summary))
[1] "summary,ANY-method"             "summary,DBIObject-method"
[3] "summary,MySQLConnection-method" "summary,MySQLDriver-method"
[5] "summary,MySQLResult-method"     "summary.aov"
Objects in R can belong to multiple classes, although those classes need not be nested. As noted above, generic functions are dispatched according to the class attribute of each object. Thus, in the example below we create the tbl object, which belongs to multiple classes. When the print() function is called on tbl, R looks for a method called print.tbl_df(). If no such method is found, R looks for a method called print.tbl(). If no such method is found, R looks for a method called print.data.frame(). This process continues until a suitable method is found. If there is none, then print.default() is called.
tbl <- as_tibble(ds)
class(tbl)
[1] "tbl_df"     "tbl"        "data.frame"
print(tbl)
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     5    11
2     7    11
3     9    11
4    13    11
5    -4    11
6     8    11
print.data.frame(tbl)
   x  y
1  5 11
2  7 11
3  9 11
4 13 11
5 -4 11
6  8 11
print.default(tbl)
$x
[1]  5  7  9 13 -4  8

$y
[1] 11 11 11 11 11 11

attr(,"class")
[1] "tbl_df"     "tbl"        "data.frame"
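The dispatch process can be mimicked with a toy example. In this sketch, the generic describe() and the class names special and basic are all made up for illustration; only UseMethod() and structure() come from base R.

```r
# An object with two (hypothetical) classes, searched in this order
obj <- structure(list(value = 42), class = c("special", "basic"))

describe <- function(obj, ...) UseMethod("describe")   # a new generic
describe.basic   <- function(obj, ...) "basic method"
describe.default <- function(obj, ...) "default method"

# No describe.special() exists, so dispatch falls through to describe.basic()
from_obj <- describe(obj)

# A plain number has neither class, so describe.default() is used
from_number <- describe(1)
```

This mirrors the search described above for print(): R tries each class in turn and falls back on the default method only if none applies.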
There are a number of functions that assist with learning about an object in R. The attributes() command displays the attributes associated with an object. The typeof() function provides information about the underlying data structure of objects (e.g., logical, integer, double, complex, character, and list). The str() function displays the structure of an object, and the mode() function displays its storage mode. For data frames, the glimpse() function provides a useful summary of each variable.
A few quick notes on specific types of objects are worth relating here:
- A vector is a one-dimensional array of items of the same data type. There are six basic data types that a vector can contain: integer, double, logical, character, complex, and raw. Vectors have a length() but not a dim(). Vectors can have (but needn’t have) names().
- A factor is a special type of vector for categorical data. A factor has levels(). We change the reference level of a factor with relevel(). Factors are stored internally as integers that correspond to the ids of the factor levels. Factors can be problematic and their use is discouraged since they can complicate some aspects of data wrangling. A number of R developers have encouraged the use of the stringsAsFactors = FALSE option.
- A matrix is a two-dimensional array of items of the same data type. A matrix has a length() that is equal to nrow() times ncol(), or the product of dim().
- A data.frame is a list of vectors of the same length. This is like a matrix, except that columns can be of different data types. Data frames always have names() and often have row.names().
Do not confuse a factor with a character vector.
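The notes on factors above can be made concrete with a short sketch. The category labels are invented for illustration, and only base R functions are used.

```r
f <- factor(c("low", "high", "medium", "low"))

lev   <- levels(f)        # levels are sorted alphabetically by default
codes <- as.integer(f)    # the underlying integer codes

f2   <- relevel(f, ref = "low")   # make "low" the reference (first) level
lev2 <- levels(f2)

chars <- as.character(f)  # convert back to a character vector
```

Note that the integer codes refer to positions in levels(f), not to any natural ordering of the labels, which is one reason a factor should not be mistaken for a character vector.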
Note that data sets typically have class data.frame but are of type list. This is because, as noted above, R stores data frames as special types of lists: a list of several vectors having the same length, but possibly having different types.
If you ever get confused when working with data frames and matrices, remember that a data.frame is a list (that can accommodate multiple types of objects), whereas a matrix is more like a vector (in that it can only support one type of object).
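The distinction between the two structures shows up as soon as columns of different types are combined. In this sketch the column names id and name are hypothetical, and stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
ds <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
ds_class  <- class(ds)           # "data.frame"
ds_type   <- typeof(ds)          # "list"
col_types <- sapply(ds, class)   # each column keeps its own type

m <- cbind(id = 1:2, name = c("a", "b"))
m_type <- typeof(m)              # "character": the numbers were coerced to strings
```

The matrix silently converts the integers to character strings, whereas the data frame preserves one type per column.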
B.3.7 Options
The options() function in R can be used to change various default behaviors. For example, the digits argument controls the number of digits to display in output. The previous values of any changed options are returned when options() is called, to allow them to be restored. The command help(options) lists all of the settable options.
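A common pattern is to save the value returned by options() and pass it back later to restore the earlier settings. The sketch below assumes nothing beyond base R; the choice of three digits is arbitrary.

```r
old <- options(digits = 3)    # change a setting; the previous value is returned
short <- format(pi)           # formatted using 3 significant digits
options(old)                  # restore the previous setting
restored <- getOption("digits")
```

Restoring options in this way keeps a temporary change from leaking into the rest of a session.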
B.3.8 Functions
Fundamental actions within R are carried out by calling functions (either built-in or user defined; see Appendix C for guidance on the latter). Multiple arguments may be given, separated by commas. The function carries out operations using the provided arguments and returns values (an object such as a vector or list) that are displayed (by default) or that can be saved by assignment to an object.
It’s a good idea to name arguments to functions. This practice minimizes errors assigning unnamed arguments to options and makes code more readable.
As an example, the quantile() function takes a numeric vector and returns the minimum, 25th percentile, median, 75th percentile, and maximum of the values in that vector. However, if an optional vector of quantiles is given, those quantiles are calculated instead.
vals <- rnorm(1000) # generate 1000 standard normal random variables
quantile(vals)
     0%     25%     50%     75%    100% 
-3.7146 -0.6638  0.0359  0.7047  3.6276 
quantile(vals, c(.025, .975))
 2.5% 97.5% 
-1.97  1.83 
# Return values can be saved for later use.
res <- quantile(vals, c(.025, .975))
res[1]
 2.5% 
-1.97 
Arguments (options) are available for most functions. The documentation specifies the default action if named arguments are not specified. If not named, the arguments are provided to the function in the order specified in the function call. For the quantile() function, there is a type argument that allows specification of one of nine algorithms for calculating quantiles.
res <- quantile(vals, probs = c(.025, .975), type = 3)
res
 2.5% 97.5% 
-1.98  1.83 
Some functions allow a variable number of arguments. An example is the paste() function. The calling sequence is described in the documentation as follows.
paste(..., sep = " ", collapse = NULL)
To override the default behavior of a space being added between elements output by paste(), the user can specify a different value for sep.
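A few calls illustrate how the sep and collapse arguments change the result. The strings are arbitrary examples; paste0() is a base R shortcut that is not mentioned in the text above.

```r
default_sep <- paste("data", "science")                 # "data science"
custom_sep  <- paste("data", "science", sep = "-")      # "data-science"
collapsed   <- paste(c("a", "b", "c"), collapse = "+")  # "a+b+c"
no_sep      <- paste0("x", 1:3)                         # "x1" "x2" "x3"
```

Because sep and collapse come after ... in the calling sequence, they must always be supplied by name.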
B.4 Add-ons: Packages
B.4.1 Introduction to packages
Additional functionality in R is added through packages, which consist of functions, data sets, examples, vignettes, and help files that can be downloaded from CRAN. The function install.packages() can be used to download and install packages. Alternatively, RStudio provides an easy-to-use Packages tab to install and load packages.
Throughout the book, we assume that the tidyverse and mdsr packages are loaded. In many cases, additional add-on packages (see Appendix A) need to be installed prior to running the examples in this book.
Packages that are not on CRAN can be installed using the install_github() function in the remotes package.
install.packages("mdsr") # CRAN version
remotes::install_github("mdsr-book/mdsr") # development version
The library() function will load an installed package. For example, to install and load Frank Harrell’s Hmisc package, two commands are needed:
install.packages("Hmisc")
library(Hmisc)
If a package is not installed, running the library() command will yield an error. Here we try to load the xaringanthemer package (which has not been installed):
> library(xaringanthemer)
Error in library(xaringanthemer) : there is no package called 'xaringanthemer'
To rectify the problem, we install the package from CRAN.
> install.packages("xaringanthemer")
trying URL 'https://cloud.r-project.org/src/contrib/xaringanthemer_0.3.0.tar.gz'
Content type 'application/x-gzip' length 1362643 bytes (1.3 MB)
==================================================
downloaded 1.3 Mb
The require() function will test whether a package is available: this will load the package if it is installed, and generate a warning message if it is not (as opposed to library(), which will return an error).
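Since require() returns a logical value, it can be used to branch on whether a package is available. In this sketch the package name notARealPackage123 is hypothetical and is assumed not to be installed; suppressWarnings() hides the warning that require() would otherwise print.

```r
pkg <- "notARealPackage123"   # hypothetical name, assumed not to be installed

# character.only = TRUE tells require() that pkg holds the name as a string
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

if (!loaded) {
  message("Package '", pkg, "' is not available; it would need to be installed first.")
}

# The stats package ships with R, so this call returns TRUE
stats_loaded <- require("stats", character.only = TRUE)
```

In scripts meant to fail fast, library() is usually the safer choice, precisely because it stops with an error rather than returning FALSE.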
The names of all variables within a given data set (or more generally for sub-objects within an object) are provided by the names() command. The names of all objects defined within an R session can be generated using the objects() and ls() commands, which return a vector of character strings. RStudio includes an Environment tab that lists all the objects in the current environment.
The print() and summary() functions return the object or summaries of that object, respectively. Running print(object) at the command line is equivalent to just entering the name of the object, i.e., object.
B.4.2 Packages and name conflicts
Different package authors may choose the same name for functions that exist within base R (or within other packages). This will cause the other function or object to be masked. This can sometimes lead to confusion, when the expected version of a function is not the one that is called. The find() function can be used to determine where in the environment (workspace) a given object can be found.
Sometimes it is desirable to remove a package from the workspace. For example, a package might define a function with the same name as an existing function. Packages can be detached using the syntax detach(package:PKGNAME), where PKGNAME is the name of the package. Objects with the same name that appear in multiple places in the environment can be accessed using the location::objectname syntax. As an example, to access the mean() function from the base package, the user would specify base::mean() instead of mean(). It is sometimes preferable to reference a function or object in this way rather than loading the package.
As an example where this might be useful, there are functions in the base and Hmisc packages called units(). The find() command would display both (in the order in which they would be accessed).
library(Hmisc)
find("units")
[1] "package:Hmisc" "package:base"
When the Hmisc package is loaded, the units() function from the base package is masked and would not be used by default. To specify that the version of the function from the base package should be used, prefix the function with the package name followed by two colons: base::units(). The conflicts() function reports on objects that exist with the same name in two or more places on the search path.
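Masking can be demonstrated without installing anything by defining a function in the workspace that shares its name with a base function. The replacement definition of mean() below is deliberately silly and exists only for illustration.

```r
mean <- function(x) "my masked version"   # masks base::mean in the workspace

where <- find("mean")            # every place "mean" is found, in search order
masked_result <- mean(1:3)       # calls the workspace version
base_result <- base::mean(1:3)   # explicitly calls the base version

rm(mean)                         # remove the masking definition
```

After rm(mean), an unqualified call to mean() once again reaches the version in the base package.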
Running the command library(help = "PKGNAME") will display information about an installed package. Alternatively, the Packages tab in RStudio can be used to list, install, and update packages.
The session_info() function from the sessioninfo package provides improved reporting of version information about R as well as details of loaded packages.
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.3 (2020-10-10)
 os       Ubuntu 18.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2021-01-10
─ Packages ───────────────────────────────────────────────────────────────
 package        * version  date       lib source
 assertthat       0.2.1    2019-03-21     CRAN (R 4.0.2)
 backports        1.2.1    2020-12-09     CRAN (R 4.0.3)
 base64enc        0.1-3    2015-07-28     CRAN (R 4.0.2)
 bookdown         0.21     2020-10-13     CRAN (R 4.0.2)
 broom            0.7.3    2020-12-16     CRAN (R 4.0.3)
 cellranger       1.1.0    2016-07-27     CRAN (R 4.0.2)
 checkmate        2.0.0    2020-02-06     CRAN (R 4.0.2)
 cli              2.2.0    2020-11-20     CRAN (R 4.0.3)
 cluster          2.1.0    2019-06-19     CRAN (R 4.0.2)
 colorspace       2.0-0    2020-11-11     CRAN (R 4.0.3)
 crayon           1.3.4    2017-09-16     CRAN (R 4.0.2)
 data.table       1.13.6   2020-12-30     CRAN (R 4.0.3)
 DBI            * 1.1.0    2019-12-15     CRAN (R 4.0.2)
 dbplyr           2.0.0    2020-11-03     CRAN (R 4.0.3)
 digest           0.6.27   2020-10-24     CRAN (R 4.0.3)
 dplyr          * 1.0.2    2020-08-18     CRAN (R 4.0.2)
 ellipsis         0.3.1    2020-05-15     CRAN (R 4.0.2)
 evaluate         0.14     2019-05-28     CRAN (R 4.0.2)
 fansi            0.4.1    2020-01-08     CRAN (R 4.0.2)
 forcats        * 0.5.0    2020-03-01     CRAN (R 4.0.2)
 foreign          0.8-81   2020-12-22     CRAN (R 4.0.3)
 Formula        * 1.2-4    2020-10-16     CRAN (R 4.0.3)
 fs               1.5.0    2020-07-31     CRAN (R 4.0.2)
 generics         0.1.0    2020-10-31     CRAN (R 4.0.3)
 ggplot2        * 3.3.3    2020-12-30     CRAN (R 4.0.3)
 glue             1.4.2    2020-08-27     CRAN (R 4.0.2)
 gridExtra        2.3      2017-09-09     CRAN (R 4.0.2)
 gtable           0.3.0    2019-03-25     CRAN (R 4.0.2)
 haven            2.3.1    2020-06-01     CRAN (R 4.0.2)
 Hmisc            4.4-2    2020-11-29     CRAN (R 4.0.3)
 hms              0.5.3    2020-01-08     CRAN (R 4.0.2)
 htmlTable        2.1.0    2020-09-16     CRAN (R 4.0.2)
 htmltools        0.5.0    2020-06-16     CRAN (R 4.0.2)
 htmlwidgets      1.5.3    2020-12-10     CRAN (R 4.0.3)
 httr             1.4.2    2020-07-20     CRAN (R 4.0.2)
 jpeg             0.1-8.1  2019-10-24     CRAN (R 4.0.2)
 jsonlite         1.7.2    2020-12-09     CRAN (R 4.0.3)
 knitr            1.30     2020-09-22     CRAN (R 4.0.2)
 lattice        * 0.20-41  2020-04-02     CRAN (R 4.0.2)
 latticeExtra     0.6-29   2019-12-19     CRAN (R 4.0.2)
 lifecycle        0.2.0    2020-03-06     CRAN (R 4.0.2)
 lubridate        1.7.9.2  2020-11-13     CRAN (R 4.0.3)
 magrittr         2.0.1    2020-11-17     CRAN (R 4.0.3)
 Matrix           1.3-2    2021-01-06     CRAN (R 4.0.3)
 mdsr           * 0.2.4    2021-01-08     local
 modelr           0.1.8    2020-05-19     CRAN (R 4.0.2)
 mosaicData     * 0.20.1   2020-09-13     CRAN (R 4.0.2)
 munsell          0.5.0    2018-06-12     CRAN (R 4.0.2)
 nnet             7.3-14   2020-04-26     CRAN (R 4.0.2)
 pillar           1.4.7    2020-11-20     CRAN (R 4.0.3)
 pkgconfig        2.0.3    2019-09-22     CRAN (R 4.0.2)
 png              0.1-7    2013-12-03     CRAN (R 4.0.2)
 purrr          * 0.3.4    2020-04-17     CRAN (R 4.0.2)
 R6               2.5.0    2020-10-28     CRAN (R 4.0.3)
 RColorBrewer     1.1-2    2014-12-07     CRAN (R 4.0.2)
 Rcpp             1.0.5    2020-07-06     CRAN (R 4.0.2)
 readr          * 1.4.0    2020-10-05     CRAN (R 4.0.2)
 readxl           1.3.1    2019-03-13     CRAN (R 4.0.2)
 repr             1.1.0    2020-01-28     CRAN (R 4.0.2)
 reprex           0.3.0    2019-05-16     CRAN (R 4.0.2)
 rlang            0.4.10   2020-12-30     CRAN (R 4.0.3)
 rmarkdown        2.6      2020-12-14     CRAN (R 4.0.3)
 RMySQL           0.10.21  2020-12-15     CRAN (R 4.0.3)
 rpart            4.1-15   2019-04-12     CRAN (R 4.0.2)
 rstudioapi       0.13     2020-11-12     CRAN (R 4.0.3)
 rvest            0.3.6    2020-07-25     CRAN (R 4.0.2)
 scales           1.1.1    2020-05-11     CRAN (R 4.0.2)
 sessioninfo      1.1.1    2018-11-05     CRAN (R 4.0.2)
 showtext         0.9-1    2020-11-14     CRAN (R 4.0.3)
 showtextdb       3.0      2020-06-04     CRAN (R 4.0.2)
 skimr            2.1.2    2020-07-06     CRAN (R 4.0.2)
 stringi          1.5.3    2020-09-09     CRAN (R 4.0.2)
 stringr        * 1.4.0    2019-02-10     CRAN (R 4.0.2)
 survival       * 3.2-7    2020-09-28     CRAN (R 4.0.2)
 sysfonts         0.8.2    2020-11-16     CRAN (R 4.0.3)
 tibble         * 3.0.4    2020-10-12     CRAN (R 4.0.2)
 tidyr          * 1.1.2    2020-08-27     CRAN (R 4.0.2)
 tidyselect       1.1.0    2020-05-11     CRAN (R 4.0.2)
 tidyverse      * 1.3.0    2019-11-21     CRAN (R 4.0.2)
 utf8             1.1.4    2018-05-24     CRAN (R 4.0.2)
 vctrs            0.3.6    2020-12-17     CRAN (R 4.0.3)
 withr            2.3.0    2020-09-22     CRAN (R 4.0.2)
 xaringanthemer * 0.3.0    2020-05-04     CRAN (R 4.0.2)
 xfun             0.20     2021-01-06     CRAN (R 4.0.3)
 xml2             1.3.2    2020-04-23     CRAN (R 4.0.2)
 yaml             2.2.1    2020-02-01     CRAN (R 4.0.2)

 /home/bbaumer/R/x86_64-pc-linux-gnu-library/4.0
 /usr/local/lib/R/site-library
 /usr/lib/R/site-library
 /usr/lib/R/library
The update.packages() function should be run periodically to ensure that packages are up-to-date.
As of December 2020, there were more than 16,800 packages available from CRAN. This represents a tremendous investment of time and code by many developers (Fox 2009). While each of these has met a minimal standard for inclusion, it is important to keep in mind that packages in R are created by individuals or small groups, and not endorsed by the R core group. As a result, they do not necessarily undergo the same level of testing and quality assurance that the core R system does.
B.4.3 CRAN task views
The “Task Views” on CRAN are a very useful resource for finding packages. These are curated listings of relevant packages within a particular application area (such as multivariate statistics, psychometrics, or survival analysis). Table B.1 displays the task views available as of January 2021.
|ChemPhys||Chemometrics and Computational Physics|
|ClinicalTrials||Clinical Trial Design, Monitoring, and Analysis|
|Cluster||Cluster Analysis and Finite Mixture Models|
|Databases||Databases with R|
|Environmetrics||Analysis of Ecological and Environmental Data|
|ExperimentalDesign||Design of Experiments (DoE) and Analysis of Experimental Data|
|ExtremeValue||Extreme Value Analysis|
|FunctionalData||Functional Data Analysis|
|gR||gRaphical Models in R|
|Graphics||Graphic Displays and Dynamic Graphics and Graphic Devices and Visualization|
|HighPerformanceComputing||High-Performance and Parallel Computing with R|
|Hydrology||Hydrological Data and Modeling|
|MachineLearning||Machine Learning and Statistical Learning|
|MedicalImaging||Medical Image Analysis|
|ModelDeployment||Model Deployment with R|
|NaturalLanguageProcessing||Natural Language Processing|
|OfficialStatistics||Official Statistics and Survey Methodology|
|Optimization||Optimization and Mathematical Programming|
|Pharmacokinetics||Analysis of Pharmacokinetic Data|
|Phylogenetics||Phylogenetics, Especially Comparative Methods|
|Psychometrics||Psychometric Models and Methods|
|Robust||Robust Statistical Methods|
|SocialSciences||Statistics for the Social Sciences|
|Spatial||Analysis of Spatial Data|
|SpatioTemporal||Handling and Analyzing Spatio-Temporal Data|
|TimeSeries||Time Series Analysis|
|Tracking||Processing and Analysis of Tracking Data|
|WebTechnologies||Web Technologies and Services|
B.5 Further resources
Advanced R is an excellent source for learning more about how R works (H. Wickham 2019). Extensive resources and documentation about R can be found at the Comprehensive R Archive Network (CRAN).
The forcats package, included in the tidyverse, is designed to facilitate data wrangling with factors.
More information regarding tibbles can be found at https://tibble.tidyverse.org.
JupyterLab and JupyterHub are alternative environments that support analysis via sophisticated notebooks for multiple languages including Julia, Python, and R.
B.6 Exercises
Problem 1 (Easy): The following code chunk throws an error.
mtcars %>%
  select(mpg, cyl)
Error in select(., mpg, cyl): could not find function "select"
What is the problem?
Problem 2 (Easy): Which of these kinds of names should be wrapped with quotation marks when used in R?
- function name
- file name
- the name of an argument in a named argument
- object name
Problem 3 (Easy): A user has typed the following commands into the RStudio console.
obj1 <- 2:10
obj2 <- c(2, 5)
obj3 <- c(TRUE, FALSE)
obj4 <- 42
What values are returned by the following commands?
obj1 * 10
obj1[2:4]
obj1[-3]
obj1 + obj2
obj1 * obj3
obj1 + obj4
obj2 + obj3
sum(obj2)
sum(obj3)
Problem 4 (Easy): A user has typed the following commands into the RStudio console:
mylist <- list(x1 = "sally", x2 = 42, x3 = FALSE, x4 = 1:5)
What values do each of the following commands return?
is.list(mylist)
names(mylist)
length(mylist)
mylist[[2]]
mylist[["x1"]]
mylist$x2
length(mylist[["x4"]])
class(mylist)
typeof(mylist)
class(mylist[])
typeof(mylist[])
Problem 5 (Easy): What’s wrong with this statement?
help(NHANES, package <- "NHANES")
Problem 6 (Easy): Consult the documentation for
CPS85 in the
mosaicData package to determine the meaning of CPS.
Problem 7 (Easy): The following code chunk throws an error. Why?
library(tidyverse)
mtcars %>%
  filter(cylinders == 4)
Error in filter(., cylinders == 4): object 'cylinders' not found
What is the problem?
Problem 8 (Easy): The date function returns an indication of the current time and date. What arguments does date take? What kind of object is the result from date? What kind of object is the result from
date? What kind of object is the result from
Problem 9 (Easy): A user has typed the following commands into the RStudio console.
a <- c(10, 15)
b <- c(TRUE, FALSE)
c <- c("happy", "sad")
What do each of the following commands return? Describe the class of the object as well as its value.
data.frame(a, b, c)
cbind(a, b)
rbind(a, b)
cbind(a, b, c)
list(a, b, c)[]
Problem 10 (Easy): For each of the following assignment statements, describe the error (or note why it does not generate an error).
result1 <- sqrt 10
result2 <-- "Hello to you!"
3result <- "Hello to you"
result4 <- "Hello to you
result5 <- date()
Problem 11 (Easy): The following code chunk throws an error.
library(tidyverse)
mtcars %>%
  filter(cyl = 4)
Error in filter(., cyl = 4): unused argument (cyl = 4)
The error suggests that you need to use == inside of filter().
Problem 12 (Medium): The following code undertakes some data analysis using the HELP (Health Evaluation and Linkage to Primary Care) trial.
library(mosaic)
ds <-
  read.csv("http://nhorton.people.amherst.edu/r2/datasets/helpmiss.csv")
summarise(group_by(
  select(filter(mutate(ds,
    sex = ifelse(female == 1, "F", "M")
  ), !is.na(pcs)), age, pcs, sex),
  sex
), meanage = mean(age), meanpcs = mean(pcs), n = n())
Describe in words what computations are being done. Using the pipe notation, translate this code into a more readable version.
Problem 13 (Medium): The following concepts should have some meaning to you: package, function, command, argument, assignment, object, object name, data frame, named argument, quoted character string.
Construct an example of R commands that make use of at least four of these. Label which part of your example R command corresponds to each.