These exercises are taken from the text as data chapter of Modern Data Science with R (http://mdsr-book.github.io/). Other materials relevant for instructors (sample activities, overview video) for this chapter can be found there.
Speaking lines in Shakespeare’s plays are identified by a line that starts with two spaces, then a string of capital letters and spaces (the character’s name) followed by a period. Use grep() to find all of the speaking lines in Macbeth. How many are there?
SOLUTION:
library(mdsr)
library(tidyr)
library(tm)
library(wordcloud)
data(Macbeth_raw)
macbeth <- strsplit(Macbeth_raw, "\r\n")[[1]]
head(macbeth)
## [1] "This Etext file is presented by Project Gutenberg, in"
## [2] "cooperation with World Library, Inc., from their Library of the"
## [3] "Future and Shakespeare CDROMS. Project Gutenberg often releases"
## [4] "Etexts that are NOT placed in the Public Domain!!"
## [5] ""
## [6] "*This Etext has certain copyright implications you should read!*"
# solution goes here
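As a hint toward the pattern, here is a minimal sketch on a few made-up lines rather than the Macbeth text itself. The vector `toy_lines` and the exact regular expression are assumptions for illustration; the pattern may need adjusting for the real file.

```r
# Hypothetical lines imitating the layout described above (not from Macbeth_raw)
toy_lines <- c(
  "  MACBETH. So foul and fair a day I have not seen.",
  "    Enter three Witches.",
  "  FIRST WITCH. When shall we three meet again?",
  "ACT I. SCENE I."
)

# ^ anchors the match at the start of the line: two spaces,
# then capital letters and spaces, then a literal period
speaker_pattern <- "^  [A-Z ]+\\."

# grep() returns the indices of the matching elements
speaking_idx <- grep(speaker_pattern, toy_lines)
speaking_idx
length(speaking_idx)
```

Applying the same call to `macbeth` instead of `toy_lines` should give the indices of the candidate speaking lines, and `length()` of that result gives the count.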
Find all the hyphenated words in one of Shakespeare’s plays.
SOLUTION:
# solution goes here
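One possible approach, sketched in base R on invented text: `gregexpr()` locates every match in each line and `regmatches()` extracts the matched strings. The vector `toy_text` and the pattern are assumptions; the pattern only allows letters around the hyphen, so words with apostrophes would need a wider character class.

```r
# Hypothetical text (not taken from a play file)
toy_text <- c(
  "The pale-hearted fear and the hard-hearted king",
  "No hyphens on this line",
  "A well-a-day for the shrill-shrieking daughters"
)

# One or more letters, followed by one or more "-letters" groups
hyphen_pattern <- "[[:alpha:]]+(-[[:alpha:]]+)+"

# gregexpr() finds all matches per line; regmatches() pulls them out as a list
matches <- regmatches(toy_text, gregexpr(hyphen_pattern, toy_text))
hyphenated <- unlist(matches)
hyphenated
```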
Use the babynames data table from the babynames package to find the ten most popular:
SOLUTION:
# solution goes here
Names ending with joe, jo, Joe, or Jo (e.g., Billyjoe).
SOLUTION:
# solution goes here
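The filtering step can be sketched on a small invented data frame. Here `toy_names` is hypothetical and keeps only `name` and `n` columns (the real babynames table has more columns and many more rows), and the counts are made up.

```r
# Hypothetical stand-in for the babynames table (invented counts)
toy_names <- data.frame(
  name = c("Billyjoe", "Maryjo", "Joseph", "Billyjoe", "Jo", "Joe"),
  n    = c(10, 25, 300, 5, 40, 120),
  stringsAsFactors = FALSE
)

# $ anchors the match at the end of the name, so "Joseph" is excluded
ends_in_jo <- grepl("(joe|jo|Joe|Jo)$", toy_names$name)

# Sum the counts per name, then keep the ten largest totals
totals <- aggregate(n ~ name, data = toy_names[ends_in_jo, ], FUN = sum)
top <- head(totals[order(-totals$n), ], 10)
top
```

The same idea can be written with dplyr verbs (`filter()`, `group_by()`, `summarize()`, `arrange()`); base R is used here only to keep the sketch free of package dependencies.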
Find all of the adjectives in one of Shakespeare’s plays that end in more or less (note change from original question 15.4).
SOLUTION:
# solution goes here
Find all of the lines containing the stage direction Exeunt or Exit in one of Shakespeare’s plays (note change from original exercise 15.5).
SOLUTION:
# solution goes here
Use regular expressions to determine the number of speaking lines from the Complete Works of William Shakespeare (http://www.gutenberg.org/cache/epub/100/pg100.txt). Here, we care only about how many times a character speaks—not what they say or for how long they speak.
SOLUTION:
# solution goes here
Make a bar chart displaying the top 100 characters with the greatest number of lines. Hint: you may want to use either the stringr::str_extract() or strsplit() function here.
SOLUTION:
# solution goes here
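A minimal sketch of the `strsplit()` route from the hint, on a few invented speaking lines: split each line at the first period to isolate the character name, then tabulate. The vector `toy_speaking` is hypothetical, and real data may need extra cleaning (for example, names that contain abbreviations).

```r
# Hypothetical speaking lines in the layout described earlier
toy_speaking <- c(
  "  MACBETH. So foul and fair a day I have not seen.",
  "  BANQUO. How far is't call'd to Forres?",
  "  MACBETH. Speak, if you can. What are you?",
  "  FIRST WITCH. All hail, Macbeth!"
)

# Split on literal periods, keep the first piece, and trim the leading spaces
pieces <- strsplit(toy_speaking, ".", fixed = TRUE)
speaker <- trimws(vapply(pieces, `[`, character(1), 1))

# Count lines per speaker, largest first
line_counts <- sort(table(speaker), decreasing = TRUE)
line_counts

# Uncomment to draw a base-graphics bar chart of the top speakers:
# barplot(head(line_counts, 100), las = 2)
```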
In this problem, you will do much of the work to recreate Mark Hansen’s Shakespeare Machine. Start by watching a video clip (http://vimeo.com/54858820) of the exhibit. Use The Complete Works of William Shakespeare (see earlier exercise) and regular expressions to find all of the hyphenated words in Shakespeare Machine. How many are there? Use %in% to verify that your list contains the following hyphenated words pictured at 00:46 of the clip.
SOLUTION:
sm_words <- c("true-fix'd", "pale-hearted", "lean-fac'd", "hard-hearted",
"best-regarded", "thick-ribbed", "both-sides", "sea-like.",
"shrill-shrieking", "lust-stain'd", "tragical-historical,")
# solution goes here
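The membership check itself can be sketched as follows. Both vectors here are hypothetical: `found_words` stands in for the result of a hyphenated-word search, and `check_words` for a short list to verify.

```r
# Hypothetical search result (not real output from the Complete Works)
found_words <- c("pale-hearted", "hard-hearted", "sea-like.", "true-fix'd")

# Hypothetical words to look for in that result
check_words <- c("pale-hearted", "thick-ribbed", "sea-like.")

# %in% returns one logical value per element of the left-hand vector
present <- check_words %in% found_words
present

# Words the search missed, and whether everything was found
check_words[!present]
all(present)
```

Because `%in%` compares exact strings, trailing punctuation matters: some entries in `sm_words` (such as `"sea-like."`) end in a period or comma, so either the extraction pattern keeps that punctuation or it should be stripped from both vectors before comparing.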
Find an interesting Wikipedia page with a table, scrape the data from it, and generate a figure that tells an interesting story. Include an interpretation of the figure.
SOLUTION:
# solution goes here
The site stackoverflow.com displays questions and answers on technical topics.
The following code downloads the most recent questions related to the dplyr package.
library(httr)
# Find the most recent R questions on stackoverflow
getresult <- GET("http://api.stackexchange.com",
path = "questions",
query = list(site = "stackoverflow.com", tagged = "dplyr"))
stop_for_status(getresult) # Ensure returned without error
questions <- content(getresult) # Grab content
names(questions$items[[1]]) # What does the returned data look like?
## [1] "tags" "owner" "is_answered"
## [4] "view_count" "answer_count" "score"
## [7] "last_activity_date" "creation_date" "last_edit_date"
## [10] "question_id" "link" "title"
length(questions$items)
## [1] 30
substr(questions$items[[1]]$title, 1, 68)
## [1] "Left join on date range by group ID"
substr(questions$items[[2]]$title, 1, 68)
## [1] "R: regrouping multiple rows into a single row (by value in first col"
substr(questions$items[[3]]$title, 1, 68)
## [1] "Relative frequencies / proportions with dplyr"
How many questions were returned? Without using jargon, describe in words what is being displayed and how it might be used.
SOLUTION:
# solution goes here
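To explore the result, it can help to flatten the nested list into a data frame. The sketch below avoids any network call by using `toy_questions`, a hypothetical stand-in shaped like `content(getresult)`; the field names come from the `names()` output above, while the scores, counts, and the third title are invented.

```r
# Hypothetical stand-in for content(getresult): a list with an `items` element
toy_questions <- list(items = list(
  list(title = "Left join on date range by group ID",
       score = 2, answer_count = 1, is_answered = TRUE),
  list(title = "Relative frequencies / proportions with dplyr",
       score = 5, answer_count = 3, is_answered = TRUE),
  list(title = "A made-up unanswered question",
       score = 0, answer_count = 0, is_answered = FALSE)
))

items <- toy_questions$items

# Pull one field at a time out of every question; vapply() checks the type
question_df <- data.frame(
  title        = vapply(items, function(q) q$title, character(1)),
  score        = vapply(items, function(q) q$score, numeric(1)),
  answer_count = vapply(items, function(q) q$answer_count, numeric(1)),
  is_answered  = vapply(items, function(q) q$is_answered, logical(1)),
  stringsAsFactors = FALSE
)

nrow(question_df)                # number of questions returned
mean(question_df$is_answered)    # share of questions marked as answered
```

With the real response, replacing `toy_questions` by `questions` should work as long as every item carries these four fields.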
Repeat the process of downloading content from the same site for questions related to the same package, and summarize the results.
SOLUTION:
# solution goes here