Stat 33B, Fall 2021 HW3: Basics of "dplyr" and "ggplot2"
Introduction
In this assignment, you will work with another approach to manipulate tables and create statistical graphics. You are going to use the functionality of the tidyverse packages:
• "dplyr" to work with tabular data in a more syntactic way.
• "ggplot2" to create graphics in a more consistent and visually pleasing way.
library(tidyverse)
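As a rough illustration of the difference in style, the sketch below filters and sorts a small made-up data frame, first with base R bracket notation and then with "dplyr" verbs. The data frame toy and its columns are invented for this example and are not part of the HW data.

```r
library(dplyr)

# hypothetical toy data (NOT the NBA data set used in this HW)
toy <- data.frame(
  name = c("Ann", "Bob", "Cid", "Dee"),
  height = c(70, 75, 68, 80),
  stringsAsFactors = FALSE
)

# base R: logical indexing, then reordering with order()
base_res <- toy[toy$height > 69, ]
base_res <- base_res[order(base_res$height, decreasing = TRUE), ]

# dplyr: the same two steps written as verbs chained with the pipe
dplyr_res <- toy %>%
  filter(height > 69) %>%
  arrange(desc(height))

dplyr_res
```

Both versions keep the same rows in the same order; the "dplyr" version reads as a sequence of named steps.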
General Instructions
• Write your narrative and code in an Rmd (R markdown) file.
• Name this file as hw3-first-last.Rmd, where first and last are your first and last names (e.g. hw3-gaston-sanchez.Rmd).
• Please do not use code chunk options such as: echo = FALSE, eval = FALSE, results = 'hide'. All chunks must be visible and evaluated.
NBA Players Data
The data file for this HW is nba2018-players.csv, which is in the same directory as this Rmd (see bCourses).
Download a copy of the CSV file and place it in the same working directory where you have this Rmd.
To import the data in R you can use the base function read.csv(), or you can also use read_csv() from the package "readr".
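A minimal sketch of both import options is shown below. To keep it self-contained, it writes a tiny made-up CSV to a temporary file instead of reading nba2018-players.csv; the column names and values are assumptions for illustration only. In your own Rmd you would pass the name of the real file and, since later code in this HW refers to a data frame called dat, assign the result to dat.

```r
library(readr)

# write a tiny made-up CSV to a temporary file (stand-in for the real data file)
tmp <- tempfile(fileext = ".csv")
writeLines(c("player,team,height", "A,GSW,75", "B,LAL,80"), tmp)

# base R: returns a plain data.frame
dat_base <- read.csv(tmp, stringsAsFactors = FALSE)

# readr: returns a tibble; show_col_types = FALSE silences the column-type message
dat_tidy <- read_csv(tmp, show_col_types = FALSE)

class(dat_base)
class(dat_tidy)
```

Either object works with the "dplyr" and "ggplot2" functions used in the rest of this HW.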
1) Filtering, slicing, and selecting
a) Use slice() to subset the data by selecting the first 5 rows. Do the same thing with slice_head().
# your code
b) Use slice() to subset the data by selecting rows 10, 15, 20, ..., 50. The function seq() is your friend.
# your code
c) Use filter() to subset those players with height less than 70 inches.
# your code
d) Use filter() to subset rows of Golden State Warriors ('GSW') players who play the center ('C') position.
# your code
e) Use filter() and then select() to subset rows of Lakers ('LAL') players, and then display their names.
# your code
f) Find out how to select the name, age, and team of players with more than 10 years of experience who make 15 million dollars or more.
# your code
g) Find out how to select the name, team, height, and weight of rookie players who are 20 years old, displaying only the first five occurrences (i.e. rows).
# your code
2) Adding new variables, and reordering rows
a) Use the original data frame to filter() and arrange() those players with height less than 71 inches, in increasing order (based on height values).
# your code
b) Use the original data frame dat to display the name, team, and salary of the top-5 highest paid players.
# your code
c) Create a data frame gsw_mpg of GSW players that contains variables for player name, experience, and min_per_game (minutes per game), sorted by min_per_game (in descending order). Then display gsw_mpg.
# your code
3) Summaries and Grouped Operations
a) Use summarise() to get the largest height value.
# your code
b) Use summarise() to get the standard deviation of points3.
# your code
c) Use summarise() and group_by() to display the median of three-pointers, by team.
# your code
d) Display the average three-pointers by team, in ascending order, for the bottom-5 teams (the worst three-point teams).
# your code
e) Obtain the mean and standard deviation of age for Power Forwards with between 5 and 10 years of experience (including 5 and 10).
# your code
4) Graphics with "ggplot2"
a) Create a data frame gsw by subsetting the data for Golden State Warriors players, and find out how to make a scatterplot of height and weight, using geom_text() to display the names of the players.
# your code
b) Get a density plot of salary (for all NBA players).
# your code
c) Get a histogram of points2 with binwidth of 50 (for all NBA players).
# your code
d) Get a barchart of the position frequencies (for all NBA players).
# your code
e) Make a scatterplot of experience and salary of all Centers, and use geom_smooth() to add a regression line.
# your code
5) Faceting
One of the most attractive features of "ggplot2" is the ability to display multiple facets. The idea of facets is to divide a plot into subplots based on the values of one or more categorical (or discrete) variables.
Here’s an example. What if you want to get scatterplots of points and salary separated (or grouped) by position? This is where faceting comes in handy, and you can use facet_wrap() for this purpose:
# scatterplot by position
ggplot(data = dat, aes(x = points, y = salary)) +
  geom_point() +
  facet_wrap(~ position)
The other faceting function is facet_grid(), which allows you to control the layout of the facets (by rows, by columns, etc.):
# scatterplot by position (facets laid out as columns)
ggplot(data = dat, aes(x = points, y = salary)) +
  geom_point(aes(color = position), alpha = 0.7) +
  facet_grid(~ position) +
  geom_smooth(method = loess)
## `geom_smooth()` using formula 'y ~ x'
# scatterplot by position (facets laid out as rows)
ggplot(data = dat, aes(x = points, y = salary)) +
  geom_point(aes(color = position), alpha = 0.7) +
  facet_grid(position ~ .) +
  geom_smooth(method = loess)
## `geom_smooth()` using formula 'y ~ x'
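facet_grid() also accepts a variable on each side of the formula, which lays the facets out as a grid of rows by columns. The sketch below shows this with the mtcars data set that ships with R, so it runs without the HW data; the choice of cyl and am as facet variables is only for illustration.

```r
library(ggplot2)

# mtcars ships with R; cyl and am are used here only as example facet variables
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(cyl ~ am)  # rows: values of cyl, columns: values of am

p
```

The left-hand side of the formula controls the rows and the right-hand side controls the columns, which matches the two one-sided examples above.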
5.1) More plots
a) Make scatterplots of experience and salary, faceting by position.
# your code
b) Make scatterplots of experience and salary, faceting by team.
# your code
c) Make density plots of age, faceting by team.
# your code
d) Make scatterplots of height and weight, faceting by position.
# your code
e) Make a scatterplot of experience and salary for the Warriors, but this time add a theme layer (e.g. theme_bw(), theme_minimal(), theme_dark(), theme_classic()) to get a simpler background.
# your code
2021-12-15