Stat 33B, Fall 2021 HW1: Basic Data Objects
Introduction
The purpose of this assignment is to work with various atomic data structures in R (e.g. vectors of different data types, factors, and arrays).
Use this assignment to start developing your manipulation skills of basic data objects in R: use of bracket notation, understanding vectorization, coercion rules, recycling, etc.
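As a quick refresher on the terms above, here is a minimal sketch of vectorization, recycling, coercion, and bracket notation. The vector `nums` and the other names are made up for illustration and are not part of the assignment data.

```r
# Toy vector (hypothetical, not assignment data)
nums <- c(10, 20, 30, 40)

# Vectorization: the operation is applied to every element at once
doubled <- nums * 2             # 20 40 60 80

# Recycling: the shorter vector c(1, 2) is reused to match length 4
recycled <- nums + c(1, 2)      # 11 22 31 42

# Coercion: mixing types in c() converts everything to the most flexible type
mixed <- c(1, "a", TRUE)        # "1" "a" "TRUE"

# Bracket notation: select elements by position
first_two <- nums[1:2]          # 10 20

print(doubled)
print(recycled)
print(mixed)
print(first_two)
```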
General Instructions
• Write your narrative and code in an Rmd (R markdown) file.
• Name this file as hw1-first-last.Rmd, where first and last are your first and last names (e.g. hw1-gaston-sanchez.Rmd).
• Please do not use code chunk options such as: echo = FALSE, eval = FALSE, results = 'hide'. All chunks must be visible and evaluated.
1) Technical questions about vectors
Consider the following two vectors x and y:
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "d", "e")
Provide the output and explain what’s happening in each of the following commands—in terms of subsetting, coercion, recycling, vectorization, etc. You don’t need to run the code, just explain.
a) y[x <= 2]
b) y[x[5]]
c) y[!(x < 3)]
d) y[x/x]
e) y[x[-2][3]]
f) x[(y != 'c') & (y != 'a')]
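The kinds of indexing that appear in these commands can be tried out on other data first. The sketch below uses two hypothetical vectors, `v` and `w`, that are deliberately different from x and y, so it illustrates the mechanics without answering the questions.

```r
# Toy vectors (hypothetical), deliberately different from x and y
v <- c(7, 8, 9)
w <- c("p", "q", "r")

# Logical index: keeps the positions where the condition is TRUE
by_logical <- w[v > 7]                        # "q" "r"

# Numeric index: a computed value is used as a position
by_position <- w[v[1] - 6]                    # v[1] - 6 is 1, so "p"

# Negative index: drops the given position
dropped <- v[-1]                              # 8 9

# Repeated index: the same element is returned more than once
repeated <- w[c(2, 2)]                        # "q" "q"

# Conditions combined element-wise with &
combined <- v[(w != "p") & (w != "r")]        # 8

print(by_logical)
print(by_position)
print(dropped)
print(repeated)
print(combined)
```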
GSW Data (season 2018)
Data
The (raw) data for this assignment covers some players from the Golden State Warriors (2018), displayed in the figure below (source: Basketball-Reference).
2) Vectors
Create vectors and factors for the columns in the data table displayed above, according to the following data types. If there are missing values, codify them as NA.
• number: integer vector
• player: character vector
• position: factor
• height: character vector
• weight: double (i.e. real) vector
• birthdate: character vector
• experience: integer vector
• college: character vector
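A minimal sketch of how vectors of these types can be declared, including missing values. All values and names below (`jersey`, `role`, and so on) are hypothetical and are not taken from the GSW table.

```r
# Hypothetical values, not taken from the GSW table
jersey <- c(4L, 15L, NA)               # integer vector (note the L suffix)
name   <- c("Ann", "Bo", "Cy")         # character vector
role   <- factor(c("C", "PF", "C"))    # factor with levels "C" and "PF"
mass   <- c(210, 195.5, NA)            # double vector
school <- c("State U", NA, "Tech")     # character vector with a missing value

print(typeof(jersey))    # "integer"
print(typeof(mass))      # "double"
print(is.factor(role))   # TRUE
print(is.na(school))     # FALSE TRUE FALSE
```

Writing NA without quotes matters: "NA" in quotes would be an ordinary string rather than a missing value.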
3) Vector Subsetting (i.e. indexing)
Write R commands (a single one-line command per question), displaying the output, that answer the following questions:
a) What is the height of the tallest player?
b) What is the college of the player that has a height of 6-6?
c) What is the position of the player with the most years of experience? Hint: the which.max() function is your friend.
d) What is the number of the lightest player? Hint: the which.min() function is your friend.
e) What is the average height for those players with more than 5 years of experience?
f) How many players have a weight larger than the average (i.e. mean) weight?
g) How many players have between 9 and 12 years of experience (inclusive)?
h) What is the mean years of experience of Shooting Guard (SG) players?
i) What is the median weight of those players with a position other than Center (C)?
j) What is the first quartile (i.e. bottom 25th percentile) of years of experience among Power Forwards (PF) and Shooting Guards (SG)? Hint: the quantile() function is your friend.
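The functions named in the hints can be explored on unrelated data. The following sketch assumes three hypothetical vectors (`age`, `city`, `points`) and shows one-line commands in the same spirit as the questions above.

```r
# Hypothetical data for four people (not the GSW data)
age    <- c(31, 25, 40, 28)
city   <- c("Lima", "Oslo", "Rome", "Kyiv")
points <- c(12, 30, 7, 18)

# which.max() / which.min() return the position of the extreme value
oldest_city <- city[which.max(age)]           # "Rome"
lowest_city <- city[which.min(points)]        # "Rome"

# Summing a logical vector counts its TRUE values
above_mean <- sum(points > mean(points))      # 2 (the mean is 16.75)

# A logical condition on one vector can subset another
mean_pts <- mean(points[age >= 28])           # mean of 12, 7, 18

# quantile() with probs = 0.25 gives the first quartile
q1 <- quantile(age[city %in% c("Lima", "Oslo")], probs = 0.25)   # 26.5

print(oldest_city)
print(lowest_city)
print(above_mean)
print(mean_pts)
print(q1)
```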
4) List for GSW
Use the vectors created in the previous section to create the following list gsw:
gsw <- list(
player = player,
number = number,
position = position,
weight = weight,
experience = experience
)
Use the list gsw to write R commands—displaying the output—that answer the following questions (use only the list gsw, NOT the individual vectors):
a) What is the number of the heaviest player?
b) What is the position of the player with the least amount of experience?
c) How many players have less than 8 or more than 11 years of experience?
d) What is the third quartile (i.e. bottom 75th percentile) of years of experience among Power Forwards (PF) and Shooting Guards (SG)? Tip: the function quantile() is your friend.
e) What is the name of the player whose weight is furthest from the average weight (of all players)? Tip: the function which.max() is your friend.
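Working from a list means every vector is reached through the list itself, with `$` or `[[ ]]`. A minimal sketch on a hypothetical list `pets` (unrelated to gsw):

```r
# A small hypothetical list, unrelated to gsw
pets <- list(
  name = c("Rex", "Tom", "Ivy"),
  kind = factor(c("dog", "cat", "cat")),
  kg   = c(30, 4, 5)
)

# $ extracts an element by name; the result is an ordinary vector
heaviest <- pets$name[which.max(pets$kg)]                   # "Rex"

# [[ ]] does the same with a quoted name
cat_kg <- pets[["kg"]][pets$kind == "cat"]                  # 4 5

# | combines two conditions element-wise (logical OR)
n_extreme <- sum(pets$kg < 5 | pets$kg > 20)                # 2

# Largest absolute distance from the mean (the mean is 13)
furthest <- pets$name[which.max(abs(pets$kg - mean(pets$kg)))]   # "Rex"

print(heaviest)
print(cat_kg)
print(n_extreme)
print(furthest)
```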
5) 2D Arrays (i.e. Matrices)
Consider the following vector lord:
lord <- c('v', 'o', 'l', 'd', 'e', 'm', 'o', 'r', 't')
Use the vector lord to create a matrix vol with 3 rows and 3 columns, like the one displayed below.
     [,1] [,2] [,3]
[1,] "v"  "d"  "o"
[2,] "o"  "e"  "r"
[3,] "l"  "m"  "t"
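For reference, here is a small sketch of how matrix() arranges the elements of a vector. It uses a hypothetical six-letter vector `abc` rather than lord.

```r
# Hypothetical vector of six letters (not lord)
abc <- c("a", "b", "c", "d", "e", "f")

# By default matrix() fills the matrix column by column
by_col <- matrix(abc, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,] "a"  "c"  "e"
# [2,] "b"  "d"  "f"

# byrow = TRUE fills it row by row instead
by_row <- matrix(abc, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,] "a"  "b"  "c"
# [2,] "d"  "e"  "f"

print(by_col)
print(by_row)
```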
Use bracket notation and the matrix vol to write R commands, each in its own code chunk and displaying the output, to perform the following tasks. Hint: the combine function c() and the colon operator : (which generates numeric sequences) are your friends.
a) obtain the following output (the code has to be one single command)
[1] "v" "d" "o"
b) obtain the following output (the code has to be one single command)
     [,1] [,2]
[1,] "d"  "v"
[2,] "e"  "o"
c) obtain the following output (the code has to be one single command)
     [,1] [,2] [,3]
[1,] "l"  "m"  "t"
[2,] "o"  "e"  "r"
[3,] "v"  "d"  "o"
d) obtain the following output (the code has to be one single command)
     [,1] [,2] [,3]
[1,] "v"  "d"  "d"
[2,] "o"  "e"  "e"
[3,] "l"  "m"  "m"
e) obtain the following output (the code has to be one single command)
     [,1] [,2] [,3]
[1,] "t"  "m"  "l"
[2,] "r"  "e"  "o"
[3,] "o"  "d"  "v"
f) obtain the following output (the code has to be one single command)
     [,1] [,2] [,3] [,4]
[1,] "t"  "m"  "m"  "t"
[2,] "r"  "e"  "e"  "r"
[3,] "o"  "d"  "d"  "o"
[4,] "o"  "d"  "d"  "o"
[5,] "r"  "e"  "e"  "r"
[6,] "t"  "m"  "m"  "t"
g) obtain the following output (the code has to be one single command)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "l"  "m"  "t"  "t"  "m"  "l"
[2,] "o"  "e"  "r"  "r"  "e"  "o"
[3,] "v"  "d"  "o"  "o"  "d"  "v"
[4,] "v"  "d"  "o"  "o"  "d"  "v"
[5,] "o"  "e"  "r"  "r"  "e"  "o"
[6,] "l"  "m"  "t"  "t"  "m"  "l"
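The building blocks for tasks like these can be practiced on a numeric matrix first. The sketch below uses a hypothetical 2 x 3 matrix `m`; note that cbind() and rbind() are base R functions that the hint above does not mention, so treating them as allowed tools is an assumption.

```r
# Hypothetical 2 x 3 matrix (not vol):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

first_row <- m[1, ]             # a row as a plain vector: 1 3 5
swapped   <- m[, 2:1]           # columns 2 and 1, in that order
flipped   <- m[2:1, ]           # rows in reverse order
repeated  <- m[, c(1, 3, 3)]    # an index may repeat a column

# rbind() stacks matrices vertically, cbind() joins them side by side
stacked <- rbind(m, m[2:1, ])   # 4 x 3
wide    <- cbind(m, m[, 3:1])   # 2 x 6

print(first_row)
print(swapped)
print(flipped)
print(repeated)
print(stacked)
print(wide)
```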
6) Factors
R has a built-in data set called state.x77 which contains information related to the 50 states of the United States of America from 1977.
# a few rows of the first 5 variables in state.x77
head(state.x77[, 1:5])
##            Population Income Illiteracy Life Exp Murder
## Alabama          3615   3624        2.1    69.05   15.1
## Alaska            365   6315        1.5    69.31   11.3
## Arizona          2212   4530        1.8    70.55    7.8
## Arkansas         2110   3378        1.9    70.66   10.1
## California      21198   5114        1.1    71.71   10.3
## Colorado         2541   4884        0.7    72.06    6.8
Let’s consider the second column "Income":
head(state.x77[, 2])
##    Alabama     Alaska    Arizona   Arkansas California   Colorado
##       3624       6315       4530       3378       5114       4884
Learn how to use the function cut() to create a factor income—based on the column "Income"—by using the breaking points and labels according to the following table:
intervals
(3000, 3500]
(3500, 4000]
(4000, 4500]
(4500, 5000]
(5000, 5500]
(5500, 6000]
(6000, 6500]
(6500, 7000]
Once you’ve created income, display its frequencies (i.e. counts); and also use barplot() to make a simple barchart of such frequencies.
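To get a feel for cut() before applying it to "Income", here is a minimal sketch on a hypothetical vector `scores`, with break points and labels chosen only for illustration.

```r
# Hypothetical scores (not the Income column)
scores <- c(12, 25, 37, 40, 58, 20)

# breaks gives the interval end points; with the default right = TRUE
# each interval is open on the left and closed on the right, i.e. (a, b]
grade <- cut(
  scores,
  breaks = c(0, 20, 40, 60),
  labels = c("(0, 20]", "(20, 40]", "(40, 60]")
)

# table() counts how many values fall in each level of the factor
counts <- table(grade)
print(counts)    # 2, 3 and 1 values in the three intervals

# barplot() accepts the table of counts directly
barplot(counts)
```

Because the intervals are closed on the right, the value 20 falls in (0, 20] and the value 40 falls in (20, 40].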
2021-12-15