
STAT 6/4365 Final Project

Published: 2022-12-12



Your project has two parts. Upload documents containing your answers to both parts to the assignment folder on eLC by the deadline.

Part I

Choose one of the scenarios below and create an R Shiny app. Upload a Zip folder containing your app files to the eLC submission folder, and name this Zip file “Final Project Part 1”.

a) Create a p-value calculator

Your app should be able to calculate a p-value based on a test statistic. Users should be able to specify whether it is a left-tailed, right-tailed, or two-tailed test. Users should also be able to select the normal distribution, the t-distribution, or the Chi-squared distribution. Your app should also be able to plot a graph corresponding to the distribution and p-value. The default parameters for the mean and SD of the normal distribution should be 0 and 1, but users should be able to change them. Similarly, users should be able to specify the DF for the t-distribution and the Chi-squared distribution. I have given an example below. Your app does not need to look exactly like this, but it should have all the features mentioned above to receive full credit.
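The tail logic behind such a calculator can be sketched independently of the interface. The snippet below is only an illustration in Python (the app itself must be written in R Shiny, where `pnorm`, `pt`, and `pchisq` would play the role of the CDF). The function name `p_value` and its arguments are hypothetical, and the two-tailed rule shown, doubling the smaller tail, is one common convention that should be checked against the course definition for the Chi-squared case.

```python
from statistics import NormalDist


def p_value(stat, cdf, tail):
    """Return the p-value of `stat` for a distribution given by its CDF.

    `tail` is "left", "right", or "two". The two-tailed branch doubles the
    smaller tail area, which matches the usual definition for symmetric
    distributions and is one common convention for skewed ones.
    """
    lower = cdf(stat)      # area to the left of the statistic
    upper = 1.0 - lower    # area to the right of the statistic
    if tail == "left":
        return lower
    if tail == "right":
        return upper
    if tail == "two":
        return min(1.0, 2.0 * min(lower, upper))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Default normal distribution from the assignment: mean 0, SD 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
```

For example, `p_value(1.96, standard_normal.cdf, "two")` is close to 0.05. Passing the CDF as an argument keeps the tail logic identical for all three distributions.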

b) Create a critical-value (quantile) calculator

Your app should be able to calculate critical values (quantiles) based on a confidence level. Users should be able to select the normal distribution, the t-distribution, or the Chi-squared distribution. Your app should also be able to plot a graph corresponding to the distribution and confidence level. Users should be able to specify the DF for the t-distribution and the Chi-squared distribution. I have given an example below. Your app does not need to look like this example, but it should have all the features mentioned above to receive full credit.
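As a rough illustration of the calculation only (the app itself must be built in R Shiny), the Python sketch below converts a confidence level into a pair of critical values for the normal case. It assumes a two-sided, central interval, which the assignment does not state explicitly, and the function name `normal_critical_values` is hypothetical.

```python
from statistics import NormalDist


def normal_critical_values(confidence, mu=0.0, sigma=1.0):
    """Return (lower, upper) quantiles enclosing `confidence` of the area.

    Assumes a two-sided interval with alpha split equally between tails.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    dist = NormalDist(mu, sigma)
    # Each tail holds alpha / 2 of the probability.
    return dist.inv_cdf(alpha / 2.0), dist.inv_cdf(1.0 - alpha / 2.0)
```

With a confidence level of 0.95 this gives approximately -1.96 and 1.96 for the standard normal distribution.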

c) Sampling distribution of sample proportion

Create an app to demonstrate the sampling distribution of the sample proportion. Your app should accept the following arguments.

•   Population Proportion

•   Sample Size

•   Number of samples: default value is 500

Your function should output the sampling distribution of the sample proportion as a histogram, the mean of the sample proportions, and the SD of the sample proportions. I have given an example below. Your app does not need to look exactly like this example, but it should have all the features mentioned above to receive full credit.
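The simulation behind this scenario can be sketched as follows. This is a Python illustration of the idea only (the app itself must be written in R Shiny), and the function names and the `seed` argument are assumptions added for reproducibility. Each sample draws `n` Bernoulli trials with success probability `p` and records the proportion of successes.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples=500, seed=None):
    """Draw `num_samples` samples of size `n` and return their proportions."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Count successes among n independent trials with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions


def summarize(proportions):
    """Return the mean and sample SD of the simulated proportions."""
    return statistics.mean(proportions), statistics.stdev(proportions)
```

The mean of the simulated proportions should be close to `p`, and their SD close to the theoretical value sqrt(p(1 - p)/n). A histogram of the returned list shows the sampling distribution.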

d) Multiple linear regression app

The app should take a data frame as input and conduct multiple linear regression using the first column as the response and the other columns as the predictors. Your app should output the test statistic and p-value of the overall F-test, the R-squared value, the slope and p-value corresponding to each of the predictor variables, a normal Q-Q plot, and a residual vs. predicted value plot.

Part II

Part II should be done using Python. Upload a document containing your answers to the eLC submission folder by the deadline, and name this document “Final Project Part 2”.

The FIFA World Cup is the most prestigious football tournament in the world, as well as the most widely viewed and followed single sporting event in the world. The dataset fifa_historic.csv contains information related to the results of historical matches since the beginning of the championship (1930) for all participating teams. Download the fifa_historic.csv data set and answer the questions below.

a)   Calculate the average attendance for each year and create a line graph using Matplotlib. State your findings.
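One possible shape for part a) is sketched below. The column names `Year` and `Attendance` are assumptions, since the layout of fifa_historic.csv is not described here, and all function names are hypothetical. Matplotlib is imported inside the plotting function so that the averaging step can be run and checked without it.

```python
import csv
from collections import defaultdict


def load_rows(path="fifa_historic.csv"):
    """Read the CSV into a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def average_attendance_by_year(rows, year_col="Year", attendance_col="Attendance"):
    """Return {year: mean attendance}, skipping rows with no attendance."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum, count]
    for row in rows:
        value = row.get(attendance_col)
        if value is None or str(value).strip() == "":
            continue  # match with no recorded attendance
        entry = totals[int(row[year_col])]
        entry[0] += float(value)
        entry[1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


def plot_yearly_average(averages, title="Average attendance per year"):
    """Draw a line graph of the yearly averages and return the figure."""
    import matplotlib.pyplot as plt  # assumed to be installed

    fig, ax = plt.subplots()
    ax.plot(list(averages), list(averages.values()), marker="o")
    ax.set_xlabel("Year")
    ax.set_ylabel("Average attendance")
    ax.set_title(title)
    return fig
```

With pandas, the same averages are usually obtained with a `groupby` on the year column followed by `mean()`. The dictionary version above keeps the grouping logic explicit.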

b)   Next, investigate the attendance for only the final and semi-final matches. As before, plot the average attendance for each year. Write a short paragraph explaining your results. (An example plot is given below.)

c)   Using appropriate graphs and/or summary statistics, find which teams won the most World Cups, semi-finals, and third-place matches (count Germany FR and Germany as the same country). Write a short paragraph explaining your results.
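The instruction to treat Germany FR and Germany as one country amounts to normalizing team names before counting. A minimal sketch, assuming the winners of the relevant matches have already been extracted into a list of names (how to derive the winner depends on the dataset's columns, which are not described here):

```python
from collections import Counter

# Map alternate spellings to a single canonical team name.
TEAM_ALIASES = {"Germany FR": "Germany"}


def count_wins(winners):
    """Count wins per team after stripping whitespace and merging aliases."""
    cleaned = (name.strip() for name in winners)
    return Counter(TEAM_ALIASES.get(name, name) for name in cleaned)
```

Calling `count_wins(...).most_common()` then gives the teams ordered by number of wins, which can feed a bar chart directly.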

d)   Next, we are going to review individual player stats (https://www.fifaindex.com/). Predicting market value is important because soccer clubs spend astronomical amounts of money on player acquisition. Download the All_Players.csv dataset and fit a linear regression model to predict market value using the personal metrics and skill ratings of the player. Create at least three models with different predictor variables and clearly explain the reasoning behind your choices. Use k-fold cross-validation to decide on the best model. Explain the results of your final (best) model in the context of the problem.
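The model comparison in part d) can be sketched with NumPy alone. The example below is a hedged illustration, not a prescribed solution: it fits ordinary least squares on k - 1 folds, scores the held-out fold by RMSE, and picks the candidate with the lowest average score. The function names, the choice of RMSE, and the representation of a model as a list of column indices are all assumptions.

```python
import numpy as np


def kfold_rmse(X, y, k=5, seed=0):
    """Mean out-of-fold RMSE of an ordinary least squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle the row indices once, then split them into k nearly equal folds.
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Prepend an intercept column, then solve the least squares problem.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        beta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        residuals = y[test_idx] - A_test @ beta
        scores.append(np.sqrt(np.mean(residuals ** 2)))
    return float(np.mean(scores))


def pick_best_model(candidates, data, y, k=5):
    """`candidates` maps a model name to a list of column indices of `data`."""
    scores = {name: kfold_rmse(data[:, cols], y, k) for name, cols in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

scikit-learn's `cross_val_score` offers the same procedure with less code. Writing the loop out makes it easier to explain what each fold contributes.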

e)   Use your final model to predict the market value of the player Kylian Mbappé.

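| Player | Overall Score | Potential Score | Height | Weight | Age | Preferred Foot |
|---|---|---|---|---|---|---|
| Kylian Mbappe | 89 | 95 | 178 | 73 | 21 | Right |

| Ball Skills | Defence | Mental | Passing | Physical | Shooting | Goalkeeping |
|---|---|---|---|---|---|---|
| 89.5 | 33.33333333 | 74.16667 | 76.66667 | 86.42857 | 77.5 | 8.4 |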

f)    Using the results of part d), create a function that returns a player’s market value based on the given stats. (Hint: use the final model from part d).)
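One way to structure such a function is to wrap the fitted intercept and coefficients in a closure, as in the sketch below. The names `make_value_predictor` and `predict_market_value` are hypothetical, and the coefficients must come from your own final model. A categorical stat such as Preferred Foot would first need a numeric encoding, which is left to the modelling step.

```python
def make_value_predictor(intercept, coefficients):
    """Build a prediction function from fitted linear-model coefficients.

    `coefficients` maps each predictor name to its fitted slope.
    """
    def predict_market_value(stats):
        # Refuse to predict if any predictor used by the model is missing.
        missing = set(coefficients) - set(stats)
        if missing:
            raise KeyError(f"missing stats: {sorted(missing)}")
        return intercept + sum(coefficients[name] * stats[name] for name in coefficients)

    return predict_market_value
```

The returned function accepts a dictionary of stats, ignores keys the model does not use, and raises an error when a required stat is absent.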

Part g) is required for students registered in the section STAT 6365 and optional for students registered in the section STAT 4365.

g)   Create a database called fifa_data.db. Add a table called Countries with the character variable Country and the numerical variables Year and Won the world cup (1 = yes, 0 = no). Add at least 3 rows to this table. Add another table called Players with the character variables Name and Country and the numerical variables Age, Goals scored, and Number of assists. Add at least 6 rows to this table. Write a query to display all the players who scored goals for the USA.
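A minimal sketch of part g) using Python's built-in `sqlite3` module is shown below. The short column names `Won`, `Goals`, and `Assists` are my own abbreviations of the variables listed above, and the player rows are placeholders with invented values that should be replaced with real entries.

```python
import sqlite3


def build_database(path="fifa_data.db"):
    """Create the two tables, insert sample rows, and return the connection."""
    conn = sqlite3.connect(path)
    # Drop existing tables so the script can be rerun without duplicating rows.
    conn.execute("DROP TABLE IF EXISTS Countries")
    conn.execute("DROP TABLE IF EXISTS Players")
    conn.execute("CREATE TABLE Countries (Country TEXT, Year INTEGER, Won INTEGER)")
    conn.execute(
        "CREATE TABLE Players "
        "(Name TEXT, Country TEXT, Age INTEGER, Goals INTEGER, Assists INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Countries VALUES (?, ?, ?)",
        [("Brazil", 2002, 1), ("France", 2018, 1), ("USA", 2014, 0)],
    )
    # Placeholder players: names and numbers are illustrative only.
    conn.executemany(
        "INSERT INTO Players VALUES (?, ?, ?, ?, ?)",
        [
            ("Player A", "USA", 24, 2, 1),
            ("Player B", "USA", 29, 0, 3),
            ("Player C", "USA", 22, 1, 0),
            ("Player D", "Brazil", 27, 4, 2),
            ("Player E", "France", 23, 3, 1),
            ("Player F", "France", 31, 0, 0),
        ],
    )
    conn.commit()
    return conn


def usa_scorers(conn):
    """Return (Name, Goals) for USA players with at least one goal."""
    query = "SELECT Name, Goals FROM Players WHERE Country = ? AND Goals > 0 ORDER BY Name"
    return conn.execute(query, ("USA",)).fetchall()
```

Passing `":memory:"` as the path builds the same database in memory, which is convenient for testing the query before writing fifa_data.db to disk.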