MET AD599 Introduction to Python and SQL for Business Analytics

Assignment #3 – For this assignment, you will use a Python Jupyter Notebook to work through the following questions:

Part 1:

1. Download the Titanic Dataset from here: https://www.kaggle.com/brendan45774/test-file  

2. Import the dataset into Jupyter Notebook. If there are NaNs in the “Age” column, replace them with the median of “Age”. (0.35 pts)

3. Use a Markdown Cell to answer this question: Do you think gender/sex affected whether a passenger survived the Titanic disaster? Why? Do you have evidence, or have you heard any stories? (0.35 pts)

4. Use a Markdown Cell to answer this question: Which model will you use on this dataset, and why did you choose it? Note that the output should be the “Survive” column. (0.35 pts)

5. Convert the “Sex” column from “object” to “numeric” (0.35 pts)

6. Use a Markdown Cell to answer this question: Which variables will you choose as inputs to build the model? Why? (0.35 pts)

7. Check all the variables you chose in step 6. If any of them contain NaNs, replace the NaNs with the average of that column. (0.35 pts)

8. Now build the model. Remember to split the data into training and validation sets first, then use the training set to build the model. (0.35 pts)

9. Use a Markdown Cell to answer this question:  Does this model have an “accuracy rate”? Explain why. (0.35 pts)

10. Use the .predict function to predict the results for the validation set. What is the accuracy rate of the model on the validation set? Is it good? (0.35 pts)

11. Make up an imaginary individual, and use Markdown Cells to give a brief introduction of this individual (Sex, Age, Fare, etc.). Will this individual survive according to your model? (0.35 pts)
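
The coding steps in Part 1 (2, 5, 7, 8, 10 and 11) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the expected solution: it runs on a small synthetic table instead of the Kaggle file, the column names “Survived”, “Sex”, “Age” and “Fare” and the 0/1 encoding of “Sex” are assumed, and logistic regression is just one possible model choice for step 4.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Kaggle file (assumption). In the notebook you
# would load the downloaded CSV with pd.read_csv instead.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.uniform(1, 80, size=n).round(),
    "Fare": rng.uniform(5, 100, size=n).round(2),
})
# Made-up outcome loosely tied to Sex, only so the model has a signal to learn.
df["Survived"] = ((df["Sex"] == "female") ^ (rng.random(n) < 0.1)).astype(int)
# Inject some missing values so the imputation steps have something to do.
df.loc[df.index % 10 == 0, "Age"] = np.nan
df.loc[df.index % 25 == 5, "Fare"] = np.nan

# Step 2: replace missing ages with the median age.
df["Age"] = df["Age"].fillna(df["Age"].median())

# Step 5: convert "Sex" from text to numbers (the 0/1 coding is an assumption).
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Step 7: fill any remaining NaNs in the chosen inputs with the column mean.
features = ["Sex", "Age", "Fare"]  # hypothetical choice for step 6
df[features] = df[features].fillna(df[features].mean())

# Step 8: split first, then fit on the training set only.
X_train, X_valid, y_train, y_valid = train_test_split(
    df[features], df["Survived"], test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 10: predict on the validation set and measure accuracy.
valid_pred = model.predict(X_valid)
valid_acc = accuracy_score(y_valid, valid_pred)
print(f"Validation accuracy: {valid_acc:.3f}")

# Step 11: an imaginary passenger (female, 29 years old, fare of 30).
person = pd.DataFrame([{"Sex": 1, "Age": 29, "Fare": 30.0}])[features]
person_pred = model.predict(person)[0]
print("Predicted to survive" if person_pred == 1 else "Predicted not to survive")
```

Because the data here is synthetic, the printed accuracy says nothing about the real dataset; only the sequence of calls carries over.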

Part 2:

1. Download the House dataset from here: https://www.kaggle.com/thomasnibb/amsterdam-house-price-prediction

2. This dataset is plain and simple: the output should be “Price”, and the inputs should be “Area”, “Room”, “Lon”, and “Lat”. Check whether there are any NaNs in these variables. If there are, replace them with the column mean. (0.35 Pts)

3. Split the dataset into training and validation sets. Calculate the cross-validation (CV) score on the training set for Linear Regression, Polynomial Regression (degrees 1 to 4), Lasso Regression, Ridge Regression, K-Nearest Neighbors (KNN) Regression, Decision Tree Regression, and Random Forest Regression. (2.1 Pts)

4. Use a markdown cell to answer this question: Which model will you choose and why? (0.35 Pts)

5. Make up an imaginary house, and use the .predict function to predict its price with the model you chose. (0.7 Pts)
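
The coding steps in Part 2 (2, 3 and 5) might look something like the sketch below. It is an illustration under assumptions, not a reference answer: the data is synthetic rather than the Kaggle file, the hyperparameters are mostly scikit-learn defaults, the 5-fold setting and the default R² scoring are choices the assignment does not specify, and the scaling step inside the polynomial pipeline is added only for numerical stability.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the Amsterdam data (assumption). In the notebook you
# would load the downloaded CSV with pd.read_csv instead.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Area": rng.uniform(30, 200, size=n),
    "Room": rng.integers(1, 7, size=n),
    "Lon": rng.uniform(4.8, 5.0, size=n),
    "Lat": rng.uniform(52.3, 52.4, size=n),
})
df["Price"] = 5000 * df["Area"] + 20000 * df["Room"] + rng.normal(0, 30000, size=n)
df.loc[df.index % 20 == 0, "Area"] = np.nan  # inject missing values

# Step 2: replace NaNs in the inputs with the column mean.
inputs = ["Area", "Room", "Lon", "Lat"]
df[inputs] = df[inputs].fillna(df[inputs].mean())

# Step 3: split, then cross-validate every candidate on the training set only.
X_train, X_valid, y_train, y_valid = train_test_split(
    df[inputs], df["Price"], test_size=0.2, random_state=42
)
models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(max_iter=10000),
    "Ridge": Ridge(),
    "KNN": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
for degree in range(1, 5):
    # Polynomial regression = polynomial feature expansion + linear regression.
    models[f"Polynomial (degree {degree})"] = make_pipeline(
        StandardScaler(), PolynomialFeatures(degree), LinearRegression()
    )

cv_scores = {}
for name, estimator in models.items():
    # For regressors, cross_val_score uses R^2 unless another scoring is given.
    cv_scores[name] = cross_val_score(estimator, X_train, y_train, cv=5).mean()
    print(f"{name}: mean CV R^2 = {cv_scores[name]:.3f}")

# Steps 4 and 5: take the best CV score, refit on the training set, and
# predict the price of an imaginary house (values are made up).
best_name = max(cv_scores, key=cv_scores.get)
best_model = models[best_name].fit(X_train, y_train)
house = pd.DataFrame([{"Area": 85, "Room": 3, "Lon": 4.90, "Lat": 52.37}])[inputs]
predicted_price = float(best_model.predict(house)[0])
print(f"Chosen model: {best_name}, predicted price: {predicted_price:,.0f}")
```

Picking the highest mean CV score is only one defensible rule for step 4; a simpler model with a nearly identical score can be a reasonable choice too, and the validation set remains available for a final check.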