
Predictive Analytics Group Project (Group 4)

For this project, you will be working with the cars.csv data file. The dataset is a sample of records on used cars at different car dealerships. We are going to use it to build a predictive model that predicts whether or not the sales price of a used car will be greater than $10,000. The dataset has the following variables.

id: The ID of the car in the system
odometer: The odometer reading, i.e., the number of kilometers driven
year: The manufacture year of the car
engine_type: The type of car engine
engine_capacity: The engine capacity (cylinder volume), in litres
body_type: The style of the car body
high_10: Whether or not the sales price of the car was above $10,000 (0 = no, 1 = yes)
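A possible first notebook step is to load the file and check it against the variable list above. The sketch below is a minimal example, assuming pandas is installed, that `cars.csv` sits in the working directory, and that its header uses exactly the listed names (with the target spelled `high_10`); the helpers `load_cars`, `share_by` and `profile` are hypothetical names. It reports missing values, the overall share of cars priced above $10,000, and that share per body type and engine type, which is the kind of aggregation the exploratory phase asks for.

```python
import pandas as pd

# Assumed column names, taken from the variable list in the project description.
EXPECTED_COLUMNS = ["id", "odometer", "year", "engine_type",
                    "engine_capacity", "body_type", "high_10"]


def load_cars(path="cars.csv"):
    """Read the CSV and fail early if any expected column is missing."""
    df = pd.read_csv(path)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


def share_by(df, column, target="high_10"):
    """Per category: share of cars with target == 1 and count of non-missing targets."""
    return (df.groupby(column)[target]
              .agg(share_high="mean", n="count")
              .sort_values("share_high", ascending=False))


def profile(df, target="high_10"):
    """Collect a few basic exploratory summaries in one dictionary."""
    return {
        "missing": df.isna().sum(),            # missing values per column
        "overall_share": df[target].mean(),    # fraction of cars priced above $10,000
        "by_body_type": share_by(df, "body_type", target),
        "by_engine_type": share_by(df, "engine_type", target),
        # average manufacture year and odometer reading for each target class
        "numeric_by_target": df.groupby(target)[["year", "odometer"]].mean(),
    }
```

In a notebook, `summary = profile(load_cars())` followed by `summary["by_body_type"]` displays the table; the same frames can then be passed to a plotting library for bar charts or similar visualizations.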

For your project submission you will need to submit a Jupyter Notebook file that contains your explanations and text in Markdown, your code and your code results (including visualizations, tables, etc.).

The purpose of your project is to develop a model that predicts the target variable "high_10" using classification. In addition to the predictive model, you will need to use exploratory analytics, such as data manipulation/aggregation, visualization, etc., to generate insight into the target and predictor variables before creating the predictive model. This project is your opportunity to showcase what you have learned in the course. You are expected to draw on different topics and areas of the course, e.g., exploration, visualization, preparation, etc. Please note the following points:

•   Your submission will be one Jupyter Notebook file. All of your code, code results, experimentation and text should be included in this file along with your procedures. Please use proper headers to create sections and subsections in your Markdown. Maintain a cohesive narrative flow, treating the Jupyter Notebook file like a report with code, results and explanations. Make sure your code and code results are accompanied by a proper explanation of your procedures in Markdown. Think of the notebook as a report that you are preparing to showcase what you have learned in the course. When in doubt, add content/explanation!

•   Use techniques you learned in the exploratory analytics part of the course to explore and visualize your data (predictor and target variables), build a good profile of your data and potentially generate insight into variable behaviour and relationships before creating a predictive model. Report any insight you generate with explanations in Markdown. Remember, predictive models often rely on, or supplement, the insight already generated in the exploratory phase.

•   Use any data preparation method that you find relevant. Experiment with different ideas and leave all procedures in the Notebook file. Some data preparation may be required.

•   To develop a model that predicts your target variable, you do not necessarily need to use all the other variables as predictors. However, use any predictor variables that help predict the target variable better. Please include in your Notebook any reasoning, assumptions or experimentation behind including or excluding a predictor variable. If you have not worked with a certain variable type (like dates), you do not need to include it in your model.

•   Your project should include cross-validation of your model. To evaluate your model, it is important that you use cross-validation (using training and testing sets) and generate proper results. Make sure to review, comment on and interpret the different measures your results produce to evaluate the performance of your model(s). Include explanations.

•   You need to use a minimum of two classification algorithms to create and evaluate predictive models, and do not forget to compare evaluation results across the different models. If any of your predictive models can be improved by changing a setting (for example, a model hyperparameter), try it out. Leave any experimentation, results and their explanations in the file, even if they did not result in any improvement.
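The modelling requirements above (cross-validation on a training set, a held-out test set, at least two classifiers, and trying a changed setting) can be combined in one scikit-learn workflow. The following is only a sketch under several assumptions: pandas and scikit-learn are installed, the column names match the variable list, `id` is left out because it is an identifier rather than a property of the car, and logistic regression and a decision tree serve as two example algorithms (the project does not prescribe which ones). The helper names `build_pipeline` and `compare_models`, the 75/25 split, the 5 folds and the two `max_depth` values are illustrative choices, not requirements.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed predictor split; `id` is deliberately excluded.
NUMERIC = ["odometer", "year", "engine_capacity"]
CATEGORICAL = ["engine_type", "body_type"]
TARGET = "high_10"


def build_pipeline(model):
    """Impute and scale numeric columns, impute and one-hot encode categorical ones."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("encode", OneHotEncoder(handle_unknown="ignore"))])
    prep = ColumnTransformer([("num", numeric, NUMERIC),
                              ("cat", categorical, CATEGORICAL)])
    return Pipeline([("prep", prep), ("model", model)])


def compare_models(df, folds=5, seed=42):
    """Cross-validate each candidate on the training set, then score it once on the test set."""
    df = df.dropna(subset=[TARGET])
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET].astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        # Same algorithm with two settings: a shallow and a deeper tree.
        "tree_depth_3": DecisionTreeClassifier(max_depth=3, random_state=seed),
        "tree_depth_8": DecisionTreeClassifier(max_depth=8, random_state=seed),
    }
    metrics = ["accuracy", "precision", "recall", "f1"]
    rows = []
    for name, model in candidates.items():
        pipe = build_pipeline(model)
        cv = cross_validate(pipe, X_train, y_train, cv=folds, scoring=metrics)
        row = {"model": name}
        row.update({f"cv_{m}": cv[f"test_{m}"].mean() for m in metrics})
        pipe.fit(X_train, y_train)  # refit on the full training set
        row["test_accuracy"] = pipe.score(X_test, y_test)
        row["test_f1"] = f1_score(y_test, pipe.predict(X_test))
        rows.append(row)
    return pd.DataFrame(rows).set_index("model")
```

Calling `compare_models(df)` on the DataFrame read from `cars.csv` returns one row per model, so cross-validated and test scores can be compared side by side. A large gap between `cv_accuracy` and `test_accuracy`, or between the two tree depths, is the sort of result the Markdown commentary could interpret.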

Note_1: You will not be evaluated on the performance/accuracy of your model. In fact, some datasets may not even result in models with high accuracy (or great performance). Therefore, do not be discouraged if your model does not show high accuracy/performance. Your project mark will not be determined based on the final performance measures of your model.

Note_2: Training your model may not be instantaneous, as some datasets are larger than the ones we have worked with in class. Depending on your computer's CPU, it may take up to 10-20 seconds to get results for a large dataset. Do not be discouraged if your model does not train instantly; give it some time and it will finish.

Submission: Please make sure your project file (Jupyter Notebook file) is submitted into the "Course Project" folder on Avenue to Learn by 11:59PM, on the last day of classes (Wednesday, December 06). Submitting work after the deadline may result in a deduction of points or a penalty. You can find the submission folder under Assessment>Assignments. Please submit one Jupyter Notebook file per group.

•   Good luck. I am looking forward to reviewing your projects!