158.739-2024 Semester 1

Project 1

Deadline:

Hand in by midnight April 5 2024

Evaluation:

20% of your final course grade.

Work

This assignment is expected to be completed individually. See "Group Work" below.

Purpose:

Gain experience in performing data wrangling, data visualisation and introductory data analysis using Python with suitable libraries. Begin developing skills in formulating a problem from data in a given domain, asking questions of the data, and extracting insights from a real-world dataset. This project addresses learning outcomes 1, 2 and 4 from the course outline.

Project outline:

This project requires that you perform data cleaning and exploratory data analysis (EDA), and that you uncover insights from a real-world dataset. You are required to present your work in a Jupyter Notebook. The notebook is expected to have the general structure of a report, with all the Python scripts embedded in it, together with descriptions of the steps you took in your analysis and the data cleaning processes.

After you have cleaned the data and prepared it for analysis, your task is to gain an understanding of the problem domain, which will enable you to formulate some assumptions as well as key questions that will drive your research. The research objectives are open-ended. It is your task to find correlations, interesting trends and innovative ideas on how to best use the data in the dataset.

You will need to transform data into different formats where necessary. Be creative and generate new columns as derivatives from others where useful. Make justifiable decisions on how to handle missing values depending on your research goals. Look for erroneous values and restore the integrity of the data where needed. Be critical.
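The wrangling steps described above can be sketched as follows. This is only an illustrative sketch using pandas: the column names (`median_income`, `population`, `total_income`) and all values are invented for the example and are not taken from the assignment dataset, and linear interpolation is just one possible way to handle missing values in an annual series.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for
# illustration and do NOT come from the assignment dataset.
raw = pd.DataFrame({
    "year": ["1990", "1991", "1992", "1993", "1994"],
    "median_income": ["21,000", "-1", "22000", "n/a", "23000"],
    "population": [3_400_000, 3_450_000, 3_500_000, 3_550_000, 3_600_000],
    "total_income": [7.1e10, 7.4e10, 7.7e10, 8.2e10, 8.6e10],
})

df = raw.copy()

# Transform formats: strings to numbers; unparseable entries become NaN.
df["year"] = df["year"].astype(int)
df["median_income"] = pd.to_numeric(
    df["median_income"].str.replace(",", "", regex=False), errors="coerce"
)

# Erroneous values: a negative income is impossible, so treat it as missing.
df.loc[df["median_income"] < 0, "median_income"] = np.nan

# Record which rows are about to be filled, so imputed values stay traceable.
df["median_income_imputed"] = df["median_income"].isna()

# Missing values: linear interpolation between neighbouring years
# (an assumption that must be justified against your research goals).
df["median_income"] = df["median_income"].interpolate(method="linear")

# Derived column: a new variable computed from two existing ones.
df["income_per_capita"] = df["total_income"] / df["population"]
```

Keeping a flag column such as `median_income_imputed` makes it easy to state later, in the report, which values were observed and which were estimated.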

Utilise a variety of exploratory data analysis techniques to make sense of the data, which will then guide you to dig deeper and drive new avenues of investigation. Use visualisations to communicate your insights and messages to the reader. Be effective with how you construct your graphs and preserve accuracy and integrity.
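As a rough illustration of this kind of exploration, the sketch below computes a correlation and draws one labelled time-series plot with pandas and matplotlib. The column names (`unemployment_rate`, `median_rent`) and the numbers are hypothetical, made up for the example, and the `Agg` backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data: names and values are invented for illustration.
eda = pd.DataFrame({
    "year": list(range(2000, 2008)),
    "unemployment_rate": [6.2, 5.5, 5.3, 4.8, 4.0, 3.8, 3.9, 3.6],
    "median_rent": [200, 205, 212, 220, 235, 250, 262, 275],
})

# Pairwise Pearson correlation between two variables of interest.
corr = eda[["unemployment_rate", "median_rent"]].corr()

# One clearly labelled line chart of a single variable over time.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(eda["year"], eda["unemployment_rate"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Unemployment rate (%)")
ax.set_title("Unemployment rate by year (illustrative data)")
ax.set_ylim(bottom=0)  # start the axis at zero so changes are not exaggerated
fig.tight_layout()
```

Labelled axes, units and a zero-based axis are small choices that help preserve the accuracy and integrity of a graph; a correlation on its own is only a prompt for further investigation, not evidence of a causal link.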

Finally, you may install and use any additional Python packages you wish that will help you with this project.

Dataset Domain:

The dataset covers socio-economic data on New Zealand, stretching back to the early 1980s. The data covers a range of topics: income and wealth distribution, poverty and deprivation levels, health measures, education outcomes, safety and security, housing, and employment. The data is captured by various government agencies as well as some private sector entities.

There are approximately 100 columns in the dataset. The columns range widely in their completeness and coverage. A document is provided which briefly explains what each column means and where it originated.

The dataset has been intentionally tampered with in order to provide you with a sufficient amount of practice in data wrangling and cleaning. Cleaning the dataset represents a significant amount of marks in the assignment.

Once the dataset is ready for analysis, consider how to create a data product from your insights that helps inform public discourse on these socio-economic matters.

Dataset Usage Conditions:

The dataset was collated by a group of researchers belonging to the Knowledge Exchange Hub at Massey University. The dataset values are obtained from a mixture of publicly available sources as well as confidential private sources. It also contains a number of derived values. The dataset has not been updated since the original analysis was conducted, so it is a good opportunity for students to hunt out the data sources where possible and to update both the raw values and the analysis. The website describing this project, as well as a publication regarding the dataset and its analysis, can be found here: https://sharedprosperity.co.nz

Your analysis is expected to consider the data from a different perspective from that found on the website.

Bonus Marks:

Additional marks are offered to students who are prepared to go beyond the specified requirements. Bonus marks will be granted for the meaningful integration of additional data into the main dataset. The additional data files comprise the NZ General Social Survey data from 2008, 2010, 2012, 2014 and 2016. These data files are provided. You are welcome to integrate later releases of these data too for additional marks.
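One way such an integration might look is sketched below with pandas. Both frames are hypothetical stand-ins: the join key `year` and the columns `unemployment_rate` and `life_satisfaction` are assumptions made for the example, not the actual layout of the main dataset or of the General Social Survey files.

```python
import pandas as pd

# Hypothetical stand-ins for the main dataset and one survey-derived table.
# Column names and values are invented for illustration.
main = pd.DataFrame({
    "year": [2008, 2009, 2010, 2011, 2012],
    "unemployment_rate": [4.0, 5.8, 6.1, 6.0, 6.4],
})
survey = pd.DataFrame({
    "year": [2008, 2010, 2012, 2014],
    "life_satisfaction": [7.8, 7.7, 7.9, 7.8],
})

# Left join keeps every row of the main dataset; validate guards against
# duplicate keys, and indicator records where each row found a match.
merged = main.merge(
    survey, on="year", how="left", validate="one_to_one", indicator=True
)

# How many main-dataset years gained survey values, and how many did not.
coverage = merged["_merge"].value_counts()
```

Checking `coverage` before analysing the merged frame shows how sparse the added columns are, which matters here because the survey is only available for alternate years.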

Some of the variables can also be updated with more recent values. You will be awarded additional marks if you make the effort to acquire these data points.

Marking criteria:

Marks will be awarded for different components of the project using the following rubric:

Component | Marks | Requirements and expectations
Data Wrangling | 30 | Thoroughness of the data cleaning using Python.
EDA/Visualisation | 30 | Quality of investigation into potential erroneous values, decision making process on how to handle missing data and potential interpolation options. Stating assumptions and justifying them. Variety of exploratory research and inquiry into different aspects of the dataset, use of broad and appropriate range of visualisations and their effective communication.
Data Analysis | 30 | Depth, sophistication and difficulty of analysis being performed. Diversity of techniques used to answer the research questions and communicate the findings to the reader.
Report Presentation | 10 | Structure of the report and use of headers and formatting. Clear sections and logical flow. Well-articulated research questions and goals. Suitable introduction and conclusion. Tidy code sections and their explanations where needed. Not cluttering the notebooks with too many dataframe data dumps.
BONUS MARKS | |
Integration of Additional Datasets | 5 | Meaningful integration and augmentation of insights with the NZ General Social Survey data.
Updating of variables | 5 | Updating of variables with more recent values where possible.

Jupyter Notebook Template

A notebook template has been created for you, and you are invited to use it. Make sure that all parts of the introduction section that are relevant to your project are filled out. The template file is called ‘Jupyter Project Report Template.ipynb’.

Group Work:

This assignment is expected to be completed individually. However, students who strongly wish to complete this assignment in pairs may be given permission, on the condition that their final mark will be capped at 80%. Completing the bonus component would raise their maximum score to 90%.

Hand-in:

Submit ONLY ONE Jupyter notebook file via the Stream assignment submission link. In addition, please export an HTML page from your notebook and submit this too, in case there are errors in your notebook and we cannot open it. Please do not email your submission to the teaching staff.

****************

*** Plagiarism ***

****************

It is mandatory that any assessment items that you submit during your University study are your own work. Massey University takes a firm stance on academic misconduct, such as plagiarism and any form of cheating.

Plagiarism is the copying or paraphrasing of another person’s work, whether published or unpublished, without clearly acknowledging it. It includes copying the work of other students and reusing work previously submitted by yourself for another course. It also includes the copying of code from unacknowledged sources.

Academic integrity breaches affect all students: they disadvantage honest students and undermine the credibility of your qualification. Plagiarism and cheating in tests and exams will be penalised; this is likely to lead to loss of marks for that item of assessment and may lead to an automatic failing grade for the course and/or exclusion from re-enrolment at the University.

Please see the Academic Integrity Guide for Students on the University website for more information. The Guide steps you through the University Academic Integrity Policy and Procedures. For example, you will find definitions of academic integrity misconduct, such as plagiarism; how misconduct is determined and managed; and where to find resources and assistance to help develop the skills of academic writing, exam preparation and time management. These skills will help you approach university study with academic integrity.