STAT40800 Data Programming with Python
STAT40800 Data Programming with Python (online)
Data Analysis Project
For this project you will study the 2016 Irish Census data. The census is a detailed account of every person living in Ireland. A census is carried out every 5 years by the Central Statistics Office (CSO).
The 2016 Census was broken down into 15 themes, each studying a different component of society. Within each theme there are a number of variables/statistics, e.g. within Theme 1: Sex, age and marital status there is a variable for population aged 20-24, population aged 25-29, population aged 30-34, etc.
For this project you will study 2-4 of the census themes. You are free to choose which themes to study. You must analyse the data within your chosen themes and use the data to make inferences about the population of Ireland. For example, is the age profile significantly different in Dublin compared to Kerry? Or can we predict the level of unemployment in an area based on the age demographic, level of education, gender ratio, etc.?
The census data is provided at three different geographic levels. You are free to use one, two or all three levels. You will find these three datasets along with a glossary for the different variables on Brightspace. The datasets are as follows:
This Excel spreadsheet contains a description for each of the census variables, separated by theme.
· Census by small area.csv
This CSV file reports the census variables by areas of population with generally 80 to 120 dwellings. Small areas were designed as the lowest level of geography for the compilation of statistics in line with data protection and generally comprise either complete or partial neighbourhoods.
· Census by electoral area.csv
This CSV file reports the census variables by local electoral area.
· Census by county.csv
This CSV file reports the census variables by administrative county/city, i.e. county/city councils. For most counties there is a single county council. However, some counties, such as Dublin, have a city council plus additional county councils.
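As a starting point, the sketch below shows one way a census table could be read into pandas and given a first sanity check. It is only an illustration: the in-memory CSV stands in for "Census by county.csv", and the column names (COUNTY, T1_AGE20_24, T1_AGE25_29), area names and values are invented, so consult the glossary for the real variable codes.

```python
import io
import pandas as pd

# Stand-in for the contents of "Census by county.csv". The column names and
# values are invented for illustration and will not match the real file.
toy_csv = io.StringIO(
    "COUNTY,T1_AGE20_24,T1_AGE25_29\n"
    "Area A,1200,1500\n"
    "Area B,800,700\n"
)

def load_census(source):
    """Read one census table; `source` may be a file path or a file-like object."""
    return pd.read_csv(source)

# In practice this would be load_census("Census by county.csv").
county = load_census(toy_csv)

print(county.shape)   # (number of areas, number of columns)
print(county.dtypes)  # check that the count columns were parsed as numbers
```

The same helper could be pointed at the electoral area or small area file if a finer geographic level is wanted.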
Your project report should be structured as follows:
1. Introduction: (~ 1 page) Briefly introduce the 2016 Census: what is it, when did it occur? Provide a description of your selected themes and discuss what was measured within each theme. Discuss what you plan to investigate and what you hope to find.
2. Data cleaning/pre-processing: (~ 1 page) Extract the variables of interest (those relevant for your chosen themes). Structure the data in a sensible manner, giving the variables informative names. Describe any data pre-processing/formatting steps you carried out.
3. Exploratory data analysis: (~ 3 pages) Perform an initial exploratory data analysis (EDA) (similar to the analysis performed in the midterm assignment). Report summary statistics and create graphical summaries. Interpret your findings.
4. Statistical analysis: (~ 3 pages) Using your findings from the EDA as a guide, investigate trends in the data and correlations between different variables. This could include:
· performing hypothesis tests to compare different counties or groups of counties
· fitting and training a regression model to predict the unemployment, percentage of Irish speakers, number of people renting, etc. in an area
· exploring if clusters exist within the data
An exceptional project will combine a number of these ideas or perform these tasks for many different variables/relationships. You must describe each method you use and provide evidence that you understand how it works.
5. Conclusion: (~ 1 page) Provide an interpretation of your findings. What did you learn about the population of Ireland from this analysis? Suggest areas for further investigation.
Page count includes text, figures and equations, but not code blocks.
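The pre-processing and EDA steps (sections 2 and 3 of the report) might look something like the following minimal sketch. Everything in it is an assumption made for illustration: the toy table, the theme prefixes "T1_" and "T8_", the raw column codes, the renamed columns and the helper select_themes are all hypothetical and should be replaced by the variables listed in the glossary.

```python
import pandas as pd

# Toy table standing in for one of the census files; all names and numbers
# are invented for illustration.
raw = pd.DataFrame({
    "AREA": ["Area A", "Area B", "Area C", "Area D"],
    "T1_AGE20_24": [1200, 800, 950, 400],
    "T1_TOTAL": [20000, 10000, 12000, 8000],
    "T8_UNEMPLOYED": [300, 150, None, 90],
    "T8_LABOUR_FORCE": [4000, 2500, 3000, 1500],
    "T9_OTHER": [10, 20, 30, 40],
})

def select_themes(df, id_col, prefixes):
    """Keep the identifier column plus every column starting with a chosen prefix."""
    keep = [c for c in df.columns if c.startswith(tuple(prefixes))]
    return df[[id_col] + keep].copy()

# Hypothetical mapping from raw codes to informative names.
rename_map = {
    "AREA": "area",
    "T1_AGE20_24": "pop_20_24",
    "T1_TOTAL": "pop_total",
    "T8_UNEMPLOYED": "unemployed",
    "T8_LABOUR_FORCE": "labour_force",
}

tidy = (
    select_themes(raw, "AREA", ["T1_", "T8_"])
    .rename(columns=rename_map)
    .set_index("area")
)

# Count rows with missing values before dropping them, so the report can
# state how much data was removed.
n_missing = int(tidy.isna().any(axis=1).sum())
tidy = tidy.dropna()
print(f"dropped {n_missing} row(s) with missing values")

# Raw counts depend on the size of an area, so convert them to proportions
# before comparing areas.
tidy["unemployment_rate"] = tidy["unemployed"] / tidy["labour_force"]
tidy["share_20_24"] = tidy["pop_20_24"] / tidy["pop_total"]

# Summary statistics for the EDA section.
summary = tidy["unemployment_rate"].describe()
print(summary)
```

Working with rates rather than counts is a design choice, not a requirement; it simply makes areas of different size comparable. A graphical summary could follow from the same frame, for example tidy["unemployment_rate"].plot.hist(), which needs matplotlib to be installed.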
A rubric is also included with the project materials. This will help you assess whether your project is of a high enough standard or not. The score for Methods & Analysis and Results is separated into exploratory data analysis (EDA) and statistical analysis (SA).
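For the statistical analysis component, two of the ideas listed under section 4 (a hypothesis test comparing groups of areas, and a regression model) can be sketched with NumPy alone. This is only one possible approach: the permutation test and the straight-line least-squares fit are chosen here for illustration, the helper names permutation_test and fit_line are hypothetical, and the numbers are invented rather than taken from the census.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference mean(a) - mean(b) and an estimated p-value.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return observed, (count + 1) / (n_perm + 1)

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centred = y - y.mean()
    r_squared = 1 - (resid @ resid) / (centred @ centred)
    return coef[0], coef[1], r_squared

# Invented unemployment rates for two hypothetical groups of areas.
group_1 = [0.075, 0.060, 0.082, 0.071, 0.066]
group_2 = [0.100, 0.095, 0.110, 0.088, 0.104]
diff, p_value = permutation_test(group_1, group_2)
print(f"difference in means: {diff:.4f}, p-value: {p_value:.4f}")

# Invented data: share of residents with third-level education vs unemployment rate.
education_share = [0.20, 0.25, 0.30, 0.35, 0.40]
unemployment = [0.11, 0.10, 0.09, 0.08, 0.07]
intercept, slope, r_squared = fit_line(education_share, unemployment)
print(f"intercept: {intercept:.3f}, slope: {slope:.3f}, R^2: {r_squared:.3f}")
```

A permutation test makes few distributional assumptions, which suits small groups of counties; a t-test or a library regression routine would be equally valid choices provided the method is described and its assumptions are checked.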
I strongly recommend using Jupyter so that text, code and figures can be interleaved. If using Jupyter, you must submit your report as both a PDF and an .ipynb file. If you are using a different Python IDE, you should use a text editor, such as Word or LaTeX, to write your report and submit a PDF of the report and your Python code (.py file).