
 

STAT40800 Data Programming with Python (online)

Data Analysis Project

Autumn 2021/22

 

For this project you will study the 2016 Irish Census data. The census is a detailed account of every person living in Ireland. A census is carried out every 5 years by the Central Statistics Office (CSO).

The 2016 Census was broken down into 15 themes, each studying a different component of society. Within each theme there are a number of variables/statistics, e.g. within Theme 1: Sex, age and marital status there is a variable for population aged 20-24, population aged 25-29, population aged 30-34, etc.

For this project you will study 2-4 of the census themes. You are free to choose which themes to study. You must analyse the data within your chosen themes and use the data to make inferences about the population of Ireland. For example, is the age profile significantly different in Dublin compared to Kerry? Or can we predict the level of unemployment in an area based on the age demographic, level of education, gender ratio, etc.?

The census data is provided at three different geographic levels. You are free to use one, two or all three levels. You will find these three datasets along with a glossary for the different variables on Brightspace. The datasets are as follows:

· Glossary.xlsx

This Excel spreadsheet contains a description of each of the census variables, separated by theme.

· Census by small area.csv

This CSV file reports the census variables by areas of population with generally 80 to 120 dwellings. Small areas were designed as the lowest level of geography for the compilation of statistics in line with data protection, and generally comprise either complete or partial neighbourhoods.

· Census by electoral area.csv

This CSV file reports the census variables by local electoral area.

· Census by county.csv

This CSV file reports the census variables by administrative county/city, i.e. county/city councils. For most counties there is a single county council. However, some counties, such as Dublin, have a city council plus additional county councils.
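As a starting point, the sketch below shows one way the CSV files could be read with pandas. It uses a tiny in-memory stand-in rather than the real file, and the column names (COUNTY, POP_20_24, POP_25_29) and the helper name load_census are hypothetical; the real variable codes are described in Glossary.xlsx.

```python
import io

import pandas as pd

# Tiny stand-in for "Census by county.csv".
# The column names and figures here are made up for illustration only.
sample_csv = io.StringIO(
    "COUNTY,POP_20_24,POP_25_29\n"
    "Area X,100,120\n"
    "Area Y,30,25\n"
)


def load_census(source):
    """Read one census table from a file path or file-like object."""
    return pd.read_csv(source)


county = load_census(sample_csv)
print(county.shape)    # number of rows (areas) and columns (variables)
print(county.dtypes)   # check that the count columns were parsed as numbers
```

With the real data, the same helper could be called with a file name such as "Census by county.csv" in place of sample_csv.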

 

Your project report should be structured as follows:

1. Introduction:  (~ 1 page) Briefly introduce the 2016 Census: what is it, when did it occur? Provide a description of your selected themes and discuss what was measured within each theme. Discuss what you plan to investigate and what you hope to find.

2. Data cleaning/pre-processing:  (~ 1 page) Extract the variables of interest (those relevant for your chosen themes). Structure the data in a sensible manner, giving the variables informative names.  Describe any data pre-processing/formatting steps you carried out.
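The extraction and renaming step might look something like the following sketch. It assumes that the columns of a theme share a common prefix; the prefixes, column codes and new names below are hypothetical placeholders, so check the glossary for the real ones.

```python
import pandas as pd

# Hypothetical raw table; the real column codes are listed in Glossary.xlsx.
raw = pd.DataFrame(
    {
        "AREA": ["A", "B", "C"],
        "T1_AGE20_24": [100, 30, 55],
        "T1_AGE25_29": [120, 25, 60],
        "T8_UNEMPLOYED": [15, 4, 9],
        "T9_OTHER": [1, 2, 3],
    }
)

# Keep the area identifier plus every column from the chosen themes.
theme_prefixes = ("T1_", "T8_")
keep = ["AREA"] + [c for c in raw.columns if c.startswith(theme_prefixes)]

# Map the terse codes to informative names (assumed names, for illustration).
rename_map = {
    "AREA": "area",
    "T1_AGE20_24": "pop_20_24",
    "T1_AGE25_29": "pop_25_29",
    "T8_UNEMPLOYED": "unemployed",
}

tidy = raw[keep].rename(columns=rename_map).set_index("area")

# Count missing values per column before any analysis.
missing = tidy.isna().sum()
print(tidy)
print(missing)
```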

3. Exploratory data analysis:  (~ 3 pages) Perform an initial exploratory data analysis (EDA) (similar to the analysis performed in the midterm assignment).  Report summary statistics and create graphical summaries. Interpret your findings.
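A minimal sketch of the kind of summary this step asks for is given below. The figures and column names are made up. One design choice worth noting: raw counts mostly reflect how large an area is, so the example converts counts to rates before comparing areas.

```python
import pandas as pd

# Made-up counts for four areas; not real census figures.
areas = pd.DataFrame(
    {
        "pop_total": [1000, 400, 650, 800],
        "unemployed": [80, 20, 39, 72],
    },
    index=["A", "B", "C", "D"],
)

# Convert counts to a rate so that areas of different size are comparable.
areas["unemployment_rate"] = areas["unemployed"] / areas["pop_total"]

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = areas.describe()

# Area with the highest rate.
highest = areas["unemployment_rate"].idxmax()

print(summary)
print("Highest unemployment rate:", highest)
```

If matplotlib is installed, a call such as areas["unemployment_rate"].plot(kind="bar") would give a simple graphical summary of the same column.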

4. Statistical analysis:  (~ 3 pages) Using your findings from the EDA as a guide, investigate trends in the data and correlations between different variables. This could include:

· performing hypothesis tests to compare different counties or groups of counties

· fitting and training a regression model to predict the unemployment, percentage of Irish speakers, number of people renting, etc. in an area

· exploring if clusters exist within the data


An exceptional project will combine a number of these ideas or perform these tasks for many different variables/relationships.  You must describe each method you use and provide evidence that you understand how it works.
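As one illustration of the hypothesis-testing idea, the sketch below compares the age profile of two areas with a chi-square test of independence from SciPy. The counts are invented and the choice of test is an assumption made for this example; other tests may suit your question better.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts per age band (columns) for two areas (rows).
# These are not real census figures.
counts = np.array(
    [
        [200, 300, 500],  # area X
        [150, 150, 200],  # area Y
    ]
)

# Null hypothesis: the distribution over age bands is the same in both areas.
chi2, p_value, dof, expected = chi2_contingency(counts)

print(f"chi-square = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3g}")
print("Expected counts under the null hypothesis:")
print(expected.round(1))
```

Bear in mind that census counts are very large, so even small differences in proportions can come out as statistically significant. It is worth reporting the size of the difference alongside the p-value.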

5.  Conclusion:  (~ 1 page) Provide an interpretation of your findings.  What did you learn about the population of Ireland from this analysis?  Suggest areas for further investigation.

Page count includes text, figures and equations, but not code blocks.

 

A rubric is also included with the project materials. This will help you assess whether your project is of a high enough standard or not. The score for Methods & analysis and Results is separated into exploratory data analysis (EDA) and statistical analysis (SA).

I strongly recommend using Jupyter so that text, code and figures can be interleaved. If using Jupyter, you must submit your report as both a PDF and .ipynb. If you are using a different Python IDE, you should use a text editor, such as Word or LaTeX, to write your report and submit a PDF of the report and your Python code (.py file).