SPAM004: Computational Social Science 2

Assessment 2: Project report

Overview: A 3,000 word (+/- 10%) data analysis project report.

Percentage of mark for module: 70%

Due date: Submitted via ELE before 1400 on 17/03/2026. Information on

how to submit an assignment can be found via:

http://www.exeter.ac.uk/students/infopoints/yourinfopointservices/assessments/

4.1 Structure of the report and what is required

This final assignment will test your ability to carry out your own computational social science project.

As we discussed earlier in the module, the internet is now an incredibly

valuable data resource for social sciences. As such, the internet as a data

resource will be the main focus of this assignment.

You have two options for this assignment, of which you must select one

to do:

4.1.1 Option 1: Using pre-existing datasets.

Select one of the datasets listed below. You must then use the Python skills you've learned during the course of the module to do the following:

1. Use Python to clean the dataset.

2. Use Python to create some visualisations of the data and to calculate

summary statistics.

3. Use Python to carry out some more advanced data science analysis, e.g. inferential statistics, natural language processing (NLP), or social network analysis. You can choose which of these more advanced data science techniques to use, but make sure the form of analysis you choose is appropriate for the data you're working with.

4. Write a 3,000 (+/-10%) word report.

The report should include:

1. A description of the dataset.

2. The issues you encountered with the data and the steps you took to

clean the dataset.

3. A discussion of what your data visualisations, descriptive statistics, and advanced data science analysis show.

4. The Python code you used in annotated form. If you decide to do your analysis in a Jupyter notebook, you can include markdown cells that contain text describing what the different parts of your code do as you go along. You can then export the Jupyter notebook as a PDF file and attach it to your report for submission. If, instead, you decide to code in a Python script, you can include comments explaining what the code does. It is important to note that you do not have to provide a description for every single line of code. Instead, you can just provide a description of what chunks of your code (e.g. a function or a for loop) do. You can then copy and paste the script into a Notepad or Word document to attach to your report. The annotation of your code does not count towards the word count of the report.
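As an illustration of chunk-level annotation, the sketch below comments on a function and a loop as whole units rather than line by line. The function name clean_text and the sample messages are hypothetical and are not taken from any of the datasets; treat this as one possible style, not a required template.

```python
import re


def clean_text(text):
    """Lower-case a message, remove URLs, and collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)
    return re.sub(r"\s+", " ", text).strip()


# Made-up example messages (assumption: real data would be read from a file).
messages = [
    "Join us TODAY   at the rally! http://example.com/rally",
    "  Thank you for your support  ",
]

# Apply the cleaning function to every message and keep the results in a list.
cleaned = []
for message in messages:
    cleaned.append(clean_text(message))

print(cleaned)
```

Here one docstring describes the whole function and one comment describes the whole loop, which is the level of detail the annotation requirement asks for.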

There is no specific question to answer in this assignment per se. Rather, this assignment is about you demonstrating your ability to carry out a piece of data science research. In other words, use your Python and data science skills to analyse the dataset, find the underlying patterns in the data, and interpret them.

If you are doing this first option, choose from the following datasets:

Dataset 1: Political posts on Facebook and Twitter

This dataset, from CrowdFlower's Data For Everyone library, provides text of 5000 messages from politicians' Facebook and Twitter accounts, along with human judgments about the purpose, partisanship, and audience of the messages.

The dataset has 7 variables:

1. id: A unique id for the message.

2. audience: Who is the message aimed at? Coded as either national or

constituency.

3. message: The category of the message in the post. Falls into one of

the following categories:

- attack: The message attacks another politician.

- constituency: The message discusses the politician's constituency.

- information: An informational message about news in government or

the wider U.S.

- media: A message about interaction with the media.

- mobilization: A message intended to mobilize supporters.

- other: A catch-all category for messages that don't fit into the other categories.

- personal: A personal message, usually expressing sympathy, support

or condolences, or other personal opinions.

- policy: A message about political policy.

- support: A message of political support.

4. polid: Unique id for the politician.

5. label: A string of the form "From: firstname lastname (position from state)".

6. source: The online platform where the message was posted; either

facebook or twitter.

7. text: The message text.
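Because the label variable packs several pieces of information into one string, you may want to split it into separate fields during cleaning. The sketch below is one possible approach, assuming the label follows the pattern described above exactly. The function name parse_label and the example label are hypothetical, and real labels may deviate from this pattern, so check for unmatched rows.

```python
import re

# Pattern for "From: firstname lastname (position from state)".
# Assumption: the name is everything before the opening bracket.
LABEL_PATTERN = re.compile(
    r"^From:\s*(?P<name>.+?)\s*\((?P<position>.+?)\s+from\s+(?P<state>.+?)\)\s*$"
)


def parse_label(label):
    """Return a dict with name, position and state, or None if no match."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        return None
    return match.groupdict()


# Hypothetical label, invented for illustration only.
example = "From: Jane Doe (Representative from Ohio)"
print(parse_label(example))
print(parse_label("not a label"))
```

Returning None for labels that do not match makes it easy to count and inspect the rows that need manual attention.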

Dataset 2: Natural disaster tweets.

In the aftermath of a disaster, bystanders often post about what is happening, making information on social media faster and more informative than news reports. This dataset contains 10,877 such posts from Twitter.

This dataset has 3 variables:

1. id: The post's unique id number.

2. text: The text of the tweet.

3. tweetid: The tweet id of the post.
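As a starting point for text analysis on a dataset like this, you could count the most frequent words after some basic cleaning. The sketch below uses only the standard library. The sample tweets and the small stop-word list are invented for illustration, and a fuller analysis would likely use a dedicated NLP library.

```python
import re
from collections import Counter

# Invented sample rows using the three variables described above.
rows = [
    {"id": 1, "text": "Flood water rising fast near the river", "tweetid": 101},
    {"id": 2, "text": "The river flood has closed the bridge", "tweetid": 102},
    {"id": 3, "text": "Stay safe everyone, flood warnings issued", "tweetid": 103},
]

# Tiny illustrative stop-word list (assumption: not a complete list).
STOP_WORDS = {"the", "has", "near", "a", "of"}


def tokenize(text):
    """Split text into lower-case word tokens, dropping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


# Count how often each token appears across all tweets.
counts = Counter()
for row in rows:
    counts.update(tokenize(row["text"]))

print(counts.most_common(2))
```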

Dataset 3: Trump professional network.

During the period of Trump's presidency, there were many news reports that

discussed who he was interacting with and the nature of his relationship with

these people. This dataset contains all that is needed to explore this social

network.

The dataset has 5 variables:

1. Source: The source node in this specific edge.

2. Target: The target node in this specific edge.

3. Weight: The weight of the edge.

4. Relationship: The nature of the relationship according to the news source in the corresponding Citation column.

5. Citation: A URL to the news source that provides details of the

relationship.
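To give a sense of how an edge list with these columns can be analysed, the sketch below computes each node's weighted degree using only the standard library. The edges shown are invented placeholders, and treating the network as undirected is an assumption. A library such as NetworkX would normally be a more convenient choice for a full analysis.

```python
from collections import defaultdict

# Invented placeholder edges with the Source, Target and Weight columns.
edges = [
    {"Source": "Person A", "Target": "Person B", "Weight": 2},
    {"Source": "Person A", "Target": "Person C", "Weight": 1},
    {"Source": "Person B", "Target": "Person C", "Weight": 3},
]

# Sum the weights of all edges touching each node.
# Assumption: the network is treated as undirected.
weighted_degree = defaultdict(int)
for edge in edges:
    weighted_degree[edge["Source"]] += edge["Weight"]
    weighted_degree[edge["Target"]] += edge["Weight"]

# Rank nodes from most to least connected.
ranking = sorted(weighted_degree.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```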

4.1.2 Option 2: Collecting and analysing your own internet data

You also have the option of collecting your own data from the internet or a social media platform by scraping it yourself. However, if you wish to do this, you must speak to the module convener and get their permission before starting your work.

Once you have collected your data and saved it as a dataset, you must then use the Python skills you've learned during the course of the module to do the following:

1. Use Python to clean the dataset.

2. Use Python to create some visualisations of the data and to calculate

summary statistics.

3. Use Python to carry out some more advanced data science analysis, e.g. inferential statistics, natural language processing (NLP), or social network analysis. You can choose which of these more advanced data science techniques to use, but make sure the form of analysis you choose is appropriate for the data you're working with.

4. Write a 2,500 (+/-10%) word report.

The report should include:

1. A description of the dataset.

2. The issues you encountered with the data and the steps you took to

clean the dataset.

3. A discussion of what your data visualisations, descriptive statistics, and advanced data science analysis show.

4. The Python code you used in annotated form. If you decide to do your analysis in a Jupyter notebook, you can include markdown cells that contain text describing what the different parts of your code do as you go along. You can then export the Jupyter notebook as a PDF file and attach it to your report for submission. If, instead, you decide to code in a Python script, you can include comments explaining what the code does. It is important to note that you do not have to provide a description for every single line of code. Instead, you can just provide a description of what chunks of your code (e.g. a function or a for loop) do. You can then copy and paste the script into a Notepad or Word document to attach to your report. The annotation of your code does not count towards the word count of the report.