SPAM004: Computational Social Science 2
Assessment 2: Project report
Overview: A 3,000 word (+/- 10%) data analysis project report.
Percentage of mark for module: 70%
Due date: Submitted via ELE before 1400 on 17/03/2026. Information on
how to submit an assignment can be found via:
http://www.exeter.ac.uk/students/infopoints/yourinfopointservices/assessments/
4.1 Structure of the report and what is required
This final assignment will test your ability to carry out your own
computational social science project.
As we discussed earlier in the module, the internet is now an incredibly
valuable data resource for social sciences. As such, the internet as a data
resource will be the main focus of this assignment.
You have two options for this assignment, of which you must select one
to do:
4.1.1 Option 1: Using pre-existing datasets.
Select one of the datasets listed below. You must then use the Python skills
you’ve learned during the course of the module to do the following:
1. Use Python to clean the dataset.
2. Use Python to create some visualisations of the data and to calculate
summary statistics.
3. Use Python to carry out some more advanced data science analysis,
e.g. inferential statistics, natural language processing (NLP), social
network analysis, etc. You can choose which of these more advanced
data science techniques to use, but make sure the form of analysis you
choose is appropriate for the data you’re working with.
4. Write a 3,000 (+/-10%) word report.
The report should include:
1. A description of the dataset.
2. The issues you encountered with the data and the steps you took to
clean the dataset.
3. A discussion of what your data visualisations, descriptive statistics, and
advanced data science analysis show.
4. The Python code you used in annotated form. If you decide to do your
analysis in a Jupyter notebook, you can include markdown cells that
contain text describing what the different parts of your code do as you
go along. You can then export the Jupyter notebook as a PDF file and
attach it to your report for submission. If, instead, you decide to code
in a Python script, you can include comment lines explaining what the
code does. You do not have to provide a description for every single
line of code. Instead, you can just describe what each chunk of your
code (e.g. a function or a for loop) does. You can then copy and paste
the script into a Notepad or Word document to attach to your report.
The annotation of your code does not count towards the word count of
the report.
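As a rough illustration of the level of annotation described above, the sketch below documents one function and one loop rather than every line. The function name `count_categories` and the example rows are hypothetical and made up for this illustration; they are not taken from any of the assignment datasets.

```python
from collections import Counter


def count_categories(rows, column):
    """Count how often each value of `column` appears in a list of row dicts.

    Values are lower-cased and stripped of surrounding whitespace first, so
    that "Facebook " and "facebook" are counted as the same category.
    """
    counts = Counter()
    # Skip rows where the value is missing or blank, so they do not
    # appear as an empty category in the summary.
    for row in rows:
        value = (row.get(column) or "").strip().lower()
        if value:
            counts[value] += 1
    return counts


# Hypothetical rows, used only to show the annotation style.
example_rows = [
    {"source": "twitter"},
    {"source": "Facebook "},
    {"source": ""},
    {"source": "twitter"},
]
source_counts = count_categories(example_rows, "source")
print(source_counts.most_common())
```

One docstring per function and one comment per logical chunk is usually enough for a reader to follow what the code is doing.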
There is no specific question to answer in this assignment per se. Rather,
this assignment is about you demonstrating your ability to carry out
a piece of data science research. In other words, use your Python and data
science skills to analyse the dataset, find the underlying patterns in the
data, and interpret them.
If you are doing this first option, choose one of the following datasets:
Dataset 1: Political posts on Facebook and Twitter
This dataset, from Crowdflower’s Data For Everyone Library, provides the text
of 5,000 messages from politicians’ Facebook and Twitter accounts, along
with human judgments about the purpose, partisanship, and audience of the
messages.
The dataset has 7 variables:
1. id: A unique id for the message.
2. audience: Who is the message aimed at? Coded as either national or
constituency.
3. message: The category of the message in the post. Falls into one of
the following categories:
- attack: The message attacks another politician.
- constituency: The message discusses the politician’s constituency.
- information: An informational message about news in government or
the wider U.S.
- media: A message about interaction with the media.
- mobilization: A message intended to mobilize supporters.
- other: A catch-all category for messages that don’t fit into the other
categories.
- personal: A personal message, usually expressing sympathy, support
or condolences, or other personal opinions.
- policy: A message about political policy.
- support: A message of political support.
4. polid: Unique id for the politician.
5. label: A string of the form "From: firstname lastname (position from
state)".
6. source: The online platform where the message was posted; either
facebook or twitter.
7. text: The message text.
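As one possible cleaning step for this dataset, the label string could be split into separate fields. The sketch below is a minimal illustration using only the standard library. It assumes the labels follow the form described above exactly; the example rows, names and the helper `parse_label` are hypothetical, and real labels may need a more forgiving pattern.

```python
import re
from collections import Counter

# Pattern for "From: firstname lastname (position from state)".
# This is an assumption based on the variable description, not a
# guarantee about every row in the real data.
LABEL_PATTERN = re.compile(
    r"^From:\s*(?P<name>.+?)\s*\((?P<position>.+?)\s+from\s+(?P<state>.+?)\)\s*$"
)


def parse_label(label):
    """Split a label into name, position and state; return None if it does not match."""
    match = LABEL_PATTERN.match(label.strip())
    return match.groupdict() if match else None


# Hypothetical rows that reuse the column names listed above.
rows = [
    {"label": "From: Jane Doe (Representative from Ohio)",
     "source": "twitter", "message": "policy"},
    {"label": "From: John Roe (Senator from New York)",
     "source": "facebook", "message": "attack"},
    {"label": "unparseable", "source": "twitter", "message": "policy"},
]

# Parse every label, keeping None for rows that need manual inspection.
parsed = [parse_label(row["label"]) for row in rows]

# Simple descriptive statistic: message category counts per platform.
by_source = Counter((row["source"], row["message"]) for row in rows)

print(parsed)
print(by_source)
```

Returning None for labels that do not match, rather than raising an error, makes it easy to count and report the problem rows in the data-cleaning part of your report.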
Dataset 2: Natural disaster tweets.
In the aftermath of a disaster, bystanders often post about what is happening,
which can make information on social media faster and more informative than
news reports. This dataset contains 10,877 such posts from Twitter.
This dataset has 3 variables:
1. id: The post’s unique id number.
2. text: The text of the tweet.
3. tweetid: The tweet id of the post.
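For a text-only dataset like this one, a first step towards NLP is often to normalise the tweet text and count word frequencies. The sketch below shows one way this might look with the standard library only. The example tweets, the tiny stop-word set and the helper `tokenize` are all hypothetical; in practice you would probably use a fuller stop-word list from an NLP library.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")       # @usernames
TOKEN_RE = re.compile(r"[a-z']+")      # runs of letters and apostrophes

# Deliberately tiny, illustrative stop-word set (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "in", "on", "of", "and", "to", "at"}


def tokenize(text):
    """Lower-case a tweet, remove URLs and mentions, and return non-stop-word tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Keep the word inside a hashtag but drop the '#' symbol itself.
    text = text.replace("#", " ")
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


# Hypothetical tweets, made up for this example.
tweets = [
    "Flood waters rising in the city centre http://example.com/a",
    "@someone the FLOOD is getting worse #flood",
]

# Count tokens across all tweets.
word_counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
print(word_counts.most_common(3))
```

Whether to keep or drop hashtags, mentions and URLs is an analysis decision in its own right, so it is worth stating your choice when you describe your cleaning steps.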
Dataset 3: Trump professional network.
During the period of Trump’s presidency, there were many news reports that
discussed who he was interacting with and the nature of his relationship with
these people. This dataset contains all that is needed to explore this social
network.
The dataset has 5 variables:
1. Source: The source node in this specific edge.
2. Target: The target node in this specific edge.
3. Weight: The weight of the edge.
4. Relationship: The nature of the relationship according to the
news source in the corresponding 'Citation' column.
5. Citation: A URL to the news source that provides details of the
relationship.
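An edge list with these columns can be turned into simple network measures even before a dedicated graph library is used. The sketch below computes a weighted degree (sometimes called node strength) for each node with the standard library only. It assumes the edges can be treated as undirected, which may not suit your analysis; the three edges and the helper `weighted_degree` are hypothetical, and only the column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge list that reuses the five column names listed above.
EDGE_CSV = """Source,Target,Weight,Relationship,Citation
Person A,Person B,1,Adviser,https://example.com/1
Person A,Person C,2,Business partner,https://example.com/2
Person B,Person C,1,Colleague,https://example.com/3
"""


def weighted_degree(edge_rows):
    """Sum the edge weights attached to each node, treating edges as undirected."""
    strength = defaultdict(float)
    for row in edge_rows:
        weight = float(row["Weight"])
        # An undirected edge contributes its weight to both endpoints.
        strength[row["Source"]] += weight
        strength[row["Target"]] += weight
    return dict(strength)


# Read the edge list into a list of dicts, one per edge.
edges = list(csv.DictReader(io.StringIO(EDGE_CSV)))
strength = weighted_degree(edges)

# Rank nodes from most to least connected by total weight.
ranked = sorted(strength.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

For a real file you would pass an open file object to `csv.DictReader` instead of the in-memory string, and a graph library would give you further measures such as centrality or community structure.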
4.1.2 Option 2: Collecting and analysing your own internet data
You also have the option of extracting your own data from the internet or
a social media platform by scraping it yourself. However, if you wish to do
this, you must first speak to the module convener and get their permission
before starting your work.
Once you have collected your data and saved it as a dataset, you must
then use the Python skills you’ve learned during the course of the module,
to do the following:
1. Use Python to clean the dataset.
2. Use Python to create some visualisations of the data and to calculate
summary statistics.
3. Use Python to carry out some more advanced data science analysis,
e.g. inferential statistics, natural language processing (NLP), social
network analysis, etc. You can choose which of these more advanced
data science techniques to use, but make sure the form of analysis you
choose is appropriate for the data you’re working with.
4. Write a 2,500 (+/-10%) word report.
The report should include:
1. A description of the dataset.
2. The issues you encountered with the data and the steps you took to
clean the dataset.
3. A discussion of what your data visualisations, descriptive statistics, and
advanced data science analysis show.
4. The Python code you used in annotated form. If you decide to do your
analysis in a Jupyter notebook, you can include markdown cells that
contain text describing what the different parts of your code do as you
go along. You can then export the Jupyter notebook as a PDF file and
attach it to your report for submission. If, instead, you decide to code
in a Python script, you can include comment lines explaining what the
code does. You do not have to provide a description for every single
line of code. Instead, you can just describe what each chunk of your
code (e.g. a function or a for loop) does. You can then copy and paste
the script into a Notepad or Word document to attach to your report.
The annotation of your code does not count towards the word count of
the report.
2026-03-12