Data Science Project

Published: 2024-05-14

1 Assessment objective

The purpose of this assessment is to test the following learning outcomes: (1) demonstrating knowledge of a broader range of analytical techniques used in the field of Security and Crime Science, (2) performing data science analyses on crime and/or security-related issues, (3) applying the data science pipeline to crime and/or security-related issues, (4) interpreting and effectively reporting the results of said techniques.

Weight for the final grade: 70%

Page limit: 8 pages in the anonymised ACL long-paper format (see below). Please make sure you do not change the template in any way, for example by increasing or decreasing the font size.

This assessment is the capstone project of the module. It requires you to address a research problem in the full data science workflow (e.g., collecting the data, processing the data, building machine learning models, reporting on the findings and interpreting the outcomes). You will write a report in a research paper format on your project (a template will be provided), and you have to submit the R code needed to reproduce your findings. After passing this assessment, you will have demonstrated the skills to solve a problem using data science techniques.

2 Project topic

For this assessment, you will go through a full data science process to address research questions you have about a topic of your own choice. Your project should (a) be related to crime and/or security, (b) make use of all three areas taught in the module: web data collection (text data), text mining and machine learning, and (c) be reproducible with your code supplement and data.

In previous years some students worked on topics related to:

. analysing crime and security-related discussions on Reddit

. popularity analysis (e.g. what makes a post popular) on Reddit

. exploring crime coverage patterns in newspapers

You are allowed to work on similar topics and develop research questions that predict popularity or identify and analyse crime/security-related topics and patterns. Creative ways of addressing a problem and originality are highly valued in this assessment. We strongly encourage you to do additional reading and recommend that you look at some of the relevant natural language processing conferences such as ACL-IJCNLP and NAACL, including their workshops, to help you identify topics of interest.

Your work should not be a replication of previous studies by others.

3 Project description submission, feedback and ethics

This is a large project. To help you along the way, we require you to submit a brief project description form with your questions, and we will provide feedback to assist you. This process also helps us identify whether the project has any ethical implications. The project description form is available on Moodle.

Nevertheless, we recommend submitting your project description to receive feedback.

3.1 Collecting data

You are required to collect your own text data for this project. You will need to take into account the terms and conditions of the websites you intend to use for data collection. We recommend that you use APIs to collect the data and avoid web scraping, as scraping data requires ethics approval.

Please make sure you speak to us before you commit to data collection.

4 Submission

Your submission needs to contain your report in the format of a conference paper and a link to your R code. You will submit the paper via Moodle as a PDF file.

4.1 The paper

For this assessment, you are asked to report your findings in the form of a long paper. Your paper should include the following sections:

. An abstract giving a brief summary of the research project and its key outcome.

. An introduction section describing the problem/research question, why it is an important/relevant question, how it relates to prior work, and your approach to addressing the problem.

. A methods section describing the data science process: what you did, how you did it and why you did it this way.

. A results section presenting findings with appropriate evaluation metrics.

. A discussion and limitations section, critically analysing the findings and limitations of the project.

. A conclusions section describing the outcome of the project and a discussion of future work.

Specifically, you should use the template of the proceedings (it can be downloaded here for LaTeX and Word or imported into Overleaf here). Additional requirements for the report:

. use the ACL style guidelines (it is easiest to use the templates)

. the page limit is eight content pages (excluding additional pages for references and an optional appendix).

. the R code notebook does not count towards the word count or the page count

. use the ACL referencing style (this is available in reference managers like Zotero or handled directly in Overleaf) - i.e. adhere to their font type, font size and heading guidelines

. use the anonymous submission version which contains line numbers

. the paper must contain only your examination number in the author line

. include a footnote with an anonymised view-only link to your code on the OSF (see Section 4.2 below)

. make sure to submit only a PDF file

4.2 The code supplement and data

In addition to your paper, we want to check that your findings are reproducible and require you to submit your code and data. Submit your R code in the form of a commented R Notebook. To ensure that no code is lost and that we can review all code equally, submit your code as an anonymised view-only version on the Open Science Framework. Create a private repository, upload your code as an R Notebook and create an anonymised, view-only link that you include in your paper (for a guide on creating that link, see here). For details on reporting your code as an R Notebook, you can consult these guides: guide 1 and guide 2.

Submit the raw collected data in a suitable format (e.g. a .csv or an .RData file) and include this in your repository.

5 Grading criteria

Requirements to be graded:

. submit the paper and code supplement before the deadline

| Criterion | Meaning | Weight |
|---|---|---|
| Originality of project | The degree to which the student demonstrates insight and can use an innovative approach to address the research problem | 20% |
| Research questions and data | The quality of the research question and the data collection strategy | 15% |
| Quality of the data science techniques | The degree to which the techniques used in this project are appropriate to answer the research question and are utilised and interpreted properly | 30% |
| Clarity and quality of the R code | The degree to which the R code is well-documented, reproducible (with provided data) and correct | 10% |
| Discussion of limitations | The quality of and insight demonstrated with the "Limitations" section of the report | 10% |
| Clarity and presentation of the written paper | The clarity, layout, formatting and overall quality of the paper | 15% |