STAT0022: ICA 4 Instructions Term 2, 2022-23


1  Introduction

Please carefully read and understand these instructions before you begin the ICA.

The deadline to submit ICA 4 is 15:00 (UK time) on 3rd May 2023. The goal of the assessment is for you to apply some of the statistical methods you have learned in this module to a given dataset, and to write a short report describing your analysis and conclusions. Please note that your report should contain sections with headings as described in Section 3 below. ICA 4 makes up 50% of your module mark for STAT0022.

This is an individual project, so the work you submit must be entirely your own. You are not allowed to discuss your work with other students, as this would amount to collusion (see also Section 9).

2  Information on the dataset

The dataset is available on Moodle under the “Assessment>ICA4” tab. It contains information on different emails, in particular on whether they were spam. Every line represents an email and has 59 columns. The Id column uniquely identifies each email. Columns 2 to 58 record the 57 features of the email. Column 59 indicates whether the email is spam (with a ‘1’) or not (with a ‘0’). There are no missing data, except for 20 entries in the Spam column whose purpose is explained in Section 3.4.
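As an illustration of the layout described above, the following minimal sketch reads a small stand-in for the file and separates the rows whose Spam entry is known from those marked “???” (see Section 3.4). It assumes a comma-separated file with a header row; the file format, the toy values and the feature column names (feat_1, feat_2, feat_3) are assumptions made for this example and are not taken from the actual dataset.

```python
import csv
import io

# Toy stand-in for the real file: 3 feature columns instead of 57.
# All values and the names feat_1..feat_3 are hypothetical.
SAMPLE = """Id,feat_1,feat_2,feat_3,Spam
1,0.00,0.64,278,1
2,0.21,0.28,1028,0
3,0.06,0.00,2259,???
"""


def load_rows(text):
    """Parse CSV text and split rows by whether the Spam label is known."""
    labelled, unlabelled = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Convert every feature column to a number.
        record = {k: float(v) for k, v in row.items() if k not in ("Id", "Spam")}
        record["Id"] = int(row["Id"])
        if row["Spam"] == "???":
            unlabelled.append(record)      # label to be predicted later
        else:
            record["Spam"] = int(row["Spam"])
            labelled.append(record)
    return labelled, unlabelled


labelled, unlabelled = load_rows(SAMPLE)
print(len(labelled), "labelled rows;", len(unlabelled), "rows to predict")
```

Any statistical software can perform the same split; the point is only that the “???” entries should be set aside before computing summaries of the Spam column.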

3  Structure

You should structure your submission according to the headings below. For your report you should only use the data from the given dataset.

3.1 Setting up the research question [2 marks]

The first step in any research is to set up the “right” questions. Based on the dataset,

• Set up clearly a question you would like an answer to. [1 mark]

• Discuss why you think the given dataset is appropriate to find an answer to this question. [1 mark]

3.2 Summarizing the data [7 marks]

Now that you have set up your question, it is time to start using statistical tools to find an answer.

• Specify which variables from the dataset you are going to use to perform your statistical analysis [2 marks].

• Include summary statistics of at least two variables you are going to use. Include at least one graph [2 marks].

• Comment on at least three statistics you have found, and indicate why these values are relevant for your study [3 marks].
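To make the idea of summary statistics concrete, here is a minimal sketch that computes a few of them for one feature, separately for spam and non-spam emails. The numbers are made up for illustration and the choice of statistics is an assumption; which variables and statistics are relevant depends on your own research question.

```python
from statistics import mean, median, stdev

# Made-up values of one hypothetical feature, grouped by the Spam label.
feature_by_class = {
    0: [0.0, 0.1, 0.0, 0.3, 0.2],
    1: [0.9, 1.4, 0.7, 1.1, 1.6],
}

# One small summary per class: sample size, mean, median, standard deviation.
summary = {
    label: {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),   # sample standard deviation (n - 1 denominator)
    }
    for label, values in feature_by_class.items()
}

for label, stats in summary.items():
    print(
        f"Spam={label}: n={stats['n']}, mean={stats['mean']:.2f}, "
        f"median={stats['median']:.2f}, sd={stats['sd']:.2f}"
    )
```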

3.3 Methodology [8 marks]

In performing your analysis you will need to use at least one statistical method covered in STAT0022.

• Indicate why you are using this method to answer your question [1 mark].

• Justify why you can use this method with this dataset [3 marks].

• Discuss the results of the method in light of your question [4 marks].

In this Subsection, in addition to at least one method from STAT0022, you are also allowed to use other statistical methods not covered in the course. You do not need to explain them, but you must cite the source from which you learnt them. Discussing methods not covered in the course will not automatically lead to higher marks (see also the end of Section 6).

3.4 Final results and prediction [9 marks]

• Combine the results obtained in the previous Subsections to give an answer to your initial question [4 marks].

• In the dataset you will find that 20 entries in the “Spam” column are marked by “???”. Predict those values, explaining how you made your prediction [5 marks]. You can use the results you obtained in the previous sections for this. Marks will not be awarded based on the number of entries you predict correctly, but on the correct use and sensible discussion of the statistical methods which lead to this prediction.
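The general workflow behind such a prediction (estimate something from the rows with a known label, then apply it to the rows marked “???”) can be sketched as follows. The rule used here, assigning each email to the class whose mean feature value is closest, is chosen only because it is short; it is an assumption of this example, not necessarily a method from STAT0022, and the data are made up.

```python
from statistics import mean

# Toy data: (feature value, Spam label). Values are made up for illustration.
labelled = [(0.1, 0), (0.3, 0), (0.2, 0), (1.4, 1), (1.9, 1), (1.6, 1)]
# Feature values of the rows whose label is "???".
unlabelled = [0.25, 1.7]

# Class means of the single feature, estimated from the labelled rows only.
centroids = {
    label: mean(x for x, y in labelled if y == label)
    for label in (0, 1)
}


def predict(x):
    """Assign the class whose mean feature value is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


predictions = [predict(x) for x in unlabelled]
print(predictions)
```

Whatever method is used, the unlabelled rows play no part in fitting it; they are only passed through the fitted rule at the end.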

3.5 General [6 marks]

Marks will be given to students who:

• correctly follow the submission format instructions; [2 marks]

• respond to the requested tasks specifically, and without giving unnecessary information; [2 marks]

• provide a coherent submission that is well presented with accurate and precise use of the English language. [2 marks]

4  Submission format

You should submit a single file, saved as a PDF and named “ICA 4 [your student number]”. Your name must not appear in the file. Only PDF files are allowed for the upload. The PDF must be readable by anti-plagiarism software: do not photograph or scan your report, or store it as an image, as this converts it into a format which the software cannot read. Failure to store the report in text form may result in marks being deducted.

5  Use of statistical software

You are allowed to use statistical software, for example Stata. Name the software you used. You can include computer output in your report (though you are encouraged to summarise it in a table of your own construction). You do not need to include the code that you wrote.

6  Submission length

You must not exceed the maximum of 1500 words and 3 A4 pages (e.g. it is fine to write 2 pages with 1500 words, but 3 pages with 1550 words will be penalized). The font must be no smaller than 11 pt Arial and the margins no less than 2 cm. Footnotes count towards the word count and must also be no smaller than 11 pt Arial. Any plots or diagrams you choose to include may be inserted at the end of the file, on at most three pages which are additional to the max {1500 words, 3 pages} count, with each plot or diagram clearly labelled and referenced from within the main discussion text. Details of any references (if necessary) may be included on a single page, additional to the max {1500 words, 3 pages} allowed for your discussion and the three pages allowed for plots or diagrams.

A penalty of 10 percentage points, or one Letter Grade, will be applied to those who exceed the max {1500 words, 3 pages} limit. Any such penalty will not reduce a mark below the pass mark of 40%.
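The length rule and its penalty can be illustrated with a short sketch. It uses one reading that is consistent with the two examples given above (a submission is over the limit if it exceeds either 1500 words or 3 pages), and it assumes that a mark already at or below the pass mark is left unchanged; both points are interpretations made for this example, and the function names are hypothetical.

```python
MAX_WORDS = 1500
MAX_PAGES = 3
PENALTY_POINTS = 10
PASS_MARK = 40


def exceeds_limit(words, pages):
    """True if the main text breaks either the word limit or the page limit."""
    return words > MAX_WORDS or pages > MAX_PAGES


def apply_length_penalty(mark, words, pages):
    """Deduct the penalty without taking a passing mark below the pass mark.

    Assumption: a mark already at or below the pass mark is left unchanged.
    """
    if not exceeds_limit(words, pages) or mark <= PASS_MARK:
        return mark
    return max(mark - PENALTY_POINTS, PASS_MARK)


print(exceeds_limit(1500, 2))             # the "fine" example from the text
print(exceeds_limit(1550, 3))             # the "penalized" example from the text
print(apply_length_penalty(62, 1550, 3))  # 62 - 10 = 52
print(apply_length_penalty(45, 1550, 3))  # capped at the pass mark of 40
```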

The permitted length is an upper limit, not a guide for how much you are expected to submit. If you can clearly explain your understanding more concisely, then shorter submissions will not automatically be marked lower. Also, do not feel obliged to use everything you learnt in the module. Quality, which means using appropriate statistical methods, interpreting the results correctly and discussing them sensibly, is much more important than quantity in this setting.

7  Submission procedure and deadline

You must complete your submission via the “ICA 4: please submit your completed project here” link on the STAT0022 course Moodle page before the deadline of 15:00 (UK time) on 3rd May 2023. There are standard non-negotiable penalties for late submissions, which you can read about in the UCL Academic Manual. An extension to the deadline can only be granted where a student has a Summary of Reasonable Adjustments (SoRA) or has successfully claimed extenuating circumstances. Extenuating circumstances are handled by your parent department and not by the teaching department.

8  Technical failure

As you have a number of weeks or months to complete coursework, technical issues will not be considered as valid grounds for missing the deadline. All work must be submitted through the assessment platform; you must not submit work via email or any other channel. Students reporting technical difficulties should contact the central IT services Help & Support resources.

9  Plagiarism, collusion and referencing

By completing the submission, every student confirms that they have read and understood the “Plagiarism guidelines” document within the “Assessments” section of the STAT0022 course Moodle page. References to any source should be included using your choice of a standard referencing system. Submissions will be run through Moodle Assignment.

By clicking the “Submit” button you are agreeing to the following declaration: “I am aware of the UCL Statistical Science Department’s plagiarism guidelines. I have read the guidelines and I understand what constitutes plagiarism and collusion. I hereby affirm that the work I am submitting for this in-course assessment is entirely my own”.

10  Queries

Any queries about ICA 4 should be posted on the Moodle Forum ICA 4, which closes on April 26 at 9:00 GMT. Emails should only be used for matters that cannot be shared on the forum, e.g. due to privacy issues.