COMP5310 Project Stage 2


Experiment, Quantify, Report

Project Stage 2A: Summarise and Analyse the Data

9 October 2022, weight 10%

This stage is usually done with the same group members as you worked with for Stage 1. However, if someone is currently in a group that is not in their timetabled lab, they will need to move to a group in their timetabled lab. If this applies to you, please urgently email claire.hardgrove@sydney.edu.au to arrange moving to a different group.

What to submit

1.   Submit a written report on your work, as a PDF document.

2.   Submit the code and dataset that you used to produce the analysis and charts in your report.


There are two individual tasks and one group task. All tasks should be addressed in a single group report, and the report should identify which group member answered which individual question.

Individual tasks:

1.   [4 marks] Each group member should answer one of these two sub-tasks using a different statistical technique. At least one person from the group must answer each sub-task, but more than one person can answer the same sub-task using a different statistical technique:

•   Identify a statistical technique that might be appropriate for summarisation and analysis of your dataset. For that technique:

o Name the technique

o Outline the assumptions that are required for the technique to be valid

o Describe to what extent the assumptions are true for your dataset

o Justify your choice of technique in the context of the business question.

o Use the technique to analyse some aspect of your dataset and draw conclusions

•   Identify a statistical technique that is clearly not appropriate for summarisation and analysis of your dataset. For that technique:

o Name the technique

o Outline the assumptions that are required for the technique to be valid

o Describe what assumptions are violated in your dataset

o Justify why this technique is not appropriate for your dataset

o Propose whether the data can be transformed in a way that makes the assumptions true, and justify whether this is appropriate or not in the context of your business question.

When justifying your conclusions, consider, for example, whether the technique requires too many assumptions that are only partially true, or whether it might make your conclusions too unreliable to apply in your business context. Also consider the cost of a Type I error and the cost of a Type II error in your business context.

2.   [2 marks] Each individual should create one chart that visualises some aspect of the dataset and informs your understanding of the data and research question. Describe what conclusions you draw from the chart, and what questions it raises that you could answer in Stage 2B.

Group tasks:

3.   [2 marks] Answer the following questions as a group:

•   Describe any exploratory analysis you have undertaken to refine your understanding of the data and research question, the strengths and limitations of that exploratory analysis compared to at least one alternative, and a justification for the analysis you undertook.

•   Propose an approach (a particular classifier model, hypothesis test, etc.) that you might take to solve your research question in Stage 2B, discuss the strengths and limitations of the approach compared to at least one other approach, and justify your choice of approach.

•   Outline, at a high level, how you will validate the approach, the strengths and limitations of the validation techniques you chose compared to at least one alternative method, and justify your choice of validation techniques.


The marker’s evaluation will be made principally on the basis of your report; the submitted code and data may be considered as evidence to check or clarify statements made in the report.

Note: you will not be penalised in marks if you explore a reasonable question about the domain by looking at appropriate relationships between some aspects of the data, and then conclude that no clear relationship is revealed.

Question 1:

[Flawed]: States the name of the technique and answers, with valid justifications, one bullet point in their particular sub-task for question 1

[Pass]: States the name of the technique and answers, with valid justifications, two bullet points in their particular sub-task for question 1

[Distinction]: States the name of the technique and answers, with valid justifications, three bullet points in their particular sub-task for question 1

[Full marks]: States the name of the technique and answers, with valid justifications, all four of the bullet points in their particular sub-task for question 1.

Question 2:

[Flawed]: A chart of some data attribute.

[Pass]: A chart of some data attribute, with correctly documented encoding between data attributes and visual attributes in the chart.

[Distinction]: A chart of some data attribute, with correctly documented encoding and other decisions (such as chart style, scale, etc.), and sensible justification of the choice of encoding in view of the effectiveness of different visual attributes.

[Full marks]: A chart of some data attribute, with correctly documented encoding and other decisions (such as chart style, scale, etc.), sensible justification of the choice of encoding in view of the effectiveness of different visual attributes, and sensible conclusions from the chart together with a statement of the questions it raises for Project Stage 2B.

Question 3:

[Flawed]: An answer to all of the bullet points in Question 3.

[Pass]: A well-reasoned answer to all of the bullet points in Question 3, including a discussion of strengths and limitations.

[Distinction]: A well-reasoned answer to all of the bullet points in Question 3, including a discussion of strengths and limitations in comparison to an alternative for each question respectively.

[Full marks]: A well-reasoned answer to all of the bullet points in Question 3, including a discussion of strengths and limitations in comparison to an alternative, and a justification of your choice for each question respectively.