

Data Summary and Analysis Assignment
FMHU5002 Introductory Biostatistics
Due Date and Time: Sunday, March 22nd, 2026 11:59 PM Sydney Time
Assignment Category: Submitted Work
Assignment Sub-category: Assignment
Weighting: 20%
Plagiarism and Academic Dishonesty Policy
You must complete your assignment alone. Submitting assignments that have been jointly completed is not acceptable. Copying someone else’s work or quoting from text without adequate attribution of the source is plagiarism and is not acceptable. All assignments will be verified by plagiarism detection software. Serious penalties apply for plagiarism, collusion, or contract cheating.

Information about the University’s policy on academic honesty can be found at the following site: https://www.sydney.edu.au/students/academic-integrity.html

Late Penalties and Special Consideration
Unless you have an approved simple extension, special consideration or an academic plan, 1 mark (5% of 20) will be deducted from your assignment mark per day (or part thereof) until Wednesday, April 1st, 11:59 PM (Sydney Local Time). Assignments submitted past this date without approved special consideration or an academic plan will not be accepted and will be given a zero (0) mark. For students seeking simple extensions or special consideration, please use the following site: https://www.sydney.edu.au/students/special-consideration.html
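
The deduction rule above is simple arithmetic. The Python sketch below shows one reading of it: any part of a day counts as a full day, the mark cannot go below zero, and anything after the final cutoff scores zero. It assumes all times are Sydney local time, does not model extensions, special consideration or academic plans, and uses a hypothetical helper name (adjusted_mark). The teaching team's own calculation is the authoritative one.

```python
import math
from datetime import datetime

# Deadlines from the brief, as naive datetimes in Sydney local time.
# Assumption: submission times are also Sydney local time (both dates fall
# within Sydney daylight saving time, so no clock change occurs between them).
DUE = datetime(2026, 3, 22, 23, 59)
FINAL_CUTOFF = datetime(2026, 4, 1, 23, 59)
PENALTY_PER_DAY = 1  # 1 mark = 5% of the 20 available marks


def adjusted_mark(raw_mark: float, submitted: datetime) -> float:
    """Apply the late penalty to raw_mark (hypothetical helper; no extensions modelled)."""
    if submitted <= DUE:
        return raw_mark
    if submitted > FINAL_CUTOFF:
        return 0  # not accepted: mark of zero
    # "Per day (or part thereof)": a partial day counts as a full day.
    days_late = math.ceil((submitted - DUE).total_seconds() / 86400)
    return max(raw_mark - PENALTY_PER_DAY * days_late, 0)


# Example: a submission 31 minutes late loses one full mark.
print(adjusted_mark(18, datetime(2026, 3, 23, 0, 30)))  # 17
```
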
Instructions to students
The data for the assignment has been derived from the online Introductory Biostatistics Questionnaire that students completed during Lecture 1 (and the week that followed). We had 192 responses – thank you! For this assignment, each student has been allocated their own version of the dataset, comprising a random sample of respondents. A few modifications have been made to make the data suitable for the assignment.

Datasets are named BIOQA###.csv, where ### is a 3-digit number. Please use the dataset that is listed against your name in the Assignment Dataset Allocation file. For example, if your allocated dataset is BIOQA012, then the dataset you need to use is BIOQA012.csv. Please ensure you use your allocated dataset. All datasets are available within the FMHU5002 Assignment Datasets folder, under Assessment Resources.

Important Notes
  • This assignment paper (including cover page, instructions, and data dictionary) is six (6) pages in length. Please ensure you have all pages.
  • Please ensure you use your allocated dataset. Because the datasets differ between students, the results will differ, and the conclusions drawn could also vary.
  • The variable names and coding of the variables (i.e., the data dictionary) for your dataset are included at the end of this assignment on pages 5 and 6.
  • There is not always just one correct way of handling data: you are sometimes required to use your own judgment. When this occurs, you should justify the decision you have made.
  • Name your submission file with your student number (SID), unit of study code, and BIOQA dataset number (e.g., 311275249_FMHU5002_BIOQA012.pdf). Ensure all pages are numbered, and that your student number is included in the header or footer of the document.
  • Assignments are marked anonymously, so please do NOT put your name anywhere on the assignment or submission title.
  • Any jamovi output presented must be edited to comply with the recommendations for presenting results as covered in the Module 1 Notes.
  • All tables and plots presented in your answers must conform to the recommended presentation guidelines outlined in the Module 1 notes. Penalties will apply where they do not conform.

Submit your assessment as a single file in .docx or .pdf format by 11:59 PM Sydney Local Time on Sunday 22nd March 2026 via Canvas (Assessments overview > Assessment 1: Data Summary and Analysis Assignment > Assignment Submission – Click Here to Submit Your Assignment > Select the file to upload and then click “Submit Assignment”). Do not attach a jamovi .omv or a .csv file with your submission.

If you have any administrative questions, please post them on the Canvas Discussion Board (go to Discussions > Assessment 1: Data Summary and Analysis Assignment Discussion Board). Alternatively, contact the teaching team: [email protected]

If you have difficulties submitting the assignment around the due time, please email [email protected] directly with your assignment attached to avoid late penalties. The timestamp of your email will be used as evidence of the date and time of your assignment submission. Please note responses to emails will only occur during business hours on standard working days.

Assignment Questions

In this assignment, you will be analysing data collected from the 2026 FMHU5002 Introductory Biostatistics student survey. Students were provided with a link during the Module 1 Lecture (live, online, and in the recording) and reminded via notifications on Canvas.

Question 1 (2 marks)

Data screening is an important first step in any data analysis. Using appropriate methods, examine the variables height and weight for possible erroneous values.

i) Describe any outlier (extreme), implausible, or impossible values within these variables by providing the ID and value of each, and clearly indicating whether it should be considered an outlier, an implausible value, or an impossible value.

ii) Describe the corrective action, if any, that should be taken for each of the identified observations. Perform these corrective actions.

You should use your cleaned data (i.e., the data with any edits performed) for the remainder of the assignment.

You can safely assume that the remaining variables are error-free and can be used as provided. 
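
One common screening method for extreme values is the boxplot (Tukey) rule: values below Q1 − k×IQR or above Q3 + k×IQR are flagged, with k = 1.5 for outliers and k = 3 often used for extreme values. The Python sketch below is one way to list the IDs and values outside those fences. It is not necessarily the method expected here; the record layout and the ID key name are hypothetical, the inclusive quartile method is an assumption (jamovi may compute slightly different quartiles), and deciding whether a flagged value is implausible or impossible, such as a value entered in the wrong units, still requires subject-matter judgement.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outside_fences(records, var, id_key="ID", k=1.5):
    """List (ID, value) pairs whose `var` lies outside the fences; None means missing."""
    values = [r[var] for r in records if r[var] is not None]
    lower, upper = tukey_fences(values, k)
    return [
        (r[id_key], r[var])
        for r in records
        if r[var] is not None and not (lower <= r[var] <= upper)
    ]
```

Each flagged observation then needs a decision (keep, correct, or set to missing) that is justified in the answer.
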

Question 2 (4 marks)

A person’s VO2max is a measure of the maximum amount of oxygen the body can use during intense exercise. It is a common fitness indicator which can be estimated from age and resting heart rate.

For each student in your sample, calculate their VO2max (V02max) using the following formula:

$$\mathrm{VO_2max} = 15.3 \times \frac{220 - \text{age}}{\text{resting heart rate}}$$

Display the distribution of V02max using an appropriate plot. In no more than two sentences, describe the important features of this distribution including relevant summary statistics.
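
In the formula above, 220 − age is the usual age-predicted maximum heart rate, so the estimate scales the ratio of maximum to resting heart rate. The Python sketch below applies it to one student and computes candidate summary statistics for a list of values. It assumes age in years and resting heart rate in beats per minute; the parameter name resting_hr and both function names are hypothetical (the data dictionary gives the actual variable names), and the inclusive quartile method is an assumption that may differ slightly from jamovi's.

```python
import statistics


def vo2max(age, resting_hr):
    """Estimate VO2max as 15.3 x (220 - age) / resting heart rate."""
    return 15.3 * (220 - age) / resting_hr


def summarise(values):
    """Candidate summary statistics: n, mean, SD, median, quartiles, min and max."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": median,
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }
```

Whether the mean and SD or the median and IQR are the more appropriate summary depends on the shape of the plotted distribution.
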

Question 3 (6 marks)

A step count of 10,000 steps/day has often been considered an unofficial target for improved health; however, recent research has shown that just 7,000 steps/day is associated with lower risk across many health outcomes (Ding et al., Lancet Public Health, 2025;10(8):E668-E681).

Create a new variable called step_countCat which groups the variable step_count into three categories: “Less than 7000 steps/day”; “Between 7000 – 10000 steps/day”; or “More than 10000 steps/day”. Note: for any students in your sample who do not own a device that tracks step count (i.e., step_device = 2), step_countCat should be treated as missing.
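
The grouping rule for step_countCat can be sketched in Python as follows. The brief does not say which category a value of exactly 7000 or 10000 belongs to; this sketch assumes both fall in the middle category, and also treats a missing step count as missing. Only the code step_device = 2 comes from the brief; the function name is hypothetical.

```python
from typing import Optional

LESS_7000 = "Less than 7000 steps/day"
BETWEEN = "Between 7000 – 10000 steps/day"
MORE_10000 = "More than 10000 steps/day"


def step_count_cat(step_count: Optional[float], step_device: int) -> Optional[str]:
    """Return the step_countCat label, or None (missing) when step_device == 2."""
    if step_device == 2 or step_count is None:
        return None  # no tracking device, or no step count recorded
    if step_count < 7000:
        return LESS_7000
    if step_count <= 10000:  # assumption: 7000 and 10000 are both "Between"
        return BETWEEN
    return MORE_10000
```
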

i) Construct a frequency table using this newly created variable step_countCat. Include in the table the relative frequencies and, if appropriate, cumulative relative frequencies for each level of step_countCat. Among students in your sample who own a device which tracks step count, what proportion do fewer than 10,000 steps/day on average?

ii) Construct a two-way table using the newly created variable step_countCat and employment. Include in the table the relative frequencies for step count category within each level of employment. In no more than three sentences, summarise what patterns are evident from the table.
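
For part i), the relative frequency of each level is its count divided by the number of non-missing values, and the cumulative relative frequency is the running total of these down the ordered levels (meaningful here because step_countCat is ordinal). For part ii), relative frequencies are computed within each level of employment, so they sum to 100% across the step count categories. The helpers below are a hypothetical Python sketch of both calculations on already-categorised values, with missing values coded as None; the level labels passed in are up to the caller.

```python
from collections import Counter


def frequency_table(labels, order):
    """Rows of (level, n, relative freq, cumulative relative freq); None values are excluded."""
    observed = [x for x in labels if x is not None]
    counts = Counter(observed)
    total = len(observed)
    rows, cumulative = [], 0.0
    for level in order:
        rel = counts[level] / total
        cumulative += rel
        rows.append((level, counts[level], rel, cumulative))
    return rows


def row_percentages(pairs, row_order, col_order):
    """Percentage of each column level within each row level.

    `pairs` are (row_level, col_level) tuples, e.g. (employment, step_countCat);
    pairs whose column value is None (missing) are dropped.
    """
    table = {}
    for row in row_order:
        counts = Counter(c for r, c in pairs if r == row and c is not None)
        n = sum(counts[c] for c in col_order)
        table[row] = {c: (100 * counts[c] / n if n else float("nan")) for c in col_order}
    return table
```
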
Question 4 (3 marks)

Produce an appropriate plot which visually displays the relationship between a student’s self-reported sex (sex) and their smoking status (smoke). In no more than three sentences, summarise what your plot shows.

Question 5 (3 marks) 

In nearly all quantitative research outputs (e.g., publications, reports), authors provide a table that describes the key characteristics of the sample, which usually appears as the first table (Table 1) of the document. Create a “Table 1” that provides appropriate descriptive statistics for the variables age, sex, degree, study_mode, and distance for the sample included in your dataset.

Note: Page 22 of the FMHU5002 Course Notes provides one example of a ‘Table 1’. You should also refer to the Tutorial 1 resources, and the published literature in your field of expertise for examples; most quantitative studies will provide a ‘Table 1’.
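
A Table 1 commonly reports continuous variables as mean (SD) or median (IQR) and categorical variables as n (%). The Python sketch below formats such cells; the function names are hypothetical, the inclusive quartile method is an assumption (jamovi may give slightly different quartiles), and the table's layout and choice of statistic should follow the Module 1 notes rather than this code.

```python
import statistics
from collections import Counter


def fmt_mean_sd(values, digits=1):
    """Format a continuous variable as 'mean (SD)'."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return f"{mean:.{digits}f} ({sd:.{digits}f})"


def fmt_median_iqr(values, digits=1):
    """Format a continuous variable as 'median (Q1–Q3)'."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{med:.{digits}f} ({q1:.{digits}f}–{q3:.{digits}f})"


def fmt_counts(labels, order, digits=1):
    """Format each level of a categorical variable as 'n (%)'; None values are excluded."""
    observed = [x for x in labels if x is not None]
    counts = Counter(observed)
    return {
        level: f"{counts[level]} ({100 * counts[level] / len(observed):.{digits}f}%)"
        for level in order
    }
```

If any variable has missing values, the number of non-missing observations used for each row is worth reporting alongside it.
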

General formatting and presentation (2 marks)

A total of 2 marks is allocated to the general formatting and presentation of display items, such as tables and figures, including conforming to the recommended presentation guidelines outlined in the Module 1 notes, and providing only relevant information and output as part of the submission.

Total = 20 marks

This is the end of the assignment questions.

The variable names and survey questions used for the data are on the following page.