
MAST20031 Analysis of Biological Data - Assignment 2

Instructions

• Assignment 2 contains 3 problems worth a total of 31 marks.

• Your assignment must be submitted to Gradescope by 11.59pm Friday 5th May.

• Assignments submitted late will incur a penalty of 1% per hour (or part thereof). If you have exceptional circumstances that prevent you from meeting the deadline, please email MAST20031-info@unimelb.edu.au, and we may be able to grant an extension.

• Tutors may not help you directly with assignment questions. They may, however, provide some appropriate guidance.

• There is a discussion board mega-thread if you need clarification on the wording of questions.

General advice

• Please show your working/code. If you show your working, we can see when you’re using the right process even if you end up with the wrong numerical answer. We like awarding marks; help us to help you.

• We recommend using an R Notebook (we’ve included a template file) to keep your code, results and answers nicely formatted. But you can use any word processing program: in either case, your code and output from R should be part of your document.

• No graph is complete without appropriate labels, units and axes. Please do not submit hand drawn graphs. You can save or copy-paste graphs from R.

• You are encouraged to use internet resources (e.g. Google for “how to do X using R”) but need to submit your own work (don’t directly copy something from ChatGPT, for example).

Dexterity data

Problem 1 uses the data the class collected in Data Collection Exercise 2, available on the LMS. The data consist of 1056 rows, with each row containing the results of a single dexterity trial. Each trial records a hand (left/right) and a number of grains of rice. Note this is NOT the same file we used for assignment 1; please use the file associated with this assignment.

dce2 <- read.csv(file = "DCE2_2023.csv")

The rationale for collecting the data was to test the idea that manual dexterity is related to handedness. Higher values should indicate greater dexterity.

The following commands will reorganise the data in a way helpful to answering Problem 2.

# Calculate the average nGrains across replicates for each UserID/Hand/dominantHand combination
dce2.agg <- aggregate(nGrains ~ UserID + Hand + dominantHand, data = dce2, FUN = mean)

# Select R and L trials, sort each by UserID so the rows line up, and calculate differences
rTrials <- subset(dce2.agg, Hand == "R")
rTrials <- rTrials[order(rTrials$UserID), ]
lTrials <- subset(dce2.agg, Hand == "L")
lTrials <- lTrials[order(lTrials$UserID), ]
stopifnot(identical(rTrials$UserID, lTrials$UserID))   # each row must pair the same student

RL <- rTrials$nGrains - lTrials$nGrains   # Within-student difference of averages (right - left)
dom.hand <- rTrials$dominantHand          # Dominant hand of each person

# Create a new data frame (dexterity.data) with the relevant variables calculated above
dexterity.data <- data.frame(user = rTrials$UserID, rHand = rTrials$nGrains,
                             lHand = lTrials$nGrains, difference = RL, dominantHand = dom.hand)

You will now have 165 rows of data giving within-student averages and the within-student difference between these averages.
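The code above pairs right- and left-hand rows by sorting each subset on UserID, which relies on every student having both hands recorded. As an alternative sketch (not the provided method), the same reshaping can be done with merge(), which matches rows on UserID directly. The small data frame below is hypothetical and only assumes the column names used above (UserID, Hand, dominantHand, nGrains).

```r
# Hypothetical toy data in the long format assumed for DCE2_2023.csv:
# one row per trial, with UserID, Hand ("L"/"R"), dominantHand and nGrains
toy <- data.frame(
  UserID       = c(1, 1, 1, 1, 2, 2, 2, 2),
  Hand         = c("R", "R", "L", "L", "R", "R", "L", "L"),
  dominantHand = c("R", "R", "R", "R", "L", "L", "L", "L"),
  nGrains      = c(10, 12, 7, 9, 8, 6, 9, 11)
)

# Average the replicates within each UserID/Hand/dominantHand combination
toy.agg <- aggregate(nGrains ~ UserID + Hand + dominantHand, data = toy, FUN = mean)

# Split by hand and rename nGrains so the two columns can sit side by side
right <- subset(toy.agg, Hand == "R", select = c(UserID, dominantHand, nGrains))
left  <- subset(toy.agg, Hand == "L", select = c(UserID, nGrains))
names(right)[names(right) == "nGrains"] <- "rHand"
names(left)[names(left) == "nGrains"]   <- "lHand"

# Match right- and left-hand averages on UserID rather than on row order
toy.wide <- merge(right, left, by = "UserID")

# Within-student difference of averages (right - left)
toy.wide$difference <- toy.wide$rHand - toy.wide$lHand
print(toy.wide)
```

A student with only one hand recorded would simply drop out of the merge instead of silently shifting every later row.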

Problem 1:  Dexterity Study Design [7 marks]

a.  [2  marks] The dexterity data was collected to see if adults in Australia show different dexterity with their dominant and non-dominant hands. Explain why this is not a random sample from that population.

b.  [3 marks] The instructions for DCE2 were deliberately brief. Rewrite the instructions to improve the data collection, in particular to make the experimental conditions more consistent across participants. You should make at least three substantial changes to improve the instructions.

c.  [2  marks] It is suggested that everyone should do the trials in the order dominant hand (dom), non-dominant hand (non), dom, non, dom, non.  Identify one strength of this approach, and one potential issue.

Problem 2:  Analysing Dexterity Differences [13 marks]

a.  [1 mark] Would we expect the variable difference (calculated in the code above) to be normally distributed? Why/why not?

b.  [2 marks] Examining the data separately for right-hand dominant and left-hand dominant students, use an appropriate plot to examine your expectation from part a. Do the data appear to be normally distributed?

c.  [4 marks] Considering only the students who are Right-hand dominant, carry out an appropriate hypothesis test at the 0.05 level of significance to test if there is a difference between their right and left hands. You should clearly state your hypotheses, test statistic, degrees of freedom (if relevant), p-value and a conclusion in the context of the question.
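For reference, a paired comparison in R can be run either as a one-sample t-test on the differences or as a t-test with paired = TRUE; the two give the same t statistic and degrees of freedom. The sketch below uses simulated numbers (not the DCE2 data) and only illustrates the mechanics; choosing and justifying the test is part of the question.

```r
# Simulated (hypothetical) within-student averages for 20 students; NOT the DCE2 data
set.seed(1)
rHand <- rnorm(20, mean = 10, sd = 2)
lHand <- rHand - rnorm(20, mean = 1, sd = 1.5)
difference <- rHand - lHand

# One-sample t-test of H0: mean difference = 0 against a two-sided alternative
one.sample <- t.test(difference, mu = 0)

# The equivalent paired t-test on the two columns
paired <- t.test(rHand, lHand, paired = TRUE)

one.sample$statistic   # t statistic
one.sample$parameter   # degrees of freedom (n - 1)
one.sample$p.value     # two-sided p-value
```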

d.  [2 marks] Explain why it is reasonable to conduct a hypothesis test for the mean of difference for Right-hand dominant individuals, even if the differences are not normally distributed.
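A small simulation may help when thinking about part d. The sketch below assumes a hypothetical, strongly right-skewed population (exponential with rate 1, not the DCE2 data) and compares the skewness of individual values with the skewness of many sample means, each based on 100 observations.

```r
set.seed(2023)

# Sample skewness: average cubed deviation divided by the cubed SD
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Hypothetical right-skewed population: exponential with rate 1 (skewness 2)
individual <- rexp(5000, rate = 1)

# 5000 sample means, each from a sample of size 100 drawn from that population
sample.means <- replicate(5000, mean(rexp(100, rate = 1)))

skew(individual)     # typically near the population value of 2
skew(sample.means)   # much closer to 0 (theory for n = 100: 2 / sqrt(100) = 0.2)
hist(sample.means, main = "Simulated sample means (n = 100)", xlab = "Sample mean")
```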

e.  [2 marks] Considering only the students who are Left-hand dominant, carry out an appropriate hypothesis test at the 0.05 level of significance to test if there is a difference between their right and left hands. You only need to calculate a p-value and the size of the difference.

f.  [2 marks] Someone comments that the p-value for Right-hand dominant individuals is much smaller than that for Left-hand dominant individuals, because the size of the difference is larger. Explain why this isn’t entirely correct.
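For a one-sample (or paired) t-test, the test statistic is the mean difference divided by its standard error, sd/sqrt(n), so the p-value depends on the spread and the sample size as well as on the size of the difference. A minimal sketch with hypothetical summary values; the helper p.from.summary is defined here for illustration only.

```r
# Hypothetical helper: two-sided p-value of a one-sample t-test from summary statistics
p.from.summary <- function(mean.diff, sd.diff, n) {
  t.stat <- mean.diff / (sd.diff / sqrt(n))   # t = mean / standard error
  2 * pt(-abs(t.stat), df = n - 1)
}

# Same mean difference and SD, different sample sizes
p.large.n <- p.from.summary(mean.diff = 1, sd.diff = 4, n = 150)
p.small.n <- p.from.summary(mean.diff = 1, sd.diff = 4, n = 15)

# Same mean difference and sample size, smaller SD
p.small.sd <- p.from.summary(mean.diff = 1, sd.diff = 2, n = 15)

c(large.n = p.large.n, small.n = p.small.n, small.sd = p.small.sd)
```

With the mean difference held fixed, the larger sample and the smaller spread each give a smaller p-value.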

Problem 3:  Testing DCE1 Data [9 marks]

In this problem you will use the ‘demographics’ data that we collected from each of you right at the start of semester (DCE1), available on the LMS. The file name for this dataset is “DCE1_2023.csv”. The data set has 197 rows, one for each student who responded to the quiz, and each row contains that student’s answers to each question: PredictFinal is the mark the student nominated at the beginning of the course as their likely mark; Languages is the number of languages the student speaks; DominantHand is the student’s dominant hand.

demographics <- read.csv(file = "DCE1_2023.csv")

a.  [4 marks] A report on the demographic data was given as follows:

“An analysis of variance found no significant differences in student optimism (as measured by predicted final grade) based on the number of languages spoken (F(3, 193) = 1.174, p = 0.32).”

(i) What were the hypotheses being tested here?

(ii)  Does this indicate there is no relationship between optimism and number of languages spoken? Explain.
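The numbers in the report can be checked directly. In a one-way ANOVA the degrees of freedom are k - 1 (between groups) and N - k (within groups), so F(3, 193) is consistent with k = 4 language groups and N = 197 students, and the p-value is the upper-tail probability of the F distribution at the observed F.

```r
# Values taken from the quoted report: F(3, 193) = 1.174
F.stat <- 1.174
df1 <- 3     # between groups: k - 1, implying k = 4 groups
df2 <- 193   # within groups: N - k = 197 - 4

# Upper-tail probability of the F(df1, df2) distribution
p.value.anova <- pf(F.stat, df1, df2, lower.tail = FALSE)
round(p.value.anova, 2)   # should be close to the reported p = 0.32
```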

b.  [5 marks] We are interested in whether there is an association between the number of languages spoken and where people are from (metropolitan or regional). Treating Languages as a categorical variable, carry out an appropriate hypothesis test. You will need to ensure that you meet the assumptions for this test. In your answer:

• state your null hypothesis and your alternative hypothesis;

• list the assumptions and show they are met;

• state the test statistic and its distribution, and report the p-value;

• describe in plain English what the results mean.
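A minimal sketch of the mechanics in R, using an invented 2 x 4 table of counts. The row labels, the “4+” category and all counts are hypothetical, and the variable recording metropolitan/regional origin is not named in the data description above. A commonly used check for this test is that every expected count under the null hypothesis is at least 5; if some are not, combining sparse categories is one common remedy.

```r
# Invented counts for illustration only; NOT the DCE1 data
counts <- matrix(c(40, 30, 15, 10,
                   35, 25, 20, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Origin = c("Metropolitan", "Regional"),
                                 Languages = c("1", "2", "3", "4+")))

chi.test <- chisq.test(counts)

# Assumption check: expected counts under H0 (no association)
chi.test$expected
all(chi.test$expected >= 5)

chi.test$statistic   # X-squared test statistic
chi.test$parameter   # df = (rows - 1) * (columns - 1)
chi.test$p.value
```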

Organised Submission & Participation in DCE2 [2 marks]

Your participation in DCE2 has already been recorded.

You only need to submit a clearly legible assignment (pages the correct way up, sensible font size, etc.) with pages selected for each question part to get the additional mark.