MAST30027: Modern Applied Statistics Assignment 2, 2022
● This assignment is worth 7% of your total mark.
● To get full marks, show your working, including 1) the R commands and outputs you use, 2) mathematical derivations, and 3) a rigorous explanation of why you reach your conclusions or answers. If you provide only final answers, you will get zero marks.
● The assignment you hand in must be typed (except for math formulas), and be submitted using LMS as a single PDF document only (no other formats allowed). For math formulas, you can take a picture of them. Your answers must be clearly numbered and in the same order as the assignment questions.
● The LMS will not accept late submissions. It is your responsibility to ensure that your assignments are submitted correctly and on time, and problems with online submissions are not a valid excuse for submitting a late or incorrect version of an assignment.
● We will mark a selected set of problems. We will select problems worth > 50% of the full marks listed.
● If you need an extension, please contact the tutor coordinator before the due date with appropriate justification and supporting documents. Late assignments will only be accepted if you have obtained an extension from the tutor coordinator before the due date. Under no circumstances will an assignment be marked if solutions for it have been released. Please DO NOT email the lecturer with extension requests.
● Also, please read the “Assessments” section on the “Subject Overview” page of the LMS.
Note: There is no unique answer for this problem. The report for this problem should be typed. Hand-written reports, or reports that include screen-captured R code or figures, won’t be marked. An example report written by a student in a previous year has been posted on the LMS.
Data: The dataset comes from the Fiji Fertility Survey and shows data on the number of children ever born to married women of the Indian race, classified by duration since their first marriage (grouped in six categories), type of place of residence (Suva, urban, and rural), and educational level (classified in four categories: none, lower primary, upper primary, and secondary or higher). The data can be found in the file assignment2_prob1.txt. The dataset has 70 rows representing 70 groups of families. Each row has entries for:
● duration: marriage duration of mothers in each group (years),
● residence: residence of families in each group (Suva, urban, rural),
● education: education of mothers in each group (none, lower primary, upper primary, secondary+),
● nChildren: number of children ever born in each group (e.g. 4), and
● nMother: number of mothers in each group (e.g. 8).
We can summarise the data as a table as follows.
> data <- read.table(file="assignment2_prob1.txt", header=TRUE)
> data$duration <- factor(data$duration, levels=c("0-4","5-9","10-14","15-19","20-24","25-29"),
+                         ordered=TRUE)
> data$residence <- factor(data$residence, levels=c("Suva", "urban", "rural"))
> data$education <- factor(data$education, levels=c("none", "lower", "upper", "sec+"))
> ftable(xtabs(cbind(nChildren,nMother) ~ duration + residence + education, data))
                             nChildren nMother
duration residence education
0-4      Suva      none              4       8
                   lower            24      21
                   upper            38      42
                   sec+             37      51
         urban     none             14      12
                   lower            23      27
                   upper            41      39
                   sec+             35      51
         rural     none             60      62
                   lower            98     102
                   upper           104     107
                   sec+             35      47
5-9      Suva      none             31      10
                   lower            80      30
                   upper            49      24
                   sec+             38      22
         urban     none             59      13
                   lower            98      37
                   upper           118      44
                   sec+             48      21
         rural     none            171      70
                   lower           317     117
                   upper           200      81
                   sec+             47      21
10-14    Suva      none             49      12
                   lower            99      27
                   upper            58      20
                   sec+             24      12
         urban     none             75      18
                   lower           143      43
                   upper           105      29
                   sec+             50      15
         rural     none            364      88
                   lower           546     132
                   upper           197      50
                   sec+             30       9
15-19    Suva      none             59      14
                   lower           153      31
                   upper            41      13
                   sec+             11       4
         urban     none            108      23
                   lower           225      42
                   upper            92      20
                   sec+             19       5
         rural     none            577     114
                   lower           481      86
                   upper           135      30
                   sec+              2       1
20-24    Suva      none            118      21
                   lower            91      18
                   upper            47      12
                   sec+             13       5
         urban     none            118      22
                   lower           147      25
                   upper            65      13
                   sec+             16       3
         rural     none            756     117
                   lower           431      68
                   upper           132      23
                   sec+              5       2
25-29    Suva      none            310      47
                   lower           182      27
                   upper            43       8
                   sec+              2       1
         urban     none            300      46
                   lower           338      45
                   upper            98      13
                   sec+              0       0
         rural     none           1459     195
                   lower           461      59
                   upper            58      10
                   sec+              0       0
Problem: We want to determine which factors (duration, residence, education) and two-way interactions are related to the number of children per woman (fertility rate). The observed number of children ever born in each group (nChildren) depends on the number of mothers (nMother) in each group. We must take account of the difference in the number of mothers (hint: one of the lab problems shows how to handle this issue). Write a report on the analysis that summarises the substantive conclusions and includes the highlights of your analysis: for example, data visualisation, choice of model (e.g., Poisson, binomial, gamma), model fitting and model selection (e.g., using AIC), diagnostics, a check for overdispersion if necessary, and a summary/interpretation of your final model.
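As an illustration only, one common way to account for differing group sizes in count data is to model the rate through an offset. The sketch below is not the required or unique approach: the Poisson family, the log(nMother) offset, and the object names toy and fit are assumptions made for the example. The eight rows are copied from the summary table above (Suva, durations 0-4 and 5-9).

```r
# A few rows copied from the summary table (Suva only, first two durations).
toy <- data.frame(
  duration  = factor(rep(c("0-4", "5-9"), each = 4)),
  education = factor(rep(c("none", "lower", "upper", "sec+"), times = 2),
                     levels = c("none", "lower", "upper", "sec+")),
  nChildren = c(4, 24, 38, 37, 31, 80, 49, 38),
  nMother   = c(8, 21, 42, 51, 10, 30, 24, 22)
)

# Observed number of children per mother in each group.
toy$rate <- toy$nChildren / toy$nMother

# Assumption: a Poisson log-linear model with log(nMother) as an offset,
# so the linear predictor describes the rate rather than the raw count.
fit <- glm(nChildren ~ duration + education + offset(log(nMother)),
           family = poisson, data = toy)

# Fitted counts divided by the number of mothers give fitted rates.
toy$fitted_rate <- fitted(fit) / toy$nMother
print(toy)
```

With the offset in place, exponentiated coefficients can be read as multiplicative effects on the expected number of children per mother. Whether this model is adequate for the full dataset is part of what the report should assess.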
At each step of your analysis, you should write why you do it and give your interpretation/conclusion. For example: “I make an interaction plot to see whether there are interactions between X and Y”, show the plot, and then “It seems that there is some interaction between X and Y”.
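A minimal sketch of such an interaction plot, using base R's interaction.plot() on the twelve duration 0-4 rows copied from the summary table. Taking residence and education as the X and Y pair, plotting the raw rate nChildren / nMother, and the object names toy and rates are all assumptions made for illustration.

```r
# Rows for duration 0-4, copied from the summary table.
toy <- data.frame(
  residence = factor(rep(c("Suva", "urban", "rural"), each = 4),
                     levels = c("Suva", "urban", "rural")),
  education = factor(rep(c("none", "lower", "upper", "sec+"), times = 3),
                     levels = c("none", "lower", "upper", "sec+")),
  nChildren = c(4, 24, 38, 37, 14, 23, 41, 35, 60, 98, 104, 35),
  nMother   = c(8, 21, 42, 51, 12, 27, 39, 51, 62, 102, 107, 47)
)

# Response on the rate scale: children per mother in each group.
toy$rate <- toy$nChildren / toy$nMother

# The same quantities the plot displays: mean rate per education x residence cell.
rates <- with(toy, tapply(rate, list(education, residence), mean))
print(round(rates, 2))

# One line per residence across education levels; roughly parallel lines
# suggest little interaction, crossing or diverging lines suggest some.
interaction.plot(x.factor = toy$education,
                 trace.factor = toy$residence,
                 response = toy$rate,
                 xlab = "education", ylab = "children per mother",
                 trace.label = "residence")
```

In a report, the plot would be followed by a sentence stating what it suggests and how that informs the choice of interaction terms.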
2022-09-06