ACTL1101 Introduction to Actuarial Studies
Main Assignment
(Due date: 28 July, Friday 4pm)
T2 2023
1. Part One: Taxation Data by Postcode
1.1 Context
In this part of the assignment, you will perform an analysis (and create visualisations) of a dataset which contains variables mostly related to income and taxation within different Australian postcodes for tax year 2018-2019. Some information about the schools (elementary and secondary) within those postcodes is also present in the dataset (last 4 columns). Warning: because some postcodes do not have any schools in them, those last 4 variables contain many ‘NA’ values. Other variables may also contain some ‘NA’ values.
Each of your datasets consists of 800 randomly generated records and can be downloaded as a csv file. At the end of this document, you can find “Appendix A: Variable Description”, which provides a brief description of each variable and its meaning. It is important to understand the representation and meaning of the variables used in order to interpret the results accurately.
For your information, those datasets are ‘real’ (and publicly available). The taxation and income data is available here (with its license found here). The school data is available here.
1.2 Your Tasks
1. (1pt) Produce a visualisation of the distribution of variable Private.Health.Proportion across all postcodes. Briefly describe this distribution.
2. (1pt) Produce a table containing, for each State:
• the number of postcodes within that state
• the mean of variable Private.Health.Proportion (across postcodes within that state)
• the standard deviation of variable Private.Health.Proportion (across postcodes within that state)
3. (1pt) Create a new variable called Avg.Gross.Rent, which is simply:
Avg.Gross.Rent = Gross.Rent.Amt / Total.Nb
Then, compute the sample correlation between Avg.Gross.Rent and Avg.Tax.Rate. Report and briefly interpret your result.
4. (2pts) Add a variable called Tax.Bracket to this dataset. This new variable should be based on variable Avg.Tax.Rate, and be equal to:
• ‘Low’ if Avg.Tax.Rate is below its 25% quantile.
• ‘Medium’ if Avg.Tax.Rate is equal to or above its 25% quantile and below its 75% quantile.
• ‘High’ if Avg.Tax.Rate is equal to or above its 75% quantile and below its 99% quantile.
• ‘Very High’ if Avg.Tax.Rate is equal to or above its 99% quantile.
Then, report the average Avg.Income within each Tax.Bracket.
Hint: consider using the R function quantile().
5. (2pts) Produce a visualisation which illustrates the relationship between variable Private.Health.Proportion and the new variable Tax.Bracket. Briefly discuss what you observe.
6. (3pts) Open Question: use any variable(s) you want in this dataset to tell a brief story about the data. This can be anything you find relevant, but you must include at least one visualisation to support your ‘story’. Example: an interesting/surprising link between variables, an insight that could help set a new public policy (or improve an existing one), a finding that is the starting point for new research, etc.
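As an illustration of the quantile-based grouping described in Task 4, the sketch below shows one possible approach in Python, using only the standard library. The data are synthetic stand-ins (not the assignment dataset), the helper name tax_brackets is hypothetical, and a real dataset would first need its ‘NA’ values handled.

```python
import random
import statistics
from collections import defaultdict


def tax_brackets(rates):
    """Label each rate using the 25%, 75% and 99% quantiles of `rates`."""
    # 99 cut points (percentiles), linear interpolation between order statistics
    cuts = statistics.quantiles(rates, n=100, method="inclusive")
    q25, q75, q99 = cuts[24], cuts[74], cuts[98]
    labels = []
    for x in rates:
        if x < q25:
            labels.append("Low")
        elif x < q75:
            labels.append("Medium")
        elif x < q99:
            labels.append("High")
        else:
            labels.append("Very High")
    return labels


# Synthetic stand-ins for Avg.Tax.Rate and Avg.Income (assumption, not real data)
random.seed(1)
rates = [random.uniform(0.10, 0.35) for _ in range(800)]
incomes = [40000 + 200000 * r + random.gauss(0, 5000) for r in rates]

labels = tax_brackets(rates)

# Average income within each bracket
by_bracket = defaultdict(list)
for label, income in zip(labels, incomes):
    by_bracket[label].append(income)
mean_income = {label: statistics.fmean(vals) for label, vals in by_bracket.items()}

for label in ("Low", "Medium", "High", "Very High"):
    print(label, round(mean_income[label]))
```

Computing the three quantiles once and then comparing every value against them keeps the rule identical to the bullet points in Task 4; a vectorised library would do the same in fewer lines.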
2. Part Two: Optimal Investments
2.1 Context
In this second part of the assignment (which is totally unrelated to the first part), you will work on an investment problem that would be difficult to tackle without programming. You need to use the R Shiny app to look for the values that correspond to your zID. The app will provide the numerical values of r, µ0, and w0 specific to your zID. Please use the app to retrieve these values and incorporate them into your analysis. The context is as follows.
You want to invest your money and you have two investment Options. Both of them will yield a random rate of return. However, Option B is substantially riskier than Option A. Their dynamic is as follows:
• An amount of 1 invested in Option A will yield a random amount A, with
A = 1 + U · r, (2.1)
and where U is a Uniform(0,1) random variable and r is a constant.
• An amount of 1 invested in Option B will yield a random amount B, with
B = exp(µ0 + √(0.1² − c²) · Z + c · Φ⁻¹(U)), (2.2)
and where U is the same Uniform(0,1) as in Equation (2.1), Z is a N(0,1) random variable (independent of U), Φ⁻¹() is the quantile function of the standard Normal distribution and µ0, c are constants (with 0 ≤ c ≤ 0.1).
You make financial decisions using a utility function given by:
v(w) = 1 − exp(−w), for w ∈ ℝ,
and you invest all your wealth w0 in some proportion to Option A, and in some proportion to Option B. Call γ the proportion of your wealth you invest in Option B (where 0 ≤ γ ≤ 1). Your final (and random) wealth W is then equal to:
W = w0 [(1 − γ)A + γB].
Note 1: Don’t worry about function Φ⁻¹(), simply know that you can get this function in R as qnorm(). So, to get Φ⁻¹(U) you would write qnorm(U).
Note 2: We stress that the same U is used in the calculation of Option A and Option B. This induces a correlation between A and B.
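To make the dynamics of Equations (2.1)-(2.2) concrete, here is a minimal Python sketch using only the standard library. The values r = 0.05 and µ0 = 0.04 are made-up placeholders (not the values of any zID), the names generate_ab and pearson are hypothetical, and NormalDist().inv_cdf is used in the role that qnorm() plays in R. A loop is used for clarity; a vectorised version would be faster.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard Normal; inv_cdf is its quantile function


def generate_ab(n, c=0.0, r=0.05, mu0=0.04, rng=None):
    """Return n pairs (A, B); r and mu0 are placeholder values (assumption)."""
    if rng is None:
        rng = random.Random()
    if not 0.0 <= c <= 0.1:
        raise ValueError("c must lie in [0, 0.1]")
    sigma_z = math.sqrt(max(0.0, 0.1 ** 2 - c ** 2))
    pairs = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf is undefined at exactly 0
            u = rng.random()
        z = rng.gauss(0.0, 1.0)  # independent of u
        a = 1.0 + u * r  # Equation (2.1)
        b = math.exp(mu0 + sigma_z * z + c * STD_NORMAL.inv_cdf(u))  # Equation (2.2)
        pairs.append((a, b))
    return pairs


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


pairs = generate_ab(2000, c=0.08, rng=random.Random(42))
a_vals = [a for a, _ in pairs]
b_vals = [b for _, b in pairs]
rho = pearson(a_vals, b_vals)
print(f"estimated correlation for c = 0.08: {rho:.3f}")
```

Because Φ⁻¹(U) is itself a standard Normal variable, the exponent in Equation (2.2) has variance (0.1² − c²) + c² = 0.1² for every admissible c: changing c shifts how much of B's randomness is shared with A, not how risky B is overall.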
2.2 Your Tasks
1. (3pt) Create a function called generate.AB. This function has two arguments: n (no default value) and c (default value of 0). This function does the following:
• It generates n random pairs of (A, B) under the dynamic given by Equations (2.1)-(2.2), with constant c specified via the second argument ‘c’ of the function.
• It returns this sample as a matrix of size n × 2 (n rows, 2 columns). The first column is the sample (A1, ..., An); the second column is the sample (B1, ..., Bn).
Then, use your function to create a scatter plot of a sample (A1, B1), ..., (An, Bn) for n = 2000 and c = 0.08 (the A values should be on the x axis, while the B values should be on the y axis) and briefly comment on the relationship between A and B.
Hint: In this question, vectorization is your friend. Remember that a single command like rnorm(n) can generate a vector of n random variables in one go.
2. (2pt) Use a visualisation of your choice to illustrate the relationship between:
• the correlation ρ(A, B).
• the constant c.
Then, briefly analyse and interpret this relationship.
Hint: We do not know the theoretical correlation between A and B, but we can use function generate.AB() to obtain a sample from (A, B) and then use the R function cor() to estimate the correlation from that sample.
3. (3pt) Following this investment strategy, your Expected Utility is:
E[v(W)] = E[1 − exp(−W)] = E[1 − exp( − w0 (1 − γ)A − w0γB)].
Assume that c = 0.05. Find the γ which maximises E[v(W)]. [Recall that 0 ≤ γ ≤ 1.]
Hint 1: It would be hard to compute E[v(W)] with pen and paper, but here again you can use generate.AB() to obtain a sample v(W1), ..., v(Wn). You can then compute the mean of that sample, which is a good estimate of E[v(W)]. You can repeat that for many values of γ, so as to find the (approximate) γ that maximises Expected Utility.
Hint 2: The ‘computational burden’ here is quite high, as you need a fairly large sample (we suggest n > 200,000) to approximate E[v(W)] by the sample mean of v(W). That said, in this process you only need to generate ONE sample (A1, B1), ..., (An, Bn). Indeed, using this ONE sample you can then obtain samples v(W1), ..., v(Wn) for different values of γ. In other words, you do not need to generate a new random sample for every different value of γ.
4. (1pt) Assume that c = 0.05. If you follow this investment strategy with the γ value as derived in the previous question, what is: Pr[W < w0] (i.e. the probability you experience a loss)? [Here again, we expect an approximate answer based on R simulations, not a pen-and-paper calculation.]
5. (1pt) Include in an Appendix the pseudocode of all the tasks you performed in R/Python for questions 1 to 4. [You are encouraged to write your pseudocode before the actual implementation in R/Python, as this helps structure the coding process and organise your thoughts.]
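The idea in Hint 2 (one sample of (A, B), reused for every candidate γ) can be sketched as follows in Python with the standard library only. All numerical values (r, µ0, w0, the seed, the sample size and the grid step) are placeholders chosen for illustration, and the function names are hypothetical. The sample here is deliberately smaller than the n > 200,000 suggested above, so that the pure-Python loop stays quick.

```python
import math
import random
from statistics import NormalDist, fmean


def final_wealth(a, b, gamma, w0):
    """W = w0 [(1 - gamma) A + gamma B]."""
    return w0 * ((1.0 - gamma) * a + gamma * b)


def expected_utility(pairs, gamma, w0):
    """Sample mean of v(W) = 1 - exp(-W) over a fixed sample of (A, B)."""
    return fmean(1.0 - math.exp(-final_wealth(a, b, gamma, w0)) for a, b in pairs)


def best_gamma(pairs, w0, steps=50):
    """Grid search over gamma in [0, 1], reusing the same sample each time."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: expected_utility(pairs, g, w0))


def loss_probability(pairs, gamma, w0):
    """Estimate Pr[W < w0] as the share of sampled outcomes below w0."""
    losses = sum(1 for a, b in pairs if final_wealth(a, b, gamma, w0) < w0)
    return losses / len(pairs)


# Placeholder inputs (assumptions, not the values of any zID)
R, MU0, W0, C = 0.05, 0.04, 1.0, 0.05
rng = random.Random(2023)
inv_cdf = NormalDist().inv_cdf
sigma_z = math.sqrt(0.1 ** 2 - C ** 2)

# ONE sample of (A, B), generated once and reused for every gamma
sample = []
for _ in range(20_000):
    u = max(rng.random(), 1e-12)  # keep u strictly inside (0, 1)
    z = rng.gauss(0.0, 1.0)
    sample.append((1.0 + u * R,
                   math.exp(MU0 + sigma_z * z + C * inv_cdf(u))))

gamma_star = best_gamma(sample, W0)
p_loss = loss_probability(sample, gamma_star, W0)
print(f"approximate best gamma: {gamma_star:.2f}, estimated Pr[W < w0]: {p_loss:.3f}")
```

A coarse grid followed by a finer one around the best value is a common refinement; whichever search is used, the estimate is only as precise as the sample size allows.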
3. Format Requirements
• You must submit your assignment on Turnitin (under the “Main Assignment” section in Moodle).
• You must submit two files:
– A .pdf file: contains your answers to all questions.
– A .R/py file: contains all the R or Python code you used to produce your answers.
• About the .pdf file:
– It includes a title page with your student name and student zID.
– The page format is A4 (the standard Australian format).
– The minimum font size used is equivalent to “Times New Roman” size 11.
– The minimum line spacing used is 1.15.
– The margins should not be narrower than the “narrow” option in Word (0.5 inches on every side).
– The answers (including sub-parts) are numbered in the same way they are numbered in the statement of the questions.
– Your answers to Part One (including plots) must fit on 2 pages.
– Your answers to Part Two (including plots) must fit on 1 page.
– The main body of the pdf (i.e., the 3 pages) need not contain R or Python code but must include everything that is asked in the questions (e.g., visualizations, tables, numbers, explanations, etc.).
– All R or Python code necessary to produce your results must be placed in an Appendix to your .pdf file. There is no page limit on this Appendix, but the efficiency of your code will be graded (see Marking Criteria). To be clear: we want the entirety of your R or Python code to be present in your Appendix (as well as in your .R/py file).
– This R or Python code Appendix must be made of text (not images).
• Specifically about your .R/py file:
– Your R or Python code must run as it is (and produce exactly the results in your assignment). If we cannot run your code, you will lose ALL marks associated with the R or Python code (C1 and C2 in the Marking Criteria).
– Your R or Python code must contain ALL steps necessary to answer the questions in this assignment. To be specific: you are NOT allowed to do any data manipulation in Excel or any software other than R or Python.
4. Marking Criteria
Each individual Question is allocated a fixed number of marks. To assess your answers, we will use a series of criteria. Those criteria are stated below, with a brief description that corresponds to a ‘HD mark’. Not all criteria are relevant to every sub-question: find a detailed mapping below.
• C1: Code Correctness: Your code, functions and algorithms produce exactly the desired results, and do not produce any irrelevant/superfluous results.
• C2: Code Efficiency: Your code is extremely efficient, without sacrificing readability. Your R or Python code is extremely well organised and easy to follow.
• C3: Analysis: Your analysis is insightful and accurate. Your interpretation of your results is correct, clear, precise and shows a great depth of understanding and critical thinking. Your writing is concise, fluent and devoid of typos, grammatical and syntactical mistakes.
• C4: Choice of Visualisation: Your choice of which visualisation to use is excellent: it conveys all (and only) the appropriate information.
• C5: Presentation: The formatting and presentation of your results and/or visualisations is impeccable: clear, readable and aesthetic.
• C6: Pseudocode: Your pseudocode is clearly written in a neutral syntax and contains all steps necessary to reproduce your results.
4.1 Part One
For each sub question in Part One, the relevant marking criteria are:
• Q1: C1 (20%), C3 (30%), C4 (30%), C5 (20%)
• Q2: C1 (50%), C2 (30%), C5 (20%)
• Q3: C1 (60%), C3 (40%)
• Q4: C1 (60%), C2 (40%)
• Q5: C1 (20%), C3 (30%), C4 (30%), C5 (20%)
• Q6: C3 (50%), C4 (25%), C5 (25%)
4.2 Part Two
For each sub question in Part Two, the relevant marking criteria are:
• Q1: C1 (40%), C2 (20%), C3 (20%), C5 (20%)
• Q2: C1 (20%), C2 (20%), C3 (20%), C4 (20%), C5 (20%)
• Q3: C1 (60%), C2 (40%)
• Q4: C1 (60%), C2 (40%)
• Q5: C6 (100%)
4.3 Plagiarism Awareness
This is an individual assignment. While we have no problem with students discussing assignment problems if they wish, the material each student submits must be their own individual work. Students should make sure they understand what plagiarism is.
In particular, any R or Python code you present must be from your own computer, and developed by you alone. With ≈ 360 students performing the same task, some small elements of code are likely to be similar. However, big patches of identical code (even with different variable names, layout, or comments) will be considered suspicious and investigated for plagiarism. Turnitin picks this up easily, so cases of plagiarism have a very high probability of being discovered. The best strategy to avoid any problem is to never share bits and pieces of code with other students.
4.4 Late Penalties
Penalties for late assignments are as indicated in the course outline:
Late submission will incur a penalty of 5% per day or part thereof (including weekends) from the due date and time. An assessment will not be accepted after 5 days (120 hours) of the original deadline unless special consideration has been approved.
Hence, be careful: 4.99 days of lateness gives you a penalty of 25%, but 5.01 days of lateness gives you a penalty of 100%.
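The arithmetic behind this warning can be written as a small function. This is only a sketch: the name late_penalty_percent is hypothetical, and treating a submission that is exactly 120 hours late as still accepted is an assumption about how to read the course outline wording.

```python
import math


def late_penalty_percent(hours_late):
    """Penalty in percent: 5% per day or part thereof, 100% beyond 120 hours."""
    if hours_late <= 0:
        return 0
    if hours_late > 120:  # assumption: exactly 120 hours is still accepted
        return 100
    return 5 * math.ceil(hours_late / 24)


print(late_penalty_percent(4.99 * 24))  # 4.99 days late
print(late_penalty_percent(5.01 * 24))  # 5.01 days late
```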
5. Answering Students’ Questions
Questions or clarification about the assignment must be posted on the Ed Forum. We do not plan to give out many additional hints, but if we were to do so, we want everyone to benefit from them.
Important Note: The deadline for submission of this assignment is 28 July at 16:00. However, we will stop answering any questions about the assignment on Wednesday 26 July at 16:00. The rationale for this is twofold:
• we want to incentivise students to start the assignment early
• we want to be fair to assiduous students who decide to submit their assignment ahead of time. Were we to give hints right before the deadline, those students would be penalised for their earliness.
A. Variable Description for Part One
• Postcode: Identifier of an Australian postcode (for which all other variables are recorded)
• State: State in Australia where the Postcode is located
• Total.Nb: Total number of individuals (with tax returns) in that postcode
• Total.Income: Total taxable income (all sources) for that postcode
• Net.Tax.Amt: Total net tax paid
• Avg.HELP: Average student HECS-HELP Debt repayment
• Avg.Salary: Average salary or wages
• Total.Income.Amt: Total income or loss (including non taxable income)
• Avg.Income: Average total income or loss
• Avg.Tax: Average tax paid
• Avg.Tax.Rate: Average tax rate paid (Net.Tax.Amt divided by Total.Income.Amt)
• Avg.Work.Expenses: Average work related expenses (all expenses)
• Employer.Super.Contributions.Amt: Total reportable employer superannuation contributions
• Net.Capital.Gain.Amt: Total net capital gain
• Tax.Net.Capital.Gain.Amt: Total estimated tax on net capital gains
• Avg.Foreign.Income: Average assessable foreign source income
• Gross.Rent.Amt: Total gross rent (income)
• Personal.Super.Contributions.Amt: Total personal superannuation contributions made
• Total.Business.Income.Amt: Total business income
• Total.Business.Exp.Amt: Total business expenses
• Net.Business.Income.Amt: Total net income or loss from business
• Business.Net.Tax.Amt: Total estimated business net tax
• Private.Health.Proportion: Number of individuals with private health insurance divided by Total.Nb
• ICSEA: Average Index of Community Socio-Educational Advantage (see link for details)
• LBOTE: Average proportion of students with a ‘Language Background other than English’
• Indigenous: Average proportion of indigenous students
• Teaching.Ratio: Average number of teaching staff per student