
Stat 3202 Spring 2023 Lab 5

Lab 5: Plotting and Optimizing the Likelihood

Due on Carmen Friday, February 24 before 11:59 pm. All uploads must be .pdf. Submissions will be accepted for 24 hours past this deadline, with a deduction of 1% per hour. Absolutely no submissions will be accepted after this grace period.

Problems will be graded for a combination of correctness and completion.

Academic Integrity

Acceptable for this assignment:

• Working together in a small group

•  Working together in a Zoom chat

•  Discussing or explaining your approach with students who have also already attempted a problem

•  Discussing the assignment with your TA or using MSLC tutoring resources

•  Asking me for help in class, office hours, or through Zoom

Not acceptable for this assignment:

•  Copying answers from another student or letting another student copy your answers

•  Posting problems to online or communal forums such as a group chat, Chegg, or Stack Exchange

• Using solutions from any previous course or section of Stat 3202

Grading Rubric

This lab is worth 20 points. This credit will be earned by:

•  Correct format (PDF) and naming convention (lab5_lastnamedotnumber.pdf) for the file: 2 points

•  Overall professionalism, including professional grammar, complete and punctuated sentences:  2 points

•  Remaining 16 points under construction

Total: 20 points

Guided Problem 1

Think back to our Poisson likelihood example from last week, where we coded the likelihood function for a data set of Poisson-distributed random variables. Recall the Poisson PMF has a very particular form: $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$. So if our data follow a different distribution, the likelihood function will look very different.

Instead, for this problem, we’ll focus on another population. We’ll plot a likelihood based on normal data.
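For reference, a likelihood function in the style of the lecture's L_Poisson might look like the sketch below. The lecture's actual code is not reproduced in this lab, so the function body, the argument names lambda_domain and data, and the toy counts are all assumptions made for illustration.

```r
# Hypothetical sketch of a Poisson likelihood function; the lecture's
# actual L_Poisson may differ in argument names and implementation.
L_Poisson <- function(lambda_domain, data) {
  # For each candidate lambda, multiply the Poisson PMF over all observations.
  sapply(lambda_domain, function(lambda) prod(dpois(data, lambda = lambda)))
}

# Made-up counts, used only to show the call; not data from this lab.
toy_counts <- c(2, 0, 3, 1, 4)
lambda_domain <- seq(0.1, 6, by = 0.1)
toy_likelihood <- L_Poisson(lambda_domain, toy_counts)

# Grid value of lambda with the largest likelihood.
lambda_domain[which.max(toy_likelihood)]
```

The same pattern (a vector of candidate parameter values in, a vector of likelihood values out) carries over to other distributions by swapping the density function.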

a) First, write and simplify the likelihood function for a data set of normally distributed random variables. To make this first example a little simpler, suppose we have a known variance of σ² = 1. Use LaTeX math typesetting notation to make your likelihood render correctly.
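As a sketch of the algebra, assuming independent observations $x_1, \dots, x_n$ (the symbols $n$ and $x_i$ are notation introduced here, not taken from the lab), the likelihood with a known variance of 1 can be written and simplified as:

```latex
L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(x_i - \mu)^2}{2}\right)
       = (2\pi)^{-n/2} \exp\!\left(-\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2\right)
```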

b) Next, consider observations from a normal distribution. To make things simpler, let σ² = 1 be a known parameter. Code up a function L_Normal, similar to the L_Poisson from our classroom lectures, that will take an input vector of possible µs (a “µ domain”), an input vector of observed data, and compute and output the evaluated likelihood value. Be cautious with parentheses.
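One possible shape for such a function is sketched below. It leans on R's built-in dnorm() rather than a hand-typed density, which sidesteps most parenthesis mistakes; the argument names mu_domain, data, and sigma are illustrative choices, not names required by the lab.

```r
# Sketch of a normal likelihood with known standard deviation (sigma = 1 by default).
# Argument names are illustrative assumptions.
L_Normal <- function(mu_domain, data, sigma = 1) {
  # For each candidate mu, multiply the normal density over all observations.
  sapply(mu_domain, function(mu) prod(dnorm(data, mean = mu, sd = sigma)))
}

# Evaluate at two candidate values of mu for a tiny made-up data vector.
toy_result <- L_Normal(c(0, 1), data = c(0.5, 1.5))
toy_result
```

The function returns one likelihood value per element of mu_domain, so it can be passed a whole grid for plotting or a single value for optimization.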

c) Suppose we observe a data set of weights, in grams, of frog specimens collected in the Olentangy River basin:

c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

Suppose frog masses can be assumed to be normally distributed with a known variance of 1 gram squared. Compute $\hat{\mu} = \bar{x}$, which we will prove in our next lecture is the MLE for µ.

frogs <- c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

print(frogs)

## [1] 4.57 4.73 2.24 4.16 4.53 3.79 2.48 4.74 3.18
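The sample mean can be computed directly with mean(). The variable name mu_hat below is an illustrative choice.

```r
frogs <- c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

# Sample mean of the frog weights, in grams.
mu_hat <- mean(frogs)
round(mu_hat, 3)
```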

d) Plot the likelihood function for this particular data set over the µ domain [0, 7]. Comment on the shape of the likelihood function and the apparent location of the maximum. Label all axes appropriately and professionally.
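A minimal plotting sketch follows, assuming a likelihood function built on dnorm() as one possible implementation of L_Normal. The grid spacing of 0.01, the axis wording, and the dashed reference line at the sample mean are all illustrative choices.

```r
frogs <- c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

# Assumed implementation of the normal likelihood with known sigma = 1.
L_Normal <- function(mu_domain, data, sigma = 1) {
  sapply(mu_domain, function(mu) prod(dnorm(data, mean = mu, sd = sigma)))
}

# Evaluate the likelihood on a fine grid over [0, 7].
mu_domain <- seq(0, 7, by = 0.01)
likelihood <- L_Normal(mu_domain, frogs)

plot(mu_domain, likelihood, type = "l",
     xlab = expression(paste("Candidate mean ", mu, " (grams)")),
     ylab = "Likelihood",
     main = "Normal likelihood for frog weights (known variance 1)")

# Dashed vertical line at the sample mean for visual comparison.
abline(v = mean(frogs), lty = 2)
```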

e) Mathematically, it appears that $\bar{x}$ is the MLE for µ, and we can see the maximum of the likelihood function appears to be near µ = 3.824, which we expected. Using R, compute the maximum of the likelihood function numerically, using optimize().

f) Interpret each value of the output. Specifically, what are $maximum, $objective?
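The call below is one way to run the optimization, again assuming a dnorm()-based L_Normal. By default optimize() minimizes, so maximum = TRUE is needed here; extra arguments such as data are passed through to the function being optimized.

```r
frogs <- c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

# Assumed implementation of the normal likelihood with known sigma = 1.
L_Normal <- function(mu_domain, data, sigma = 1) {
  sapply(mu_domain, function(mu) prod(dnorm(data, mean = mu, sd = sigma)))
}

# Search the interval [0, 7] for the value of mu that maximizes the likelihood.
opt <- optimize(L_Normal, interval = c(0, 7), data = frogs, maximum = TRUE)

opt$maximum    # location of the maximum (a value of mu)
opt$objective  # value of the likelihood function at that location
```

With maximum = TRUE, optimize() returns a list whose maximum component is the argument value where the search stopped and whose objective component is the function value there.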

g) Slightly alter your function from part b) to create a log-likelihood function, which returns the natural logarithm of the likelihood instead of the regular likelihood.  Plot the log-likelihood over the same domain you used in part d).

h) As in part e), optimize the log-likelihood function.
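A sketch covering parts g) and h) is given below. The name logL_Normal is hypothetical, and summing log densities (via log = TRUE) is a design choice made here instead of taking log() of a product, since it avoids underflow when the likelihood is tiny.

```r
frogs <- c(4.57, 4.73, 2.24, 4.16, 4.53, 3.79, 2.48, 4.74, 3.18)

# Hypothetical log-likelihood function with known sigma = 1.
logL_Normal <- function(mu_domain, data, sigma = 1) {
  # Sum of log densities equals the log of the product of densities.
  sapply(mu_domain, function(mu) sum(dnorm(data, mean = mu, sd = sigma, log = TRUE)))
}

# Plot over the same domain as the likelihood plot, [0, 7].
mu_domain <- seq(0, 7, by = 0.01)
log_likelihood <- logL_Normal(mu_domain, frogs)
plot(mu_domain, log_likelihood, type = "l",
     xlab = expression(paste("Candidate mean ", mu, " (grams)")),
     ylab = "Log-likelihood",
     main = "Normal log-likelihood for frog weights (known variance 1)")

# Maximize the log-likelihood over the same interval.
opt_log <- optimize(logL_Normal, interval = c(0, 7), data = frogs, maximum = TRUE)
opt_log$maximum
opt_log$objective
```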

i) Comment: why are the shapes of the likelihood function and the log-likelihood function different, but the apparent locations of the maximum (the optimum value for the parameter) are the same?
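One way to frame the comparison, using the notation $x_1, \dots, x_n$ for the observations (introduced here as an assumption), is to write out the log-likelihood for the known-variance normal case:

```latex
\ell(\mu) = \log L(\mu)
          = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)^2
```

As a function of µ this is a downward-opening parabola, while $L(\mu)$ itself is bell-shaped. Because the natural logarithm is strictly increasing, $L(\mu_1) > L(\mu_2)$ holds exactly when $\ell(\mu_1) > \ell(\mu_2)$, so both functions rank candidate values of µ in the same order.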

Problem 2

Next, on your own, you’ll plot the likelihood based on exponential data.

a) First, write and simplify the likelihood function for a data set of exponentially distributed random variables. Use LaTeX math typesetting notation to make your likelihood render correctly.
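As a sketch, assuming the rate parameterization $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (consistent with an MLE of the form $1/\bar{x}$) and independent observations $x_1, \dots, x_n$ (notation introduced here), the likelihood can be written and simplified as:

```latex
L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
           = \lambda^{n} \exp\!\left(-\lambda \sum_{i=1}^{n} x_i\right)
```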

b) Next, consider observations from an exponential distribution where λ is the parameter. Code up a function L_Exponential, similar to the L_Poisson from class and L_Normal from problem 1, that will take an input vector of possible λs, an input vector of observed data, and compute and output the evaluated likelihood value. Be cautious with parentheses.
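A possible shape for this function is sketched below, assuming λ is the rate parameter so that it maps onto the rate argument of R's dexp(). The argument names lambda_domain and data are illustrative.

```r
# Sketch of an exponential likelihood; lambda is treated as the rate parameter.
L_Exponential <- function(lambda_domain, data) {
  # For each candidate lambda, multiply the exponential density over all observations.
  sapply(lambda_domain, function(lambda) prod(dexp(data, rate = lambda)))
}

# Evaluate at two candidate rates for a tiny made-up data vector.
toy_exp <- L_Exponential(c(1, 2), data = c(0.5, 1.5))
toy_exp
```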

c) Suppose we observe a data set of heavy metal concentrations, in micrograms of mercury per liter, of water specimens collected in the Olentangy River basin, stored in the vector mercury below. Suppose concentrations can be assumed to be exponentially distributed. Compute $\hat{\lambda}$, the MLE for λ.

mercury <- c(0.51, 0.02, 0.15, 0.46, 0.11, 0.04, 0.39, 0.52, 0.2, 0.17, 0.01, 0.02, 0.32, 1.37)
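Assuming the rate parameterization, the estimate is the reciprocal of the sample mean. The variable name lambda_hat is an illustrative choice.

```r
mercury <- c(0.51, 0.02, 0.15, 0.46, 0.11, 0.04, 0.39, 0.52, 0.2, 0.17,
             0.01, 0.02, 0.32, 1.37)

# Reciprocal of the sample mean.
lambda_hat <- 1 / mean(mercury)
lambda_hat
```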

d) Plot the likelihood function for this particular data set over the λ domain (0, 10].  Comment on the shape of the likelihood function and the apparent location of the maximum.

e) Mathematically, you’ll prove on homework 3 that $1/\bar{x}$ is the MLE for λ, and we can see the maximum of the likelihood function appears to be near $1/\bar{x} = 3.2634033$, which we expected. Using R and optimize(), compute the maximum of the likelihood function numerically.
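Parts d) and e) can be sketched together as below, assuming a dexp()-based L_Exponential. The grid starts at 0.01 rather than 0 because the domain (0, 10] excludes zero; the grid spacing and axis wording are illustrative choices.

```r
mercury <- c(0.51, 0.02, 0.15, 0.46, 0.11, 0.04, 0.39, 0.52, 0.2, 0.17,
             0.01, 0.02, 0.32, 1.37)

# Assumed implementation of the exponential likelihood (lambda as rate).
L_Exponential <- function(lambda_domain, data) {
  sapply(lambda_domain, function(lambda) prod(dexp(data, rate = lambda)))
}

# Plot the likelihood on a grid over (0, 10].
lambda_domain <- seq(0.01, 10, by = 0.01)
likelihood <- L_Exponential(lambda_domain, mercury)
plot(lambda_domain, likelihood, type = "l",
     xlab = expression(paste("Candidate rate ", lambda)),
     ylab = "Likelihood",
     main = "Exponential likelihood for mercury concentrations")
abline(v = 1 / mean(mercury), lty = 2)

# Numerical maximization over the same interval.
opt <- optimize(L_Exponential, interval = c(0, 10), data = mercury, maximum = TRUE)
opt$maximum
opt$objective
```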

f) Plot and optimize the log-likelihood function as well. Where is the location of the maximum?
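A log-likelihood version might look like the following sketch. The name logL_Exponential is hypothetical, and the function sums log densities rather than taking the log of a product.

```r
mercury <- c(0.51, 0.02, 0.15, 0.46, 0.11, 0.04, 0.39, 0.52, 0.2, 0.17,
             0.01, 0.02, 0.32, 1.37)

# Hypothetical exponential log-likelihood (lambda as rate).
logL_Exponential <- function(lambda_domain, data) {
  sapply(lambda_domain, function(lambda) sum(dexp(data, rate = lambda, log = TRUE)))
}

# Plot over (0, 10].
lambda_domain <- seq(0.01, 10, by = 0.01)
log_likelihood <- logL_Exponential(lambda_domain, mercury)
plot(lambda_domain, log_likelihood, type = "l",
     xlab = expression(paste("Candidate rate ", lambda)),
     ylab = "Log-likelihood",
     main = "Exponential log-likelihood for mercury concentrations")

# Maximize the log-likelihood over the same interval.
opt_log <- optimize(logL_Exponential, interval = c(0, 10), data = mercury, maximum = TRUE)
opt_log$maximum
```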

Problem 3

The Olentangy River can either remain at a safe water level, or crest to flood stage. This status is assumed to follow a Bernoulli distribution with parameter p, representing the probability the river is at a safe level.

Over 30 days, the river’s height is measured, yielding the following data:

c("Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Flood",
  "Safe", "Safe", "Safe", "Flood", "Safe", "Safe", "Flood", "Flood", "Safe", "Safe",
  "Flood", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Flood", "Safe")

Similarly to problems 1 and 2, code up and plot the likelihood function, and use optimize() to obtain a maximum likelihood estimate for p. Comment on whether your answer seems reasonable. Plot and optimize the log-likelihood function as well. Where is the location of the maximum?
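One way to approach this is sketched below. The names river, safe, L_Bernoulli, and logL_Bernoulli are hypothetical, and coding "Safe" as 1 and "Flood" as 0 is an assumption that follows from p being the probability of a safe level. A Bernoulli density is obtained from dbinom() with size = 1.

```r
river <- c("Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Flood",
           "Safe", "Safe", "Safe", "Flood", "Safe", "Safe", "Flood", "Flood", "Safe", "Safe",
           "Flood", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Safe", "Flood", "Safe")

# Assumed coding: 1 = Safe, 0 = Flood.
safe <- as.integer(river == "Safe")

# Hypothetical Bernoulli likelihood and log-likelihood functions.
L_Bernoulli <- function(p_domain, data) {
  sapply(p_domain, function(p) prod(dbinom(data, size = 1, prob = p)))
}
logL_Bernoulli <- function(p_domain, data) {
  sapply(p_domain, function(p) sum(dbinom(data, size = 1, prob = p, log = TRUE)))
}

# Plot both functions on a grid strictly inside (0, 1).
p_domain <- seq(0.01, 0.99, by = 0.01)
plot(p_domain, L_Bernoulli(p_domain, safe), type = "l",
     xlab = "Candidate probability p of a safe level", ylab = "Likelihood")
plot(p_domain, logL_Bernoulli(p_domain, safe), type = "l",
     xlab = "Candidate probability p of a safe level", ylab = "Log-likelihood")

# Maximize each function over the interval (0, 1).
opt_L <- optimize(L_Bernoulli, interval = c(0, 1), data = safe, maximum = TRUE)
opt_logL <- optimize(logL_Bernoulli, interval = c(0, 1), data = safe, maximum = TRUE)

# Compare both numerical maximizers with the observed proportion of safe days.
c(opt_L$maximum, opt_logL$maximum, mean(safe))
```

Comparing the two numerical results with the observed proportion of safe days gives a quick reasonableness check.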