

Assignment #3 STA437H1S/2005H1S

due Friday March 17, 2023

Instructions: Solutions to problems 1–3 are to be submitted on Quercus (PDF files only).

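1. Consider the single and complete linkage clustering methods. Given disjoint clusters A, B, and C, we can (for each of the two methods) define a distance measure d(A, B) between clusters A and B: the minimum (single linkage) or maximum (complete linkage) distance between a point in A and a point in B.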

(a) For single linkage clustering, show that

$$d(A, B \cup C) = \min\{d(A, B), d(A, C)\} = \frac{1}{2}\left[d(A, B) + d(A, C) - |d(A, B) - d(A, C)|\right]$$

(b) For complete linkage clustering, show that

$$d(A, B \cup C) = \max\{d(A, B), d(A, C)\} = \frac{1}{2}\left[d(A, B) + d(A, C) + |d(A, B) - d(A, C)|\right]$$
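A quick numerical sanity check (not a proof) can make these identities concrete. The R sketch below draws hypothetical random points in the plane, splits them into three disjoint clusters, and computes single and complete linkage distances directly from their definitions (minimum and maximum pairwise distance). The function names d_single, d_complete, half_min and half_max are illustrative only, not from any package; the check relies on the elementary facts min{a, b} = (a + b − |a − b|)/2 and max{a, b} = (a + b + |a − b|)/2.

```r
# Numerical sanity check (not a proof) of the Problem 1 identities,
# using hypothetical random points split into three disjoint clusters.
set.seed(437)
pts <- matrix(rnorm(30 * 2), ncol = 2)
D <- as.matrix(dist(pts))              # Euclidean distances between all pairs of points
A <- 1:10; B <- 11:20; C <- 21:30      # row indices of the three clusters

d_single   <- function(P, Q) min(D[P, Q])  # single linkage: closest pair
d_complete <- function(P, Q) max(D[P, Q])  # complete linkage: farthest pair

half_min <- function(a, b) (a + b - abs(a - b)) / 2
half_max <- function(a, b) (a + b + abs(a - b)) / 2

s_A_BC <- d_single(A, c(B, C)); s_AB <- d_single(A, B); s_AC <- d_single(A, C)
c_A_BC <- d_complete(A, c(B, C)); c_AB <- d_complete(A, B); c_AC <- d_complete(A, C)

# Each row of output should show three equal numbers
print(c(linkage = s_A_BC, via_min = min(s_AB, s_AC), via_abs = half_min(s_AB, s_AC)))
print(c(linkage = c_A_BC, via_max = max(c_AB, c_AC), via_abs = half_max(c_AB, c_AC)))
```

Agreement on one random example only illustrates the claim; the exercise still asks for an argument that holds for arbitrary clusters.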

2. In Assignments #1 and #2, you analyzed data on two species of rock crabs using pairwise scatterplots and principal components analysis.

As before, the data are in a file crabs.txt on Quercus; the columns of the file are species (B or O), sex (M or F), index (1–50 within each species-sex combination), the width of the frontal lip (FL), the rear width of the shell (RW), the length along the midline of the shell (CL), the maximum width of the shell (CW), and the body depth (BD).

In this problem, we will assume that we do not have the sex/species identifiers for the 200 crabs at our disposal and will use normal mixture models to try to identify clusters in the data. To this end, you will use the function EM (available on Quercus as EM.txt) to estimate the parameters of a two-component multivariate normal mixture model. (EM can be very slow, so it would be wise to have other activities planned while it runs!)
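For intuition about what EM is doing while it runs, here is a minimal base-R sketch of EM for a k-component multivariate normal mixture. It is an illustration only and is not the EM function from EM.txt: the names em_mixture, m_step and dmvnorm_log are hypothetical, and the k-means initialisation is an assumption (the Quercus function may initialise and stop differently). Each iteration does an M-step (mixing proportions, means and covariances weighted by the current posterior probabilities) followed by an E-step (new posterior component probabilities, computed on the log scale for numerical stability).

```r
# Log-density of N(mu, Sigma) at each row of y, via the Cholesky factor
dmvnorm_log <- function(y, mu, Sigma) {
  R <- chol(Sigma)                                 # Sigma = t(R) %*% R
  z <- backsolve(R, t(y) - mu, transpose = TRUE)   # z = R^{-T} (y_i - mu)
  -0.5 * colSums(z^2) - sum(log(diag(R))) - 0.5 * ncol(y) * log(2 * pi)
}

# M-step: weighted parameter estimates from an n x k responsibility matrix w
m_step <- function(y, w) {
  lapply(seq_len(ncol(w)), function(j) {
    nj <- sum(w[, j])
    mu <- colSums(w[, j] * y) / nj
    yc <- sweep(y, 2, mu)                          # centre each row at mu
    list(pi = nj / nrow(y), mu = mu,
         Sigma = crossprod(yc * sqrt(w[, j])) / nj)
  })
}

em_mixture <- function(y, k = 2, iter = 100) {
  y <- as.matrix(y)
  # Assumed initialisation: hard k-means assignments as starting responsibilities
  cl <- kmeans(y, centers = k, nstart = 5)$cluster
  w <- outer(cl, seq_len(k), "==") * 1
  loglik <- numeric(iter)
  for (it in seq_len(iter)) {
    pars <- m_step(y, w)
    # E-step: log(pi_j) + log N(y_i; mu_j, Sigma_j), an n x k matrix
    logdens <- sapply(pars, function(th) log(th$pi) + dmvnorm_log(y, th$mu, th$Sigma))
    m <- apply(logdens, 1, max)                    # log-sum-exp stabiliser
    loglik[it] <- sum(m + log(rowSums(exp(logdens - m))))
    w <- exp(logdens - m)
    w <- w / rowSums(w)                            # posterior probabilities
  }
  list(cluster = max.col(w, ties.method = "first"), posterior = w,
       params = pars, loglik = loglik)
}
```

EM never decreases the log-likelihood from one iteration to the next, so plotting `loglik` from a sketch like this (for example, `plot(em_mixture(y, k = 2, iter = 100)$loglik)`) is one way to judge whether 30 iterations were enough to settle down.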

As before, the data can be read into R as follows:

> x <- scan("crabs.txt",skip=1,what=list("c","c",0,0,0,0,0,0))   # skip header; 2 character + 6 numeric columns
> FL <- x[[4]]
> RW <- x[[5]]
> CL <- x[[6]]
> CW <- x[[7]]
> BD <- x[[8]]
> y <- cbind(FL,RW,CL,CW,BD)   # 200 x 5 matrix of measurements

(a) Start by doing 30 iterations of the EM algorithm:

> r30 <- EM(y,k=2,em.iter=30)

The component r30$cluster will contain the estimated cluster (either 1 or 2) for each observation. These can be seen on the pairwise scatterplot as follows:

> colour <- rep("blue",200)
> colour[r30$cluster==2] <- "red"
> pairs(y,col=colour)

Do the clusters estimated after 30 iterations seem reasonable?

(b) Repeat the procedure in part (a) now using 100 iterations of EM:

> r100 <- EM(y,k=2,em.iter=100)

Comment on the difference between the estimated clusters here and those from part (a).

(c) [Optional but recommended] Repeat the procedure above, estimating four clusters (k=4); you will probably need significantly more than 100 iterations of the EM algorithm. How do the clusters compare to the sex/species groupings?
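Because cluster labels from a mixture fit are arbitrary, cluster "1" in one run may correspond to cluster "2" in another (label switching), so comparing `r30$cluster == r100$cluster` directly can be misleading. The helper sketched below cross-tabulates two labelings and reports the largest proportion of agreement over all one-to-one relabelings. The names perms and cluster_agreement are hypothetical (not part of EM.txt); the sketch assumes both labelings have the same number of distinct values, and its brute-force search over k! permutations is only practical for small k such as 2 or 4.

```r
# All permutations of the vector v (returned as a list), by simple recursion
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], rest)
  }
  out
}

# Proportion of observations on which labelings a and b agree, maximised
# over all one-to-one relabelings of b's clusters (handles label switching)
cluster_agreement <- function(a, b) {
  tab <- table(a, b)          # rows: labels in a, columns: labels in b
  k <- nrow(tab)
  stopifnot(ncol(tab) == k)   # assumes the same number of distinct labels
  matched <- sapply(perms(seq_len(k)),
                    function(pm) sum(tab[cbind(seq_len(k), pm)]))
  max(matched) / length(a)
}
```

For part (b), `table(r30$cluster, r100$cluster)` together with `cluster_agreement(r30$cluster, r100$cluster)` summarises how much the two runs differ. For part (c), the sex/species groups could be formed as `paste(x[[1]], x[[2]])` and compared with the four-cluster labels in the same way.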

3. Suppose that $S$ is a symmetric positive definite $p \times p$ matrix with $S = V \Lambda V^T$, where $\Lambda$ is a diagonal matrix with elements $\lambda_1 > \lambda_2 > \cdots > \lambda_p$ and $V$ is an orthogonal matrix with columns $\mathbf{v}_1, \ldots, \mathbf{v}_p$.

(a) Suppose that we approximate $S$ by $\Psi + LL^T$, where

$$L = \begin{pmatrix} \lambda_1^{1/2}\mathbf{v}_1 & \lambda_2^{1/2}\mathbf{v}_2 & \cdots & \lambda_r^{1/2}\mathbf{v}_r \end{pmatrix}$$

and $\Psi$ is a diagonal matrix with the diagonal elements of $\Psi + LL^T$ equal to those of $S$. If $\ell_{ij}$ is the $(i, j)$-element of $L$, give an expression for the $i$-th diagonal element of $\Psi$, $\psi_{ii}$.

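(b) Define $D = \Psi + LL^T - S$. If $\{d_{ij}\}$ are the elements of $D$, show that

$$\sum_{i=1}^{p} \sum_{j=1}^{p} d_{ij}^2 \le \lambda_{r+1}^2 + \cdots + \lambda_p^2.$$

(Hint: The diagonal elements of $D$ are 0 and its off-diagonal elements equal those of $LL^T - S$, so we can consider $LL^T - S$, which can be expressed in terms of $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_p$ and $\lambda_{r+1}, \ldots, \lambda_p$.)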
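To see the bound in part (b) in action, the following R sketch builds $L$ and $\Psi$ exactly as defined in part (a) for a randomly generated $S$ and prints both sides of the inequality. The dimensions p = 6 and r = 2 and the simulated data are arbitrary assumptions chosen for illustration; this is a numerical check on one example, not a proof.

```r
# Numerical check of the Problem 3(b) bound (hypothetical p = 6, r = 2)
set.seed(2005)
p <- 6; r <- 2
X <- matrix(rnorm(50 * p), ncol = p)
S <- cov(X)                                    # symmetric positive definite (almost surely)
e <- eigen(S, symmetric = TRUE)                # eigenvalues in decreasing order
lambda <- e$values; V <- e$vectors

L <- V[, 1:r] %*% diag(sqrt(lambda[1:r]), r)   # columns sqrt(lambda_j) * v_j
Psi <- diag(diag(S - L %*% t(L)))              # makes diag(Psi + LL^T) equal diag(S)
D <- Psi + L %*% t(L) - S

# Left side: sum of squared elements of D; right side: sum of discarded lambda^2
print(c(lhs = sum(D^2), rhs = sum(lambda[(r + 1):p]^2)))
```

The gap between the two sides comes from the diagonal of $LL^T - S$, which $\Psi$ absorbs.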

