Test

STATS 769 Data Science Practice

1. Briefly explain how you would calculate a cross-validated estimate of prediction error in a linear regression. Is this estimate likely to be smaller or greater than the in-sample error?
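One way to make the idea concrete is a minimal k-fold sketch in base R. Everything here is an assumption made for illustration and not part of the test: the simulated data frame sim.df, the choice of k = 10, and squared error as the loss. It holds out each fold in turn, fits on the remaining folds, and compares the held-out error with the in-sample error of a fit to all the data.

```r
# Minimal k-fold cross-validation sketch for a linear regression.
# The data are simulated purely for illustration (hypothetical, not from the test).
set.seed(1)
n <- 200
sim.df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim.df$y <- 1 + 2 * sim.df$x1 - sim.df$x2 + rnorm(n)

k <- 10
# Randomly assign each row to one of k folds of equal size
fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k - 1 folds, predict the held-out fold
sq.err <- numeric(n)
for (j in 1:k) {
    test <- fold == j
    fit <- lm(y ~ x1 + x2, data = sim.df[!test, ])
    pred <- predict(fit, newdata = sim.df[test, ])
    sq.err[test] <- (sim.df$y[test] - pred)^2
}

# Cross-validated estimate of prediction error (mean squared error)
cv.mse <- mean(sq.err)
# In-sample error: residuals from a fit to all of the data
insample.mse <- mean(residuals(lm(y ~ x1 + x2, data = sim.df))^2)
c(cv = cv.mse, insample = insample.mse)
```

In a sketch like this the cross-validated value comes out at least as large as the in-sample value, because each held-out observation played no part in fitting the model that predicts it.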

2. (a) I fit a neural network (using the nnet() function). I then fit the model again using identical code. But the residual sum of squares from the second fit is different from that from the first fit! What is going on? Have I made a mistake?

(b) I fit a neural network using the code

nnet(y ~ ., data = data.df, size = 10)

What is the significance of the argument size=10? If you increase the value of size, would you expect the residual sum of squares to increase or decrease?

3. Explain the differences in the recursive partitioning algorithm for fitting trees (a) when doing prediction of a continuous outcome, and (b) when doing classification.

4. Figure 1 shows the content of a JSON file, "data.json", and the following code reads this file into R.

> library(jsonlite)

> crimes <- fromJSON("data.json")

Write down what the result of the following R code would be.

> crimes

Write down what the result of the following R code would be.

> dim(crimes)

The following code creates a MongoDB collection from the JSON file.

> library(mongolite)

> m <- mongo(collection="testcrimes")

> m$insert(crimes)

Write down what the result of the following R code would be.

> m$find(query = '{ "id": 34274772 }',

+ fields = '{ "_id": 0, "category": 1, "location-type": 1, "month": 1 }')

[
  {
    "category": "anti-social-behaviour",
    "location-type": "Force",
    "location": {
      "latitude": "51.497899",
      "street.id": 953525,
      "longitude": "-0.119685"
    },
    "id": 34274772,
    "month": "2014-07"
  },
  {
    "category": "anti-social-behaviour",
    "location-type": "Force",
    "location": {
      "latitude": "51.507309",
      "street.id": 956645,
      "longitude": "-0.128348"
    },
    "id": 34290854,
    "month": "2014-07"
  }
]

Figure 1: The JSON file "data.json"

5. Figure 2 shows the content of an XML file, "data.xml".

Write R code to read that file into R and extract all donation elements where the donation amount is larger than 2000.

The output that your code should produce is shown below:

[[1]]

[[2]]

[[3]]

[[4]]

attr(,"class")

[1] "XMLNodeset"

<?xml version="1.0"?>

</candidate>

</party>

</candidate>

</candidate>

</party>

</ElectoralDonations>

Figure 2: The XML file "data.xml"

6. Figure 3 shows the first few lines of a CSV file, "data.csv". The complete file has 6,000,000 rows.

Estimate the amount of memory that this data set would occupy if it was read into R using the following R code (and explain your reasoning).

> data <- read.csv("data.csv", stringsAsFactors=FALSE)

Describe an alternative way to work with the data set in R that would require less memory.

2000,1,28,5,1647,1647,1906,1859,HP,154,N808AW,259,252,233,7,0,ATL,PHX,1587,15,11,0
2000,1,29,6,1648,1647,1939,1859,HP,154,N653AW,291,252,239,40,1,ATL,PHX,1587,5,47,0
2000,1,30,7,NA,1647,NA,1859,HP,154,N801AW,NA,252,NA,NA,NA,ATL,PHX,1587,0,0,1
2000,1,31,1,1645,1647,1852,1859,HP,154,N806AW,247,252,226,-7,-2,ATL,PHX,1587,7,14,0
2000,1,1,6,842,846,1057,1101,HP,609,N158AW,255,255,244,-4,-4,ATL,PHX,1587,3,8,0
2000,1,2,7,849,846,1148,1101,HP,609,N656AW,299,255,267,47,3,ATL,PHX,1587,8,24,0
2000,1,3,1,844,846,1121,1101,HP,609,N803AW,277,255,244,20,-2,ATL,PHX,1587,6,27,0
2000,1,1,6,1702,1657,1912,1908,HP,611,N652AW,250,251,232,4,5,ATL,PHX,1587,5,13,0
2000,1,2,7,1658,1657,1901,1908,HP,611,N807AW,243,251,233,-7,1,ATL,PHX,1587,3,7,0
2000,1,3,1,1656,1657,1922,1908,HP,611,N807AW,266,251,241,14,-1,ATL,PHX,1587,5,20,0
2000,1,4,2,1955,1932,2230,2153,HP,613,N509DC,275,261,232,37,23,ATL,PHX,1587,5,38,0
2000,1,5,3,1934,1932,2133,2153,HP,613,N509DC,239,261,224,-20,2,ATL,PHX,1587,5,10,0
2000,1,6,4,1929,1932,2125,2153,HP,613,N303AW,236,261,220,-28,-3,ATL,PHX,1587,5,11,0
2000,1,7,5,1932,1932,2146,2153,HP,613,N173AW,254,261,237,-7,0,ATL,PHX,1587,4,13,0
2000,1,9,7,2008,1932,2221,2153,HP,613,N168AW,253,261,237,28,36,ATL,PHX,1587,4,12,0
2000,1,10,1,1926,1932,2147,2153,HP,613,N160AW,261,261,235,-6,-6,ATL,PHX,1587,7,19,0
2000,1,11,2,1932,1932,2126,2153,HP,613,N160AW,234,261,217,-27,0,ATL,PHX,1587,6,11,0
2000,1,12,3,1936,1932,2142,2153,HP,613,N322AW,246,261,227,-11,4,ATL,PHX,1587,7,12,0
2000,1,13,4,1942,1932,2153,2153,HP,613,N160AW,251,261,220,0,10,ATL,PHX,1587,5,26,0
2000,1,14,5,1932,1932,2131,2153,HP,613,N314AW,239,261,218,-22,0,ATL,PHX,1587,6,15,0

Figure 3: The first few lines of the CSV file "data.csv"
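The arithmetic behind such an estimate can be sketched in a few lines of R. The figures below rest on assumptions that the test does not state: a 64-bit build of R, the 18 numeric-looking columns in Figure 3 being read as integer (4 bytes each), and the 4 text columns being read as character (one 8-byte pointer per element, with the small set of distinct strings stored only once). It is a rough back-of-envelope calculation, not a measurement.

```r
# Back-of-envelope memory estimate for the data frame (a sketch, not a measurement).
# Assumptions: 64-bit R; column types as suggested by the sample rows in Figure 3.
n.rows <- 6e6
n.int.cols <- 18   # columns holding whole numbers (or NA): read as integer
n.chr.cols <- 4    # columns holding text such as HP, N808AW, ATL, PHX: character

int.bytes <- n.rows * n.int.cols * 4   # 4 bytes per integer element
chr.bytes <- n.rows * n.chr.cols * 8   # 8-byte pointer per character element
total.mb <- (int.bytes + chr.bytes) / 1024^2

round(c(integer = int.bytes, character = chr.bytes) / 1024^2)   # MB per type
round(total.mb)                                                 # roughly 595 MB

# Sanity check on the per-element cost: bytes per element of an integer vector
as.numeric(object.size(integer(1e6))) / 1e6
```

Under these assumptions the total is on the order of 600 MB. Reading only the columns that are needed (for example via the colClasses argument of read.csv) is one way to bring that figure down.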

7. Figure 4 shows some of the output from top on a Linux computer.

How many CPU cores does this machine have? How much RAM does this machine have? How busy are the CPU cores? How much RAM is currently being used?

top - 10:19:02 up 38 days, 1:59, 3 users, load average: 0.00, 0.01, 0.05
Tasks: 163 total, 1 running, 162 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3973448k total, 2512664k used, 1460784k free, 408404k buffers
Swap: 4115452k total, 125816k used, 3989636k free, 945436k cached

Figure 4: The first few lines of output from top on a Linux machine.

8. Given the following bash commands and output ...

$ ls

2000.csv data.json data.xml~

Alan.docx data-science-test.aux full.txt

code-better.R data-science-test.log ideas.txt

code-better.R~ data-science-test.out ideas.txt~

code-efficiency data-science-test.pdf medium.txt

code.R data-science-test.Rnw sample.json~

code.R~ data-science-test.Rnw~ sample.txt

data.csv data-science-test.tex Test779-2015.pdf

data.csv~ data.xml unused-question.Rnw

$ mkdir Temp

$ cp data-science-test.* Temp

$ cp unused-question.Rnw Temp

$ rm Temp/*.Rnw

... write down the result of the following bash command:

$ ls Temp

The contents of the file data.xml are shown in Figure 2.

Write down the result of the following bash command (and explain what the output means):

$ grep party data.xml | wc

9. Explain what the following R code is doing and what the output means.

> Rprof("test.out")

> replicate(5, mean(rnorm(1000000)))

[1] -0.0017922088 -0.0011004727 -0.0008793575 0.0017379549 0.0007257155

> Rprof(NULL)

> summaryRprof("test.out")

$by.self

self.time self.pct total.time total.pct

"rnorm" 0.46 100 0.46 100

$by.total

total.time total.pct self.time self.pct

"rnorm" 0.46 100 0.46 100

"FUN" 0.46 100 0.00 0

"lapply" 0.46 100 0.00 0

"mean" 0.46 100 0.00 0

"replicate" 0.46 100 0.00 0

"sapply" 0.46 100 0.00 0

10. The following code runs a simple bootstrap permutation test using 10000 replications and measures how long it takes to run the test.

> diffs <- function(N) {
+     diffMean <- 1:N
+     for (i in diffMean) {
+         Grpsample <- sample(Grp)
+         diffMean[i] <- diff(tapply(BP, Grpsample, mean))
+     }
+     diffMean
+ }
> set.seed(1000)
> BP <- rnorm(10, 100, 20)
> Grp <- rep(1:2, 5)
> system.time(diffs(10000))
   user  system elapsed
  1.204   0.000   1.207

Write R code to perform the 10000 replications in parallel on 4 cores. You can assume that the machine you are running on has at least 4 cores. Estimate how much time your code will take to run and explain your reasoning.
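One possible approach, sketched here with mclapply() from the parallel package, is to split the 10000 replications into 4 chunks of 2500 and run one chunk per core. This is only one of several reasonable designs. It assumes a system that supports forking (Linux or macOS), and the switch to the "L'Ecuyer-CMRG" generator is an addition, not part of the question, so that each worker draws from its own random-number stream.

```r
library(parallel)

# Same data as in the question
set.seed(1000)
BP <- rnorm(10, 100, 20)
Grp <- rep(1:2, 5)

# Same permutation loop as in the question, returning N differences in means
diffs <- function(N) {
    diffMean <- numeric(N)
    for (i in seq_len(N)) {
        Grpsample <- sample(Grp)
        diffMean[i] <- diff(tapply(BP, Grpsample, mean))
    }
    diffMean
}

# Assumption: use a generator designed for parallel streams so that the
# forked workers do not all produce the same permutations
RNGkind("L'Ecuyer-CMRG")
set.seed(1000)

# 4 chunks of 2500 replications, one per core.
# mclapply() relies on forking, so this assumes Linux or macOS.
chunks <- mclapply(rep(2500, 4), diffs, mc.cores = 4)
diffMean <- unlist(chunks)
length(diffMean)
```

If the work divides evenly and the cores are otherwise idle, the elapsed time should be somewhere near a quarter of the serial 1.2 seconds, plus a small overhead for starting the workers and collecting their results.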
