Assignment 4

STATS380, Statistical Computing

Due 10/18/2022

This assignment carries on from assignment 3, using articles.csv. Complete the following programming tasks, including comments and demonstrations that show how your code works.

1. Consider the refactoring plan you made for assignment 3. This should have included vectorizing your code (if it was not already vectorized) and putting it into a function. Create such a function, whose output is a data frame with blank titles filled in based on the URL. Run the function.

2. Create the following vector to use for deduplication: paste together the year, the title without spaces, capitalization, or punctuation, and the first 6 characters of the author field (also stripped of capitals, punctuation and spaces). We will refer to this as the dedup key.

3. Use the vector created in (2) and table() to create a vector of the number of times each dedup key appears.

4. Process your table names to create a year vector corresponding to each table entry (the year of that publication). What choice have I made for you in part 2 that makes this easier?

5. Use tapply() to compute the average amount of duplication for each year of publication. Make a plot of year vs this average.

6. Finally, use the base R function duplicated() and your dedup key to find which rows are duplicate entries. Remove these from the data frame, and write out the remaining rows (including any repopulated titles) to 'deduplicated.csv'. How many rows are in this file?
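
As a rough illustration of how tasks 2 to 6 fit together, here is a minimal sketch on a made-up five-row data frame. It is not a model answer: the toy data, the column names (year, title, author), the helper normalize(), and the choice to strip the author field before taking its first 6 characters are all assumptions, and articles.csv may need different handling. The output file is written to a temporary directory here so that the sketch leaves nothing behind.

```r
# Toy data standing in for articles.csv; column names are assumed, not known.
articles <- data.frame(
  year   = c(2001, 2001, 2001, 2005, 2005),
  title  = c("Deep Learning!", "deep  learning", "A Survey of R",
             "Graphs, Trees", "Graphs Trees"),
  author = c("Smith, J.", "SMITH J", "Lee, K.", "O'Neil, P.", "ONeil P"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowercase, then drop everything except letters and digits.
normalize <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Task 2: year, then normalized title, then first 6 characters of the
# normalized author (stripping before truncating is an assumption).
dedup_key <- paste0(articles$year,
                    normalize(articles$title),
                    substr(normalize(articles$author), 1, 6))

# Task 3: number of times each dedup key appears.
key_counts <- table(dedup_key)

# Task 4: one year per table entry, read from the start of each table name.
key_year <- as.integer(substr(names(key_counts), 1, 4))

# Task 5: average count per year of publication, then a simple plot.
avg_dup <- tapply(as.vector(key_counts), key_year, mean)
plot(as.integer(names(avg_dup)), avg_dup,
     xlab = "Year of publication", ylab = "Average copies per dedup key")

# Task 6: keep the first row for each key and write the result out.
deduplicated <- articles[!duplicated(dedup_key), ]
out_file <- file.path(tempdir(), "deduplicated.csv")
write.csv(deduplicated, out_file, row.names = FALSE)
nrow(deduplicated)
```

Note that duplicated() flags only the second and later occurrences of a key, so the first row seen for each key is the one that is kept.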
