闪电代写 -代写CS作业_CS代写_Finance代写_Economic代写_Statistics代写_代码代做_IT代写_加急帮助

Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Practice Programming Test

Applications of Big Data (301110)

October 2020

School of Computer, Data and Mathematical Sciences

Instructions

Answer as many questions as you can. To get full marks, you must answer all questions.

Your programmes should be written in python, either as jupyter notebooks or as python scripts, unless requested otherwise. There is no other restriction on how you should organise your code and how to implement all the requirements.

Submit your code and all supporting document(s) (if any) through vUWS. If you have more than one fle, pack all fles into a zip fle and submit it. Name yourfle u301110_studentID .zip, u301110_studentID .py or u301110_studentID .ipynb as appropriate.

This test is open-book. You may use any resources to answer questions, including documentation on the internet. You may not, however, use question-and-answer sites such as stackexchange.

Q.1.P Read and plot data (10 marks)

You will fnd on vuws two lists of marks for two units: 300580 Programming Fundamentals and 300581 Programming Techniques.

Write python code to:

Download the two CSV datasets 300580 .csv and 300581 .csv;

print out their contents;

plot a scatter plot of marks in 300581 (y-axis) agains 300580 (x-axis).

Q.2.P SQL statements (10 marks)

For this question, you should not write python, but rather write down your SQL queries to go into sqlite, and the results of those queries. You may write these answers as a text fle or in a jupyter notebook.

Download testdb .sqlite, which is a collection of student marks for computing units. Load the database into sqlite, examine it to understand its structure, and write SQL queries to answer the following questions. For each question, write down your SQL query and your answer. You may need to do extra calculations in addition to the SQL query. If so, write them down.

1. How many students are in the database?

2. List the unique Unit IDs in the database

3. Which unit has the lowest mark in the database?

4. Which unit has the lowest average mark in the database?

5. Which is the hardest unit covered by the database? Explain your answer.

Q.3.P Generate training and test data (10 marks)

Use the scikit-learn package’s make_moons funtion to generate two datasets for classifcation. The training dataset should have 500 points, and the testing dataset should have 200 points. Add enough noise to make sure that there is some overlap between the ‘moons’.

Plot your training dataset. It should look something like this: