Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Practice Programming Test

Applications of Big Data (301110)

October 2020

School of Computer, Data and Mathematical Sciences

Instructions

Answer as many questions as you can. To get full marks, you must answer all questions.

Your programmes should be written in python, either as jupyter notebooks or as python scripts, unless requested           otherwise. There is no other restriction on how you should organise your code and how to implement all the requirements.

Submit your code and all supporting document(s) (if any) through vUWS. If you have more than one fle, pack all fles into a zip fle and submit it. Name yourfle u301110_studentID .zip, u301110_studentID .py or                                   u301110_studentID .ipynb as appropriate.

This test is open-book. You may use any resources to answer questions, including documentation on the internet. You may not, however, use question-and-answer sites such as stackexchange.

Q.1.P Read and plot data (10 marks)

You will fnd on vuws two lists of marks for two units: 300580 Programming Fundamentals and 300581 Programming Techniques.

Write python code to:

Download the two CSV datasets 300580 .csv and 300581 .csv;

print out their contents;

plot a scatter plot of marks in 300581 (y-axis) agains 300580 (x-axis).

Q.2.P SQL statements (10 marks)

For this question, you should not write python, but rather write down your SQL queries to go into sqlite, and the results of those queries. You may write these answers as a text fle or in a jupyter notebook.

Download testdb .sqlite, which is a collection of student marks for computing units. Load the database into sqlite, examine it to understand its structure, and write SQL queries to answer the following questions. For each question, write down your SQL query and your answer. You may need to do extra calculations in addition to the SQL query. If so, write them down.

1. How many students are in the database?

2. List the unique Unit IDs in the database

3. Which unit has the lowest mark in the database?

4. Which unit has the lowest average mark in the database?

5. Which is the hardest unit covered by the database? Explain your answer.

Q.3.P Generate training and test data (10 marks)

Use the scikit-learn package’s make_moons funtion to generate two datasets for classifcation. The training        dataset should have 500 points, and the testing dataset should have 200 points. Add enough noise to make sure that there is some overlap between the ‘moons’.

Plot your training dataset. It should look something like this:

 

Q.4.P Implement a KNN classifer (10 marks)

Use scikit-learn’s KNeighborsClassifier to ft a K-nearest neighbours classifer to the training dataset. Use n_neighbors of 10.

Get the classifers score on the testing dataset built above.

Hints:

1.  You’ll need to consult the documentation to work out how to do this.

2. Beware: this code generally uses the American spelling neighbor