MSc Data Science

Published: 2022-02-22

Instructions for submitting your work

Please write your student ID in the first cell of the notebook. You should submit the Jupyter notebook containing the code, with its output, for all the questions. The code must run on Python 3.x; remember to import all the packages you use. Make a separate cell for each point a), b), c), etc. of each question.

At the beginning of each cell, state the question number and sub-letter you are answering. If the question needs a written answer, please give it in a cell in markdown mode, and remember to clearly state the question number you are answering.

You must submit in ELE.

In ELE, submit a zip file including:

• the source file with extension .ipynb.

• PDF copy of your notebook.

Markers will not be able to give feedback if either the PDF version or the notebook version of your code is missing, and marks will be deducted if you fail to submit both.

Marking criteria

Work will be marked against the following criteria. Although it varies a bit from question to question they all have approximately equal weight.

• Does your algorithm correctly solve the problem? In most of the questions the required code has been described, but not always in complete detail and some decisions are left to you.

• Is the code syntactically correct? Is your program a legal Python program regardless of whether it implements the algorithm?

• Is the code beautiful or ugly? Is the implementation clear and efficient or is it unclear and extremely inefficient (e.g. it takes more than a few minutes to execute)? Is the code well structured? Have you made good use of functions?

• Is the plot beautiful or ugly? Does the plot show clear results? Are the labels and legends clear? Is it readable, and does it convey the message you want to stress?

• Is the code well laid out and commented? Is there a comment describing what the code does? Have you used space to make the code clear to human readers?

• For written questions: Is your answer correct and well structured? Is it written in good English? Does it convey the correct solution or analysis for the question?

There are 10% penalties for:

• Not submitting the PDF or Notebook version of your programs.

• Not creating functions as instructed in the questions.

• Not stating correctly the question you are answering.

Expected output

For some of the questions in this coursework, the expected output is shown at the end of the problem questions. For other questions, the expected output can be found in the exercises proposed during the ECMM447 Labs. Please note that the expected output shown is for guidance only, and in your analysis you might find different results.

Questions:

1. Analyze a Network

(1.a) Load the Network.

Please load the network from the edge-list file infect-dublin.edges as Gx. The file contains the edges of an undirected, binary (unweighted) contact network. You can find infect-dublin.edges on the ELE page of the course ECMM466.

Note: This dataset contains the daily dynamic contact networks collected during the Infectious SocioPatterns event that took place at the Science Gallery in Dublin, Ireland, during the artscience exhibition INFECTIOUS: STAY AWAY. From:

L. Isella et al., What’s in a crowd? Analysis of face-to-face behavioral networks, Journal of Theoretical Biology 271 (2011).
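As a minimal sketch of the loading step, an edge list can be read with networkx's read_edgelist. The helper name load_contact_network, the "%" comment marker and the integer node labels are assumptions about the file layout, not facts taken from the dataset; the demo uses a tiny stand-in file rather than the real data.

```python
import os
import tempfile

import networkx as nx


def load_contact_network(path):
    """Read a whitespace-separated edge list into an undirected nx.Graph."""
    # comments="%" and nodetype=int are assumptions about the file format.
    return nx.read_edgelist(path, comments="%", nodetype=int,
                            create_using=nx.Graph)


# Demo on a toy file (NOT the real infect-dublin.edges data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.edges")
    with open(demo_path, "w") as handle:
        handle.write("% toy header\n1 2\n2 3\n3 1\n3 4\n")
    G_demo = load_contact_network(demo_path)

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
```

For the coursework file, the same helper would be called as Gx = load_contact_network("infect-dublin.edges"), provided the file matches the assumed layout.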

(1.b) Plot the Network Adjacency Matrix.

(1.c) Plot the Degree distribution.
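One possible way to obtain the values to plot is to count how many nodes share each degree. The sketch below is only illustrative: the helper name degree_distribution is hypothetical, and it assumes a simple graph without self-loops, given either as a networkx Graph or as a plain dict mapping each node to its neighbours.

```python
from collections import Counter


def degree_distribution(G):
    """Return {degree k: fraction of nodes with degree k}."""
    # len(G[node]) is the number of neighbours, i.e. the degree in a
    # simple graph without self-loops (assumption).
    degrees = [len(G[node]) for node in G]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}


# Toy star graph: node 0 is linked to nodes 1, 2 and 3.
toy_star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree_distribution(toy_star))  # {1: 0.75, 3: 0.25}
```

The returned dictionary can be passed to a bar or log-log scatter plot, with the degree on the x axis and the fraction of nodes on the y axis.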

(1.d) Plot the Degree sequence. Using the powerlaw package in Python, plot the degree sequence and check whether it follows a power-law distribution.

(1.e) Is the Network you are analyzing Assortative or Disassortative?

Please motivate your answer in writing. (You can use plots and/or a mathematical evaluation to support your answer.) (40 points)
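One mathematical evaluation that could support the answer is the degree assortativity coefficient, i.e. the Pearson correlation between the degrees found at the two ends of every edge. A negative value suggests a disassortative network and a positive value an assortative one. The sketch below is an illustration under assumptions: the function name degree_assortativity is hypothetical, and the graph is taken to be simple and undirected, given as a networkx Graph or a dict of neighbours.

```python
from math import sqrt


def degree_assortativity(G):
    """Pearson correlation of the degrees at both ends of each edge."""
    degree = {u: len(G[u]) for u in G}
    xs, ys = [], []
    # Every undirected edge is visited twice, as (u, v) and as (v, u),
    # which makes the measure symmetric.
    for u in G:
        for v in G[u]:
            xs.append(degree[u])
            ys.append(degree[v])
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when all degrees are equal
    return cov / sqrt(var_x * var_y)


# A star is the textbook disassortative case: the hub only meets leaves.
toy_star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
toy_path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(degree_assortativity(toy_star))  # -1.0
```

A plot of the average neighbour degree against the node degree is a common visual companion to this single number.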

(2.a) Centralities.

Explain degree centrality and eigenvector centrality. What are the differences between the two? Provide one practical example for degree centrality and one for eigenvector centrality. When is it useful to apply each of the two metrics? Motivate your answers.

(2.b.i) Closeness Centrality.

Code the function centrality_closeness(), which takes the network as input and returns the closeness centrality of each node. (Do not use the built-in closeness centrality function in networkx or other Python packages; you have to code it yourself. You can use a shortest-path function already implemented in any Python package other than networkx.)
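A minimal sketch of one way to approach this uses a hand-written breadth-first search for the shortest paths, which is valid only because the network is unweighted. The helper bfs_distances is hypothetical, and the extra factor that scales by the fraction of reachable nodes is an assumed convention for disconnected graphs, not something the question prescribes.

```python
from collections import deque


def bfs_distances(G, source):
    """Hop distance from source to every reachable node (unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centrality_closeness(G):
    """Return {node: closeness centrality} without using networkx helpers."""
    n = len(G)
    closeness = {}
    for u in G:
        dist = bfs_distances(G, u)
        total = sum(dist.values())   # sum of distances to reachable nodes
        reachable = len(dist) - 1    # reachable nodes, excluding u itself
        if total > 0 and n > 1:
            # (reachable / total) is the inverse average distance; the
            # second factor down-weights nodes in small components
            # (assumed convention).
            closeness[u] = (reachable / total) * (reachable / (n - 1))
        else:
            closeness[u] = 0.0       # isolated node
    return closeness


# Path 0 - 1 - 2: the middle node is closest to everyone.
toy_path = {0: [1], 1: [0, 2], 2: [1]}
print(centrality_closeness(toy_path)[1])  # 1.0
```

Both functions only rely on iterating over the graph and indexing G[u], so they accept a networkx Graph as well as the plain dictionary used in the demo.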

(2.b.ii) Closeness Centrality.

Using Gx, compare the result of your centrality_closeness() with the betweenness centrality in a scatter plot, with the closeness centrality on the x axis and the betweenness centrality on the y axis. Is there any correlation? Why? Explain your answer.

(2.c) Centrality Measure.

Please consider now the network Gx loaded in question 1.a. Imagine that Gx now represents a social network, similar to Facebook, Instagram or WeChat. You have to advertise a product and you have the budget to hire only one influencer (i.e. a node in the network Gx). Using one of the centrality metrics we saw during the ECMM466 lectures and labs, suggest the influencer you will hire (as node id). Motivate your decision. Finally, plot the network with the nodes color coded by their centrality measure and with the node selected as influencer colored in dark green.

(2.d) Influencer Ego network.

Define a function Draw_ego_network that inputs a network G (a networkx network), a network layout (as a list of node positions), a node id and the maximum distance of interaction (as int). The function Draw_ego_network will output the network plot of G, highlighting the selected node (with a different color and a bigger size) and its ego network, coloring the links and the nodes differently depending on the interaction distance.

Test your function with the network Gx, the node id of the influencer selected in the question 2.c, a network layout of your choice (that will improve the readability of the plot) and interaction distance 2. (See the expected output at the end.)
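Before drawing, the nodes of the ego network have to be grouped by their distance from the selected node. The sketch below shows one way to do this with a breadth-first search that stops at the maximum distance. The helper name nodes_by_distance is hypothetical, and the plotting itself is left out.

```python
from collections import deque


def nodes_by_distance(G, ego, max_distance):
    """Return {distance: [nodes]} for all nodes within max_distance hops."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == max_distance:
            continue  # do not expand beyond the interaction distance
        for v in G[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    shells = {d: [] for d in range(max_distance + 1)}
    for node, d in dist.items():
        shells[d].append(node)
    return shells


# Path 0 - 1 - 2 - 3: node 3 is three hops from node 0 and is left out.
toy_path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(nodes_by_distance(toy_path, ego=0, max_distance=2))
# {0: [0], 1: [1], 2: [2]}
```

Each shell can then be drawn with its own color, for example by passing it as the nodelist of a separate node-drawing call.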

(2.e) Community Detection. Choose and declare a community detection algorithm among the ones we saw during the ECMM466 lectures. Imagine that Gx now represents a topological map of distances between the nodes. A link between any pair of nodes means that you can travel between them in an hour. If a link is not present, you cannot travel between the nodes. You have been hired by a delivery agency to select the perfect locations for its two new garages. The delivery agency wants to start its business by serving the two biggest communities of the network Gx, as found by the algorithm you chose. Where do you suggest building the two garages (node id)? Which metric are you using to motivate your answer? Why? Plot the sub-network of the two selected communities, with the metric you selected, highlighting with a different color and size the nodes you proposed as garage locations. (For a possible output of the plot, see the labs.)

(2.f) Adjacency matrix of the Community Detection.

Define a function plot_adj_comm() that takes a network G and a community detection method as input and outputs the adjacency matrix reordered by cluster membership and cluster size (from the smallest cluster to the largest). Plot the adjacency matrix using the same community detection algorithm you used in question 2.e. Graphically outline all the clusters.
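The reordering step can be sketched separately from the plotting. The hypothetical helper below assumes the communities have already been computed and are passed as a list of node sets, and that node labels are sortable. It returns the node order, the reordered 0/1 matrix and the positions where each cluster ends, which is where the outlines would be drawn.

```python
def reordered_adjacency(G, communities):
    """Reorder the adjacency matrix by community, smallest community first."""
    ordered = sorted(communities, key=len)
    order = [node for community in ordered for node in sorted(community)]
    index = {node: i for i, node in enumerate(order)}

    matrix = [[0] * len(order) for _ in order]
    for u in order:
        for v in G[u]:
            if v in index:
                matrix[index[u]][index[v]] = 1

    # Cumulative sizes: row/column index at which each cluster ends.
    boundaries = []
    position = 0
    for community in ordered:
        position += len(community)
        boundaries.append(position)
    return order, matrix, boundaries


# Toy graph: a pair {0, 1} attached to a triangle {2, 3, 4} via edge 1-2.
toy = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3]}
order, matrix, boundaries = reordered_adjacency(toy, [{2, 3, 4}, {0, 1}])
print(order)       # [0, 1, 2, 3, 4]
print(boundaries)  # [2, 5]
```

The matrix can be shown with an image plot, and a square drawn between consecutive boundaries outlines each cluster.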

3. SI Model

In this section you have to create your own functions and you cannot use any pre-written code or Python libraries that perform SI/SIR/SIS simulation. If you use Python libraries that implement SI/SIR models or similar, you will receive zero points.

(Note: sampling matters. To run 100 simulations a pc should not take more than 3 minutes. Please select a correct amount of simulations to validate your analysis.)

(3.a) SI Model.

Define the function SI_model() that inputs:

• G (Network as networkx)

• initial_infecteds (as list of nodes ID)

• beta (transmission probability as float)

• t_simulation (simulation iteration time as int)

Output: A dictionary that contains the status of the nodes at each time step. In this case the status can be:

• ’S’ as susceptible.

• ’I’ as Infected.

Using the network Gx and beta=0.01, t_simulation=300, initial_infecteds=(list of 3 random nodes), plot the number of users in each status at each time step of the simulation.
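The following is a minimal sketch of an SI simulation under stated assumptions, not a reference solution. It assumes a synchronous update (all infections at step t are drawn from the state at step t-1) and one independent transmission trial per infected-susceptible contact per step. The seed argument and the helper count_status are additions made only for reproducibility and plotting.

```python
import random


def SI_model(G, initial_infecteds, beta, t_simulation, seed=None):
    """Return {time step: {node: 'S' or 'I'}} for steps 0..t_simulation."""
    rng = random.Random(seed)  # seed is a hypothetical extra parameter
    status = {node: "S" for node in G}
    for node in initial_infecteds:
        status[node] = "I"
    history = {0: dict(status)}

    for t in range(1, t_simulation + 1):
        newly_infected = set()
        for node, state in status.items():
            if state != "I":
                continue
            for neighbour in G[node]:
                # One transmission trial per infected-susceptible contact.
                if status[neighbour] == "S" and rng.random() < beta:
                    newly_infected.add(neighbour)
        # Apply the changes only after scanning the whole network.
        for node in newly_infected:
            status[node] = "I"
        history[t] = dict(status)
    return history


def count_status(history, label):
    """Number of nodes with the given status at each time step."""
    return [sum(1 for s in snapshot.values() if s == label)
            for _, snapshot in sorted(history.items())]


# With beta=1.0 the infection moves one hop per step along a path.
toy_path = {0: [1], 1: [0, 2], 2: [1]}
history = SI_model(toy_path, initial_infecteds=[0], beta=1.0, t_simulation=3)
print(count_status(history, "I"))  # [1, 2, 3, 3]
```

The lists returned by count_status for 'S' and 'I' can be plotted against the time step. Note that the history holds t_simulation + 1 snapshots because step 0 is included.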

(3.b) SIR Model.

It has been discovered that all infected nodes become immune after a given recovery time. If an immune node is in contact with an infected node, it does not become infected and it will not spread the infection! Define the function SIR_model() that inputs:

• G (a Network as networkx)

• initial_infected (nodes that are infected at time=0 as list of nodes ID)

• beta (transmission probability as float)

• t_simulation (simulation iteration time as int)

• recovery_time (the number of days needed for the recovery, as int)

Output: a dictionary that contains the status of the nodes at each time step. In this case the status can be:

• ’S’ as susceptible.

• ’I’ as Infected.

• ’R’ as Recovered.

Using the network Gx and beta=0.005, t_simulation=300, initial_infected=(list of 3 random nodes), recovery_time=15, plot the number of users in each status at each simulation time step.
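A minimal SIR sketch under stated assumptions follows. Updates are synchronous, a node infected at step t0 is still infectious during the step in which it recovers and is marked 'R' at step t0 + recovery_time, and the seed argument and the infected_at bookkeeping dictionary are hypothetical additions. Other conventions for the timing of recovery are equally defensible.

```python
import random


def SIR_model(G, initial_infected, beta, t_simulation, recovery_time,
              seed=None):
    """Return {time step: {node: 'S', 'I' or 'R'}} for steps 0..t_simulation."""
    rng = random.Random(seed)  # seed is a hypothetical extra parameter
    status = {node: "S" for node in G}
    infected_at = {}           # node -> step at which it became infected
    for node in initial_infected:
        status[node] = "I"
        infected_at[node] = 0
    history = {0: dict(status)}

    for t in range(1, t_simulation + 1):
        # 1) Draw new infections from the state at the previous step.
        newly_infected = set()
        for node, state in status.items():
            if state != "I":
                continue
            for neighbour in G[node]:
                if status[neighbour] == "S" and rng.random() < beta:
                    newly_infected.add(neighbour)
        # 2) Nodes infected for recovery_time steps become immune.
        for node, t0 in infected_at.items():
            if status[node] == "I" and t - t0 >= recovery_time:
                status[node] = "R"
        # 3) Apply the new infections and start their recovery clock.
        for node in newly_infected:
            status[node] = "I"
            infected_at[node] = t
        history[t] = dict(status)
    return history


# beta=1.0 and recovery_time=1: the infection sweeps along the path.
toy_path = {0: [1], 1: [0, 2], 2: [1]}
history = SIR_model(toy_path, [0], beta=1.0, t_simulation=3, recovery_time=1)
print(history[1])  # {0: 'R', 1: 'I', 2: 'S'}
```

Counting the occurrences of 'S', 'I' and 'R' in each snapshot gives the three curves to plot.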

(3.c) Second Wave.

After 100 days, a second wave of the same infection starts to spread on your network. Keep the simulation proposed for the SIR model in question 3.b fixed and, at day 100, randomly add 30 new infections to your network. If a person is infected by the second wave, this set of rules applies:

• if the person was ’S’, they become infected.

• if the person was ’I’, you need to reset their recovery time to zero (they will have to wait 15 days to become immune).

• ’R’ remains recovered and is immune to the second wave.

This new infection is easier to transmit: it has beta=0.02!
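The three rules can be sketched as a small update applied at day 100. The function apply_second_wave and the two dictionaries it mutates (status, mapping node to 'S', 'I' or 'R', and infected_at, mapping node to the day of its latest infection) are hypothetical names chosen for illustration. How the 30 nodes are drawn and how the new beta is applied afterwards are left to the simulation loop.

```python
def apply_second_wave(status, infected_at, new_cases, t):
    """Apply the second-wave rules in place to the nodes in new_cases."""
    for node in new_cases:
        if status[node] == "S":
            # Susceptible nodes become infected at day t.
            status[node] = "I"
            infected_at[node] = t
        elif status[node] == "I":
            # Already infected: restart the recovery clock.
            infected_at[node] = t
        # 'R' nodes are immune, so nothing changes for them.


# Toy state at day 100: one susceptible, one infected, one recovered node.
status = {0: "S", 1: "I", 2: "R"}
infected_at = {1: 90}
apply_second_wave(status, infected_at, new_cases=[0, 1, 2], t=100)
print(status)       # {0: 'I', 1: 'I', 2: 'R'}
print(infected_at)  # {1: 100, 0: 100}
```

In a full simulation the 30 cases could be drawn with random.sample over the node list, and the transmission probability switched to 0.02 from day 100 onwards.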

(3.d) Remarks.

Plot all the simulations together and make your final remarks on the simulation results of SI, SIR and Second Wave. How does the second wave affect your population? Please give your considerations in writing.

Expected output

Figure 1: Expected visual output for Q. 2.d (to some extent)

Figure 2: Expected visual output for Q. 2.f, adjacency matrix re-sorted with clustering analysis (to some extent)

Figure 3: Expected visual output for Q. 1.e (to some extent)

Figure 4: Expected visual output for Q. 3.b (to some extent; note that the second wave is missing)