Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

MA1691 | Project 2 (2021/22).

1 Project background

This project builds upon the work done in the second half of Term 2, where we learned the mathematics underlying the Google PageRank algorithm. Most of the key learning materials needed to implement this project can be found on the Blackboard, in the folder for Project 2.

More specifically

• Lecture 0 (Week 23) explained the motivation for ranking nodes on a network and outlined the imaginary computer simulation that underpins the PageRank algorithm. During the Lecture 1 (Week 24) we introduced the key concepts needed to represent networks mathematically: adjacency matrices, as well as in- and out-degrees. We learned how to handle adjacency matrices in Matlab during Lab Session 1 (Week 24).

• The simulation that motivates the PageRank algorithm describes surfers who browse a network according to some set rules. The simulation of individual surfers is possible, but very computationally inefficient, so direct simulation needs to be replaced by studying the probabilities for surfers to be on every particular node within the network, and studying how these probabilities change from iteration to iteration. Seminar Session 1 (Week 24) was concerned with hands-on calculations of these probabilities.

• In Lecture 2 (Week 25) we showed how the effect of iterations on probabilities can be reproduced by multiplying probabilities vector by the conditional probability matrix. Lab Session 2 (Week 24) was devoted to computing this matrix from a given adjacency matrix, see ❝♠❛t✶✳♠ in Lab Session 2 (Week 24). This allowed us to reproduce the effect of simulations more easily, see ✐t❡r❛t✐♦♥s✶✳♠ in Lab Session 2 (Week 25).

• Experimentation with script ✐t❡r❛t✐♦♥s✶✳♠ in Lab Session 2 (Week 26) showed that probabilities become stable after a few iterations of the PageRank algorithm. Lecture 3 (Week 26) argues why this ought to be expected for many networks of interest and shows how to reduce the problem of finding the limiting probabilities to the solution of a system of linear equations. This transformation was exemplified for a tiny network considered during Seminar Session 3 (Week 25); a general computer implementation of the same algorithm was covered in Lab Session 3 (Week 26). As the result, we wrote the script ❞✐r❡❝t✶✳♠ which takes the adjacency matrix and determines the limiting probabilities for surfers to be on any particular node. The nodes are then ranked in the order of decreasing limiting probabilities (higher probabilities mean higher rankings).

• Unfortunately, the simple simulation described in the Lecture 0 (Week 23) that motivated the PageRank algorithm does not lead to useful results for some networks. We illustrated the potential difficulties during Lecture 4 (Week 27) and also proposed generalized surfing rules that address the issue. We shown that these new surfing rules only affect

“Rank”

1

2

“Matrices”

3 “Vectors”

4

5

“Discrete probability”

“Determinants” 6

Network 1 (for students with even u).

“Eigenvalues”

1

2

“Determinants”

3 “Matrices”

4

“Vectors”

5

“Discrete probability”

“Systems of linear equations” 6

Network 2 (for students with odd u).

elements of the conditional probabilities matrix and implement the generalized version of the matrix during Lab Session 4 (Week 27).

• Last, but not least, we also need examples of networks to study, so Seminar Session 2 (Week 25) was devoted to the examples of nodes and edges that would be appropriate to a network showing relationships between mathematical topics.

Let u and v be the last two digits of your student number (as read from left to right). The tasks in Parts 1 and 2 of the project will use these digits to personalize your instructions.

In the first group of tasks you will need to compute and analyse the ranking of nodes for a pre-defined network, describing the dependencies between 6 mathematical topics. Each node represents a topic, and each edge represents the dependence. The direction of the edge goes from topic A to topic B if topic B underlies or is employed in some way in topic A; for instance the edge from “Systems of Linear Equations” to “Matrices” indicates that any analysis of systems of linear equations can be performed by studying the corresponding system matrices. If u (the penultimate digit of your student number) is odd, you will be working with the Network 1, otherwise, you will be working with the Network 2, see the diagrams above.

Task 1(a): Create script t❛s❦✶❛✳♠, based upon script ❞✐r❡❝t✶✳♠, to compute the vector of limiting probabilities for your network. These are the probabilities that a random surfer following the PageRank algorithm will visit each node. The PageRank will then rank the node with the largest probability first, the node with the second largest probability second, etc (10%).

Task 1(b): Write a short section of your report to describe your findings. It must include the mathematical definition of your adjacency matrix (typed using the equation editor), as well as a brief statement of what changes to ❞✐r❡❝t✶✳♠ were actually needed. Also write a paragraph to state what are the top three ranked nodes for your network, what are the values of their limiting probabilities and their ranks, and why do you think the associated topics got the highest ranks (10%).

For the next group of tasks you will need to add 3–5 nodes to your network, as well as at least 7 more edges. The resulting network should contain not fewer than 9 and not more than 11 topics. The following are some suitable examples of mathematical topics that can become new nodes, as well as possible dependencies between them:

• Systems of linear equations and matrix algebra: the latter can be used to solve the former, so if nodes systems of linear equations and matrix algebra are included in your network, there would be a directed edge pointing from systems of linear equations to matrix algebra.

• Infinite series −→ limits: the theory of limits is needed to evaluate infinite series.

• Definite integrals ←− area formulas in geometry: definite integrals can be used to calculate the area under a curve. Is there a link in the opposite direction?

• There is a dependence between discrete distributions (in probability) and infinite series: the moments of a discrete distribution are often calculated using an infinite series. Hence, if nodes discrete distributions and infinite series are included in your network, there would be a directed edge pointing from discrete distributions to infinite series.

Also check out the discussion of suitable topics during Seminar Session 2 (Week 24). You may use one of these examples in your network, but not more than one. These are tasks that you need to perform for the newly created network:

Task 2(a): Prepare a graphical representation of your network, with each node numbered and labelled by a mathematical topic. Do not forget to clearly indicate which nodes and edges have already been there, and which ones you contributed. People usually draw their networks using Microsoft Word or Microsoft Powerpoint, but you are free to use any other software if you prefer (10%).

Task 2(b): For each edge that you added to the network to connect a pair of nodes, provide a brief justification in your report (1-2 sentences) of why do you believe these nodes must be connected (20%).

Task 2(c): Create script t❛s❦✷❝✳♠ based upon your earlier created script t❛s❦✶❛✳♠ to compute limiting probabilities for your extended network. Try running it. Write a few sentences in your report to briefly describe the results you obtain (10%).

Depending on the configuration of the network you just created, your script from Task 2(c) may fail to report the limiting probabilities (or it may give you warnings about the matrix ▼). If this is the case, write one or two sentences to explain why this happens for your network.

The tasks in Part 2 are about applying the extended version of the PageRank algorithm to your network and comparing the results with your observations from Task 2(a–b).

Task 3(a): Create script t❛s❦✸❛✳♠, based upon script t❛s❦✷❛✳♠, to compute the vector of limiting probabilities for your network using an extended version of the PageRank algorithm. This will require you to modify t❛s❦✶❛✳♠ to use function ❝♠❛t✷✭✮ instead of ❝♠❛t✶✭✮. Use the value of α = 0.13 + 0.0v, where v is the last digit of your student number (10%).

Task 3(b): Add a brief section to your report to describe the top three ranked topics for your modified network, what are their limiting probabilities and ranks, and how these results are different (if at all) from your results in Task 2(c). For full marks, get your program to print out top three PageRank values in descending order (10%).

The remaining two tasks are slightly harder.

Task 4(a): Create Matlab script t❛s❦✹❛✳♠ based upon script t❛s❦✸❛✳♠, to generate a plot showing how limiting probabilities depend on the parameter α (α should change between 0 and 1). Make sure all your modifications are properly commented in the script (10%).

Task 4(b): Write a section in your report that includes the plot obtained during Task 4(a). Explain what specific modifications to t❛s❦✸❛✳♠ were needed to generate such a plot and also describe how limiting probabilities change for α ∈ [0, 1] (10%).

4 Project summary

Overall, the key deliverables for this project are as follows:

• The report that contains answers to Tasks 1(b), 2(b), 3(b) and 4(b), which should be submitted as a .PDF file. The overall report should not be longer than 3 sides of A4 (assuming 2cm margins, at least 11pt font and 1.5 line spacing).

• The network diagram as specified in Task 2(a), which should be submitted either as a part of the report, or as a separate .PDF document.

• All Matlab scripts that were produced to answer Tasks 1(a), 2(c), 3(a) and 4(a), which should be submitted within a single .ZIP file. If you have incomplete answers to some of the tasks, please submit incomplete solutions as well, because partial marks are often possible. Use your student ID as the name of the .ZIP file.