
SOCR4006/SOCR8006 Online Research Methods: Assessment 1 – Basic SNA

S2 2022

For this assignment, you can choose between two options (please state clearly which option you are taking):

· Option 1: (Mainly) Using VOSON Dashboard. You are to use VOSONDash for network analysis except in question A.1, where you must use R.

· Option 2: Only Using R. You must use R for all network analysis.

For both options, you must include your R code as an annex.

Please read the questions carefully: there is some variation depending on the option taken.

This assignment involves the Twitter actor network “Debate2020_subnet.graphml” (available on Wattle). This is a subnetwork created from a larger Twitter network collected by R. Ackland and B. Gertzel during the first debate of the 2020 US presidential election. The larger network was constructed from Twitter activities (tweets, retweets) that were authored during the debate and that contained one or more of a set of debate- and election-related hashtags (#PresidentialDebate, #PresidentialDebates, #Election2020, #Debates2020, #Debates, #DebateNight, #Biden, #Biden2020, #BidenHarris2020, #Trump, #Trump2020, #TrumpPence2020). More information on the data collection can be found in Gertzel and Ackland (2021).

The subnetwork has been created as follows:

1. The Twitter accounts for the presidential candidates were removed.

2. Eight political commentators/politicians/partisan celebrities who were prominent on Twitter during the debate (they had high indegree in the Twitter network) were identified:

◦ Democrat: @MarkRuffalo, @KyleKulinski, @RBReich, @ewarren

◦ Republican: @RealCandaceO, @ScottPresler, @DLoesch, @KimStrassel

3. For each of these Twitter users, the Twitter users who replied to, mentioned or retweeted the user during the debate were identified. We can think of these users as the “alters” for each of the “egos” (the prominent users). The alters for each ego were ranked based on their total degree (indegree+outdegree) in the larger network, and only the top-200 alters were included (this was to keep the subnetwork relatively small).

4. The subnetwork consists of the eight egos, plus their top-200 alters. Note: there were some alters who connected to more than one ego.

You must only use this dataset for your studies in this course. Do not publish or share this dataset.

Part A.

A.1 Pre-processing of the network (2 marks)

For both Options 1 and 2, you must complete this using R (provide your code in an annex).

Using “Debate2020_subnet.graphml” as your starting point, do the following:

(a) Report how many nodes and how many edges there are in this network.

(b) Report the frequency counts of the different edge types; you can optionally use a plot/figure to report the frequency counts (you will not get more marks for doing so), but it is also fine to just report the values in the text of your answer.

(c) Create a node attribute “color” with:

· Republican egos – red

· Democrat egos – blue

· Everyone else – white

(d) If you are doing Option 1, write out the network as a graphml file for reading into VOSONDash.

A.2 What is this network? (2 marks)

(a) How many multiple edges are there? How many loops are there? Briefly explain what multiple edges and loops are. What type of tweet results in the creation of a looped edge? Report the status_id and text of one example tweet.

(b) We can interpret different Twitter edge types in different ways. In the context of online politics, how would you interpret person A retweeting person B (i.e. why might researchers be interested in retweet networks)? How would you interpret person A replying to person B?

(c) [Option 1 only] Identify an academic paper referred to in the course materials and briefly explain why it is relevant to a network of retweets, and do the same for a network of replies. Note that “relevant” means either the research involved the use of such a network, or it is focused on a topic that can be investigated using such a network.

A.3 Network visualisation (3 marks)

(a) Remove the loops and multiple edges and create a visualisation of this network (include this in your answer!). Your network visualisation should be made with reference to the three “unwritten principles” of visual network analysis discussed in Venturini et al. (2021). Briefly discuss how these principles are reflected in your visualisation.

When experimenting with layout algorithms in VOSONDash, do not choose the “DH” algorithm, as it will probably freeze the session (due to the size of the network).

(b) According to Venturini et al. (2021), what is the difference between “polarization” and “clustering” in a network visualisation? Can you observe these phenomena in your network and if so, what is your interpretation or explanation?

(c) [Option 1 only] With reference to Venturini et al. (2021), to the extent that polarization can be observed in the visualisation, can it be interpreted in terms of axes? Why or why not? Hint: remember that with force-directed layout algorithms the initial position of the nodes is random, and this can affect the position of nodes and clusters in the final network visualisation.

Part B.

For the questions in Part B, use the network you created in A.3.

B.1 Network-level metrics (3 marks)

(a) Briefly describe the network using the following network-level metrics (make sure you define and explain these terms):

· density

· average geodesic distance

· inclusiveness

(b) Report and interpret the indegree and outdegree centralisation measures. Is indegree more or less centralised than outdegree? Briefly explain this finding, by reflecting on the behaviour involved with creating and receiving ties in this network.

B.2 Node-level metrics (2 marks)

(a) Include the plots of indegree and outdegree distribution, and explain the plots in words.

(b) Who are the top-3 nodes in terms of indegree centrality, outdegree centrality and betweenness centrality? Provide a brief definition and interpretation of each of these measures, i.e. what might it mean for a node to have high indegree, outdegree or betweenness centrality in this network?

B.3 Strongly-connected components (4 marks)

(a) Use a plot to describe the distribution of the number of strongly-connected components (SCC). Explain the plot in words.

(b) Provide a network map of the SCC of size greater than 1. How many SCC are in this map? How many clusters are in this map (hint: here, a cluster might consist of two SCC that are connected)? Explain any difference in the numbers of SCC and clusters.

(c) Look at the tweets authored by the users in the giant SCC, and briefly comment on whether you think that for this dataset, the SCC appears to be an accurate way of identifying groups of users who share a common political stance.

Part C. Typology of online networks (4 marks)

Imagine you have collected and constructed the following online networks:

· Network 1 (Reddit): nodes are users and a directed edge from user i to user j indicates that i replied to a comment written by j.

· Network 2 (Twitter): nodes are users and an undirected edge between user i and user j indicates that i follows j and j follows i.

· Network 3 (Twitter): nodes are users and a directed edge from user i to user j indicates that i replied to j.

· Network 4 (Twitter): bipartite network with users as node type 1 and hashtags as node type 2. A directed edge from user i to hashtag j indicates that user i included hashtag j in at least one of their tweets.

· Network 5 (Twitter): user-to-user one-mode projection constructed from Network 4.

· Network 6 (Twitter): hashtag-to-hashtag one-mode projection constructed from Network 4.

· Network 7 (Twitter): nodes are users and an undirected edge between user i and user j indicates that i and j both retweeted the same tweet within a time window of 10 seconds.

(a) Referring to Ackland and Zhu’s (2015) typology of online networks, explain in which category you would place each of the above networks. Your answer should demonstrate your understanding of the typology.

(b) Which of the two types of semantic networks discussed in Yang and González-Bailón (2018) is Network 6? Your answer should demonstrate a clear understanding of these two types of semantic network.

(c) [Option 1 only] With reference to Yang and González-Bailón (2018), briefly describe the information loss that occurs when one-mode projections (Networks 5 and 6) are constructed. How could this information loss potentially affect a study of vaccine hesitancy or misinformation about COVID-19 using Twitter data?

 

End of questions

References

Ackland, R. and J. Zhu (2015), “Social Network Analysis,” in P. Halfpenny and R. Procter (eds), Innovations in Digital Research Methods, SAGE Publications.

Gertzel, B. and R. Ackland (2021), “#DebateNight 2020: Hashtag Twitter Collection of the US Presidential Debates,” https://vosonlab.github.io/posts/2021-06-03-us-presidential-debates-2020-twitter-collection/. Accessed: 7 August 2022.

Venturini, T., M. Jacomy and P. Jensen (2021), “What do we see when we look at networks: Visual network analysis, relational ambiguity, and force-directed layouts,” Big Data & Society, January–June: 1–16.

Yang, S.J. and S. González-Bailón (2018), “Semantic Networks and Applications in Public Opinion Research,” in J. Victor, A. Montgomery and M. Lubell (eds), The Oxford Handbook of Political Networks, New York: Oxford University Press, pp. 327–353.

  

See below for important instructions about this assignment.

General instructions for the assessment. Please read carefully: if you do not follow an instruction (and especially take note of bolded text below), you may be asked to resubmit and/or lose marks:

· Each year at least one student asks me “is the word limit for real?” or “do you really mean it?” Yes, the word limit is real and if I discover you have exceeded the limit, I will deduct marks.

· The word count does not include tables, references, figure/table headings or the code annex, but it does include footnotes.

· If you include a lot of text in your tables (e.g. you construct a table explaining network metrics) then the table will be included in the word count.

· Please use 1.5 line spacing and at least 2cm margins on all sides, so I have more room to provide corrections/suggestions.

· Page numbers are helpful even for a small document.

· Do not re-state the question before answering it, i.e. do not copy out the question text verbatim. All you need to do is indicate the question you are answering using e.g. “A.1”, and then any sub-questions (if relevant) using e.g. “(a)”, each in its own paragraph.

· Please make sure you use the question labels e.g. A.2 (a), (b) etc. It is annoying for the marker to have to infer what question you are answering, when you have not used the question labels to help the marker.

· Make sure you have sufficient space between answers – at least one blank line.

· Do not use a cover page or headers/footers. There is no need for a cover page when you are submitting via Turnitin (and I don’t want to have to spend time scrolling past it), and headers/footers make it harder for me to estimate the word count.

· Referencing: if you reference other people's work, please use the Harvard system.

· Network visualisations: If you are taking a screenshot, please make sure you crop the screenshot (using picture editing software, for example) so only the network map is in the picture. If you just take a screenshot of the entire window or all of VOSONDash, then it will be difficult to see the detail in the network map.

· Do not include screenshots of SNA metrics or output from either VOSONDash or from the R console. Rather, I want you to discuss the various graph metrics within the text of your assignment and you should also consider using tables for clearer presentation of graph metrics.

· Figures and Tables: these should all have titles and the figure/table number should be referred to in the text. e.g. "The centrality measures are shown in Table 1."

· If you are using colours for the nodes in a network map or node size reflects a node metric, this should be explained in the text (or you should provide a legend).

· Please make sure you discuss the network statistics in writing. Do not just provide numbers or tables without any text discussing the network statistics and their interpretation.

· You are expected to answer the questions in your own words, without excessive quoting from the lecture notes or other materials, since you are being tested on how well you understand these concepts.

 

Additional instructions for R code:

· Provide your R syntax as an Annex to your assignment.

· The R code annex should be in a fixed-width font (e.g. courier) and it should be single-spaced (the line spacing should be visually less than what you use for the rest of your report).

· You will be assessed on how well you write your R code: is it easy to understand, with useful comments etc? I will deduct marks if the code is messy or hard to follow.