AD699: Data Mining for Business Analytics Summer II QUIZ #3: Question Bank
07AUG2018
1. Which of the following statements about k-means clustering is true?
a. When performing a k-means clustering, you must specify the number of clusters that you want the model to create.
b. With k-means clustering, the number of clusters will be automatically determined by the algorithm behind the kmeans() function.
c. K-means clustering can only be used with categorical data.
d. K-means clustering is a supervised learning task.
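Note: the sketch below is a toy, one-dimensional k-means written only to illustrate how the number of clusters enters the procedure. The function name kmeans_1d, the choice of starting centers, and the sample values are all assumptions made for illustration; it is not the kmeans() function referenced above.

```python
def kmeans_1d(values, k, iterations=10):
    """Toy k-means on a list of numbers; the caller supplies k."""
    # Assumption: start from the k smallest distinct values (real
    # implementations usually pick random starting centers).
    centers = sorted(set(values))[:k]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each value joins its nearest center.
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical sample data with two obvious groups.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(values, k=2)
print(centers)
print(clusters)
```

The sketch assumes the data contain at least k distinct values.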
2. What distance measure is used to determine cluster-to-cluster distance with single-linkage clustering?
a. With single-linkage clustering, the average of all the records in the two clusters is calculated first; then, a randomly-generated value determines the next cluster pairing.
b. With single-linkage clustering, the distance measure used is the minimum distance between the nearest pair of records in the two clusters.
c. Single-linkage clustering uses an algorithm that randomly generates distances, and then measures the clusters by homogeneity.
d. A single-linkage measuring criterion must be specified by the user each time.
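Note: as a rough illustration of a nearest-pair distance between two clusters, the sketch below computes the smallest Euclidean distance over all cross-cluster pairs of records. The function name and the sample coordinates are hypothetical.

```python
import math


def single_linkage_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any record in A and any in B."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Hypothetical two-dimensional records.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (9.0, 0.0)]
d = single_linkage_distance(cluster_a, cluster_b)
print(d)  # 3.0, the distance from (1, 0) to (4, 0)
```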
3. In a hierarchical agglomerative clustering process involving six total records, how many clusters would the model use at first?
a. The model would start with one cluster containing all six records, and then slowly divide the records into a greater number of clusters.
b. Such a model would start with three clusters, and then either contract or expand, depending on the characteristics of the data.
c. The model would start with six clusters, and then begin to join records together into clusters. At each step, the number of clusters would be reduced.
d. The model would only begin to form once the user specified the desired number of clusters in advance.
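Note: the sketch below traces how the cluster count changes during a simple agglomerative procedure on one-dimensional records, merging the closest pair of clusters (nearest-pair distance) at every step. The function name and the four sample values are assumptions chosen for illustration.

```python
def agglomerate(points):
    """Return the number of clusters after each merge step."""
    clusters = [[p] for p in points]  # one cluster per record to begin
    counts = [len(clusters)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest nearest-pair distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i, then drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        counts.append(len(clusters))
    return counts


# Hypothetical records.
counts = agglomerate([1.0, 2.0, 6.0, 7.5])
print(counts)  # [4, 3, 2, 1]
```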
4. In the dendrograms that we saw in class and in the textbook, the cutoff values can be seen on the ______.
a. x-axis
b. centroid convergence
c. y-axis
d. ordinary least squares
5. For an undirected network with three nodes, what is the maximum possible number of edges?
a. 3
b. 3.5
c. 7
d. 6
6. For a directed network with 7 nodes, how many total edges are possible?
a. 42
b. 21
c. 13
d. 7 total edges (but of varying strength).
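Note: the counting rule behind questions 5 and 6 can be sketched as follows, assuming a simple network with no self-loops and at most one edge per pair (or per ordered pair, if directed). The function name and the node counts in the example are hypothetical.

```python
def max_edges(n, directed=False):
    """Maximum number of edges among n nodes, with no self-loops."""
    ordered_pairs = n * (n - 1)  # each node can point to every other node
    # An undirected edge covers both orderings of a pair, so halve the count.
    return ordered_pairs if directed else ordered_pairs // 2


print(max_edges(4))                 # 6
print(max_edges(5, directed=True))  # 20
```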
7. Which of the following describes a bidirectional, or undirected, connection?
a. When Mary begins to follow Tim on Twitter, Tim gains one follower. He does not necessarily have to follow Mary back in return.
b. Tim can indicate through LinkedIn that Mary is a thought leader whose posts he would like to see. Mary can see that Tim has indicated this.
c. Tim can indicate through LinkedIn that Mary is a thought leader whose posts he would like to see. Mary cannot see that Tim has made this selection unless he chooses to share it publicly.
d. When Mary sends a LinkedIn connection request to Tim, and Tim accepts, each of them gains a connection. The impact to the network would be the same if Tim had initiated the connection request.
8. In social network analysis, what is a singleton?
a. A singleton is a user who is not connected to any other node.
b. A singleton is a network in which each node is directly connected by an edge to exactly one other node.
c. A singleton is used to help determine the path length between two otherwise-unconnected nodes.
d. In social network analysis, a singleton is essentially the same thing as an unweighted edge.
9. The table below shows categorical values: a 1 in a particular cell indicates that the store carries the item in stock, whereas a 0 indicates that the item is not stocked by that store. Given the information in the table below, what is the Jaccard coefficient between Boxborough and Chelmsford?
a. .33
b. .67
c. 1.33
d. .25
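Note: for binary data, the Jaccard coefficient is the number of items that both records have (1 and 1) divided by the number of items that at least one record has; items that neither has (0 and 0) are ignored. The sketch below uses two made-up stores, not the stores from the table in the question.

```python
def jaccard(a, b):
    """Jaccard coefficient for two equal-length lists of 0/1 values."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: return 0.0 when neither record has any item, where the
    # coefficient is otherwise undefined.
    return both / either if either else 0.0


# Hypothetical stocking data for five items.
store_x = [1, 1, 0, 1, 0]
store_y = [1, 0, 0, 1, 1]
print(jaccard(store_x, store_y))  # 2 shared / 4 stocked by either = 0.5
```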
10. In the dendrogram shown below, how many clusters will have formed at 1.0 units of distance?
a. At 1.0 units of distance, two clusters have formed (New England-United, and Madison-Northern).
b. At 1.0 units of distance, there are four clusters (New England-United, Madison-Northern, Oklahoma-Texas, and Arizona-Southern).
c. At 1.0 units of distance, there is one total cluster.
d. At 1.0 units of distance, no clusters will have formed yet.
11. For the dendrogram shown immediately above, which of the following would be true about the number of clusters at a distance of 4.0?
a. At a cutoff distance of 4.0, NY would still stand by itself, but all of the other records would be part of one cluster (two clusters total).
b. At a cutoff distance of 4.0, there would be 18 separate clusters in this model.
c. At a cutoff distance of 4.0, Nevada, Puget, and Central would all be in separate clusters.
d. At a cutoff distance of 4.0, all the records would have been formed into one large cluster.
12. Which of the following statements about the network shown below is true?
a. This network is a clique, but not a connected network.
b. This network is a clique and a connected network.
c. This network is a connected network, but not a clique.
d. This network is neither a clique nor a connected network.
13. Suppose a telephone company is wondering what happened to a particular customer named John Doe. This customer stopped paying his phone bill, and stopped responding to any correspondence from the phone company. The phone company suspects that he may have resumed phone service under a different name and address. How can the company use entity resolution to see if the mystery customer and a new customer are really the same person?
a. They could look at the calling and text network of John Doe (whose identity they know). This would tell them who John called, who called him, who he texted, who texted him, etc. They could then compare that to the call and text networks of new customers to look for a match.
b. They could build a diagram that shows each person John Doe had ever called or texted. Then, they could check to see whether any of those people had recently canceled their service with the company.
c. They could look to see whether any new subscribers had made suspicious inquiries with the telephone company. Entity resolution would then identify those suspicious individuals, and the company could look to make comparisons from there.
d. The company could use entity resolution by identifying the call records of known criminals that John Doe had spoken with or texted in the past. Then, they could adjust their model based on these patterns in order to find people who might know more about John Doe’s whereabouts.
14. As part of the preprocessing for the creation of a text-mining model, an analyst decides to use stemming. Which of the following might be accomplished in this step?
a. Common English-language words such as theirs, this, you’d, she, and what would all be removed from the document, in order to reduce it to the most essential terms.
b. This will have made the text ready for Latent Semantic Indexing (LSI).
c. The words ‘train’, ‘training’, ‘trainer’, and ‘trained’ would all be reduced to a single term, and would be treated the same by the model.
d. Most of the terms -- except those that had multiple punctuation marks -- would become de-tokenized.
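Note: to give a feel for what a stemming step does, the sketch below strips a few common English suffixes. It is a deliberately naive toy, not the Porter stemmer or any library routine, and the suffix list, function name, and sample words are assumptions.

```python
# Toy suffix list; real stemmers apply many more rules and exceptions.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["jump", "jumping", "jumped", "jumper", "jumps"]
stems = [toy_stem(w) for w in words]
print(stems)  # every word maps to "jump"
```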
15. What is a potential flaw associated with a social network graph that ignores edge weight?
a. Without a depiction of eigenvector centrality, the reader would not know whether this was an egocentric network.
b. A person reading such a graph would not be able to see any meaningful information about the relative importance of the various connections in the network.
c. Such a graph might lead a reader to believe that the network was directed, when it was actually undirected.
d. When using a graph that does not show edge weight, time components that might have value to the network will be misrepresented.