COMP6714 ASSIGNMENT 1
Consider the following pseudocode, which performs list intersection based on the divide-and-conquer paradigm. Note that the input lists are not necessarily sorted.
Algorithm 1: Intersect(A, B)
if . . . then
    /* Deal with the boundary case */
    . . . ;
    return . . . ;
else
    /* Recursively break down each list into two parts and recurse */
    . . . ;
    return . . . ;
(1) Complete the above pseudo code. You can assume that you can invoke the following member methods on a List object L:
• L.len returns the length of the list L.
You can also use the usual indexing and slicing operations on the list (as in Python).
(2) Think of a method to divide each input list into k sub-lists (k ≥ 2) without changing the main logic of the algorithm you implemented in the first part. You should be able to describe the only change succinctly.
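As a point of comparison only, the following Python sketch shows one possible reading of the divide-and-conquer idea; it is not the expected answer. It assumes that each list holds distinct elements, that the boundary case is "both lists have length 1" (plus an extra guard for empty lists), and that the recursion covers every pair of parts. The helper split and the parameter k are hypothetical names introduced for illustration.

```python
def split(lst, k=2):
    """Cut lst into min(k, len(lst)) contiguous, non-empty parts (hypothetical helper)."""
    k = min(k, len(lst))
    size, extra = divmod(len(lst), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` parts get one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(lst[start:end])
        start = end
    return parts


def intersect(a, b, k=2):
    """Intersect two unsorted lists of distinct elements by divide and conquer."""
    assert k >= 2, "k must be at least 2, otherwise the lists never shrink"
    # Boundary cases: nothing to compare, or exactly one comparison left.
    if len(a) == 0 or len(b) == 0:
        return []
    if len(a) == 1 and len(b) == 1:
        return a[:] if a[0] == b[0] else []
    # Recursive case: at least one list has two or more elements, so every
    # (part of a, part of b) pair is strictly smaller than (a, b).
    result = []
    for part_a in split(a, k):
        for part_b in split(b, k):
            result += intersect(part_a, part_b, k)
    return result


print(sorted(intersect([5, 1, 9, 3], [3, 7, 5, 2, 8])))
print(sorted(intersect([4, 8, 15, 16, 23, 42], [42, 4, 99], k=3)))
```

In this sketch every element of A is eventually compared with every element of B, so the number of comparisons is |A| · |B| regardless of k; only the split helper changes when k changes.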
(1) Show that if the logarithmic merge strategy is used, it will result in at most ⌈log₂ t⌉ sub-indexes.
(2) Prove that the total I/O cost of the logarithmic merge is O(t · M · log₂ t).
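The two bounds can be checked empirically before attempting a proof. The Python sketch below simulates logarithmic merging under assumptions that are not stated in this handout: t is taken to be the number of times an in-memory index of size M is flushed, sub-index I_g holds 2^g · M units, and a merge costs as many I/O units as the size of its output. The function name logarithmic_merge and this cost model are illustrative only.

```python
def logarithmic_merge(t, m=1):
    """Simulate t flushes of an in-memory index of size m.

    Returns (sub_indexes, io_cost), where sub_indexes maps generation g
    to the size of sub-index I_g that currently exists on disk.
    """
    sub_indexes = {}
    io_cost = 0
    for _ in range(t):
        size, g = m, 0
        io_cost += size              # assumed cost of writing the flushed index
        while g in sub_indexes:      # I_g exists: merge it and carry to I_(g+1)
            size += sub_indexes.pop(g)
            io_cost += size          # assumed cost: size of the merged output
            g += 1
        sub_indexes[g] = size
    return sub_indexes, io_cost


for t in (2, 7, 8, 100):
    subs, cost = logarithmic_merge(t, m=1)
    ceil_log2_t = (t - 1).bit_length()        # equals ceil(log2 t) for t >= 1
    print(t, len(subs), ceil_log2_t, cost)
```

Under this model the sub-indexes behave like the bits of a binary counter, so their number equals the number of 1-bits in t. For t = 1 there is one sub-index while ⌈log₂ 1⌉ = 0, so the comparison is only meaningful for t ≥ 2.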
relevant documents. Assume that there are 8 relevant documents in total in the collection.
R R N N N N N N R N R N N N R N N N N R
(Note that spaces above are just added to make the list easier to read)
(1) What is the precision of the system on the top-20?
(2) What is the F1 on the top-20?
(3) What is/are the uninterpolated precision(s) of the system at 25% recall?
(4) What is the interpolated precision at 33% recall?
(5) Assume that these 20 documents are the complete result set of the system. What is the MAP for the query?
Assume now, instead, that the system returned the entire 10,000 documents in a ranked list, and that these are the first 20 results returned.
(6) What is the largest possible MAP that this system could have?
(7) What is the smallest possible MAP that this system could have?
(8) In a set of experiments, only the top-20 results are evaluated by hand. The result in (5) is used to approximate the range (6) to (7). For this example, how large (in absolute terms) can the error for the MAP be by calculating (5) instead of (6) and (7) for this query?
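Hand calculations for the measures above can be cross-checked with a short script. The Python sketch below assumes the usual textbook definitions: precision and recall are computed at each rank, interpolated precision at recall level r is the highest precision at any rank whose recall is at least r, and MAP for a single query is the sum of the precisions at the ranks of the relevant documents divided by the total number of relevant documents (8 here). All function names are illustrative.

```python
def pr_curve(ranking, total_relevant):
    """Return a list of (precision, recall) pairs, one per rank."""
    hits, curve = 0, []
    for rank, label in enumerate(ranking, start=1):
        if label == "R":
            hits += 1
        curve.append((hits / rank, hits / total_relevant))
    return curve


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def interpolated_precision(curve, recall_level):
    """Highest precision at any rank whose recall is >= recall_level."""
    return max((p for p, r in curve if r >= recall_level), default=0.0)


def average_precision(ranking, total_relevant):
    """Sum of precision at each relevant rank, divided by total_relevant."""
    curve = pr_curve(ranking, total_relevant)
    return sum(p for (p, _), label in zip(curve, ranking) if label == "R") / total_relevant


ranking = "RRNNNNNNRNRNNNRNNNNR"   # the judged top-20 list, spaces removed
curve = pr_curve(ranking, total_relevant=8)
p20, r20 = curve[-1]
print(p20, r20, f1(p20, r20))
print([p for p, r in curve if r == 0.25])      # precisions at exactly 25% recall
print(interpolated_precision(curve, 0.33))
print(average_precision(ranking, total_relevant=8))
```

Relevant documents that are never retrieved contribute a precision of zero to the average, which is why the divisor is the total number of relevant documents rather than the number found in the list.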
(1) Suppose we do not smooth the language model for d1 and d2. Compute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood. You should use scientific notation, e.g., 0.0061 should be written as 6.1 × 10⁻³.) Which document would be ranked higher?
(2) Suppose we now smooth the language model for d1 and d2 using the Jelinek-Mercer smoothing method with λ = 0.8 (i.e., p(w|d) = λ · p_mle(w|M_d) + (1 − λ) · p_mle(w|M_c)). Recompute the likelihood of the query for both d1 and d2, i.e., p(Q|d1) and p(Q|d2). (Do not compute the log-likelihood. You should use scientific notation.) Which document would be ranked higher?
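The arithmetic for both parts follows the same pattern and can be sketched in Python. The documents and query below are hypothetical toy data, not the d1, d2 and Q of this question, and the collection model M_c is assumed to be estimated from the concatenation of all documents. Setting lam to 1.0 corresponds to the unsmoothed model.

```python
from collections import Counter


def query_likelihood(query, doc, collection, lam=1.0):
    """p(Q|d) under a unigram model with Jelinek-Mercer smoothing.

    lam weights the document model; lam=1.0 means no smoothing.
    """
    doc_counts, col_counts = Counter(doc), Counter(collection)
    likelihood = 1.0
    for w in query:
        p_doc = doc_counts[w] / len(doc)          # p_mle(w | M_d)
        p_col = col_counts[w] / len(collection)   # p_mle(w | M_c)
        likelihood *= lam * p_doc + (1 - lam) * p_col
    return likelihood


# Hypothetical toy data, for illustration only.
doc_a = ["a", "b", "a", "c"]
doc_b = ["a", "d", "d", "b"]
collection = doc_a + doc_b       # assumption: M_c is built from all documents
query = ["a", "d"]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    unsmoothed = query_likelihood(query, doc, collection, lam=1.0)
    smoothed = query_likelihood(query, doc, collection, lam=0.8)
    print(f"{name}: unsmoothed={unsmoothed:.2e} smoothed={smoothed:.2e}")
```

In the toy data, doc_a does not contain the query term "d", so its unsmoothed likelihood is exactly zero; with smoothing, the collection model supplies a non-zero probability for that term.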
• include your name and student ID in the file, and
• the file can be opened correctly on CSE machines.
You need to show the key steps to get the full mark.
Note: Collaboration is allowed. However, each person must independently write up his/her own solution.
You can then submit the file by give cs6714 ass1 ass1.pdf. The file size is limited to 5MB.
Late Penalty: -10% per day for the first two days, and -20% per day for the following days.
2019-11-28