COMP301101 Web Services and Web Data Semester Two 2018/2019
Question 1 (15 marks)
Which of the following statements are true and which are false? (1 mark each)
a) The World Wide Web is a collection of structured data items connected to each other by well-defined relations.
b) Web robots are detected when they download the robots.txt file from a web site.
c) HTML documents are the best media type for creating a semantic web.
d) A website server can detect crawlers by their quick successive requests to the server.
e) A “parts of speech” tagger is trained using large collections of documents that have been manually labelled.
f) The crawler’s request queue represents the frontier of the crawling process.
g) A search engine must give text tagged as boldface a higher score as a search keyword.
h) In a search engine, the lexical analyser creates a Document Object Model (DOM) of the page.
i) A URL is not the same as a URI.
j) The predicate in an RDF statement defines attributes of the object.
k) The tokeniser in a search engine must not convert the capital letters in a document to small letters.
l) An English search engine can be 60% more accurate if it uses stemming to reduce words to their stems.
m) If we add “skip pointers” to an inverted list, we can achieve a substantial improvement in the asymptotic performance of search operations.
n) There is no official standard for RESTful Web APIs.
o) A GET request is used when we do not intend to change the data on a server, and therefore a GET message should not have a payload.
Question 2 (5 marks)
Draw an RDF diagram to represent the concepts in the following statement: “The United Kingdom comprises four countries: England (whose capital is London), Wales, Scotland, and Northern Ireland. The capitals of Wales and Scotland are Cardiff and Edinburgh, respectively. Belfast is the capital of Northern Ireland. London is also the capital of the United Kingdom and it has a population of 8 million people. English is the official language of England, while in Wales both English and Welsh are official languages.”
Question 3 (5 marks)
A web crawler has a peak capacity to download 600 web pages per second, but it has to respect a politeness window of 60 seconds. The crawler’s request queue contains 100,000 URLs. The URLs contain 30,000 different server addresses. Can this crawler achieve its peak capacity? Prove your answer with calculations.
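The quantities in this question can be related with a short calculation. The sketch below assumes the politeness window applies per server address, so that each server receives at most one request per window; the function name is hypothetical and is used only for illustration.

```python
def max_polite_rate(distinct_servers: int, politeness_window_s: float) -> float:
    """Upper bound on pages per second when each server may be
    contacted at most once per politeness window (assumption)."""
    return distinct_servers / politeness_window_s


peak_capacity = 600                          # pages per second, from the question
sustainable = max_polite_rate(30_000, 60)    # servers and window, from the question
can_reach_peak = sustainable >= peak_capacity

print(f"Sustainable rate under politeness: {sustainable} pages/s")
print(f"Peak capacity reachable: {can_reach_peak}")
```

Under this assumption the number of distinct servers, not the queue length, is what limits the sustained download rate.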
Question 4 (5 marks)
How will each of the following words change if processed by ‘Step 1a’ of the Porter Stemmer?
a) Was
b) Grasps
c) Masses
d) Mucus
e) Tries
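For reference, Step 1a of the original (1980) Porter algorithm applies four suffix rules, trying the longest matching suffix first: SSES becomes SS, IES becomes I, SS is left unchanged, and a remaining final S is removed. A minimal sketch follows; the function name is hypothetical, and lower-casing the input is an added convenience that is not part of Step 1a itself.

```python
def porter_step_1a(word: str) -> str:
    """Apply Step 1a of the original Porter stemmer to a single word."""
    w = word.lower()          # assumption: normalise case before matching
    if w.endswith("sses"):    # SSES -> SS
        return w[:-2]
    if w.endswith("ies"):     # IES -> I
        return w[:-2]
    if w.endswith("ss"):      # SS -> SS (unchanged)
        return w
    if w.endswith("s"):       # S -> (removed)
        return w[:-1]
    return w                  # no rule applies


for word in ["Was", "Grasps", "Masses", "Mucus", "Tries"]:
    print(word, "->", porter_step_1a(word))
```

Only the first matching rule is applied, which is why the rules are tested from the longest suffix to the shortest.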
Question 5 (20 marks)
We want to develop a RESTful web service to allow students to rate the teaching of professors in various modules on a scale from 1 to 5. The service must provide all the required web APIs to allow client applications to provide the functionality described below.
The service maintains data about professors, modules, and the rating of professors by different users (students). Information about modules and professors is manually added by the admin of the service using the admin site. The admin site is automatically created by the web development framework.
A module may be:
− Taught by different professors in different academic years.
− Taught by different professors in different semesters.
− Taught by more than one professor at the same time (for example each teaching some part of the module).
Since an academic year spans two calendar years (e.g. 2018-19), an academic year will be given by its first year only (hence 2018-19 is given as 2018).
Users of the service (students) can rate professors but cannot add or change module information. Before they can rate professors, users must register by providing a username, email, and password. Users can only rate professors when they are logged in to the service. The overall rating of a professor is the average of the professor’s rating by all users across all module instances taught by this professor. A module instance is a module taught in a certain year and semester by one or more professors. Any decimal fraction in the average is rounded to the nearest integer.
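The averaging rule described above can be sketched as follows. The function name is hypothetical, and two points are assumptions not stated in the text: exact halves are rounded up, and a professor with no ratings has no overall rating (`None`).

```python
import math
from typing import Iterable, Optional


def overall_rating(ratings: Iterable[int]) -> Optional[int]:
    """Average all ratings (1 to 5) given to a professor across every
    module instance, rounded to the nearest integer."""
    values = list(ratings)
    if not values:
        return None                      # assumption: no ratings, no score
    mean = sum(values) / len(values)
    # Round half up (assumption). Python's built-in round() rounds
    # halves to the nearest even integer, so it is avoided here.
    return math.floor(mean + 0.5)


print(overall_rating([5, 4, 4]))
print(overall_rating([1, 2]))
```

In the service itself this calculation would run on the server, since the client only displays the returned data.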
A client application connected to the service will provide the user with the following options:
Option 1. View a list of all module instances and the professor(s) teaching each of them. Here is an example of a possible client application output for this option:
Code  Name                        Year  Semester  Taught by
CD1   Computing for Dummies       2017  1         JE1, Professor J. Excellent
                                                  VS1, Professor V. Smart
------------------------------------------------------------------------------
CD1   Computing for Dummies       2018  2         JE1, Professor J. Excellent
------------------------------------------------------------------------------
PG1   Programming for the Gifted  2017  2         TT1, Professor T. Terrible
Note that modules and professors are given unique identifiers in the web service to avoid any possible mix-up between names.
Option 2. View the rating of all professors. Here is an example of a possible client application output for this option.
The rating of Professor J. Excellent (JE1) is *****
The rating of Professor T. Terrible (TT1) is *
The rating of Professor V. Smart (VS1) is **
Option 3. View the average rating of a certain professor in a certain module:
The rating of Professor V. Smart (VS1) in module Computing for Dummies (CD1) is ***
Option 4. Rate the teaching of a certain professor in a certain module instance.
Note that all filtering of data and calculations should be done by the server, i.e. the client application does not process incoming data; the application simply displays the data returned by the service in a human readable format.
a) Propose a suitable database model for the above service. You must specify all required tables, the fields of each table and their data types, and the relationships between the tables. You should also indicate which fields must have unique values. (10 marks)
b) Determine all the web APIs that must be provided by the service to allow a client application to perform the functionality described in Options 1 – 4. For each API, you must specify: the purpose of the API, a URL, an HTTP method, the request data, and the response data. (7 marks)
c) Write pseudocode for a handler function (called a view function in Django) that processes the request to get the rating of a certain professor in a certain module. (3 marks)
2023-05-18