
COMP5423 NATURAL LANGUAGE PROCESSING

Group Project Instruction (Draft 1)

Topic: Machine Reading Comprehension

Group Size: 4-5 Students per Group

Due Date: April 16, 2023 (Sunday, 23:59)

Objectives:

Over the past decades, there has been growing interest in making machines understand human language, and great progress has recently been made in machine reading comprehension (MRC). From a certain perspective, the recent tasks entitled MRC can also be seen as extended forms of question answering (QA). The rapid development of the MRC field is driven by the various large and realistic datasets released in recent years. Each dataset is usually composed of documents and questions for testing document-understanding ability. The answers to the raised questions can be obtained by searching the given documents, and sometimes external knowledge bases as well.

Extractive QA vs Abstractive QA vs Multiple-Choice QA:

According to the format of the answers, MRC datasets can be classified into three types: datasets with extractive answers, with abstractive answers, and with multiple-choice answers. In this project, we focus on the first type. Generally, this kind of question extensively examines one's reasoning skills over a given passage, including simple pattern recognition, clausal inference, and multi-sentence reasoning.

Conventional Features vs Vector Representations vs Deep Neural Networks:

According to their implementation approaches, QA systems can also be classified into conventional feature based, vector representation based, and deep neural network based. (1) The simplest way is to extract indicative features from both documents and questions based on syntax, frame semantics, and co-reference. These conventional features are then fed into a classifier to produce the final decision. (2) Another way is to select candidate answers based on vector representations, e.g., pre-trained word embeddings. (3) With the development of deep neural networks, attention-based architectures have become a widely adopted learning paradigm. This paradigm learns representations in an end-to-end fashion and then selects answers based on the learned representations and attention scores.
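To make approach (1) concrete, here is a minimal, hypothetical sketch (assuming scikit-learn as the classifier library, which the project does not mandate): a few hand-crafted features are computed for each (question, candidate sentence) pair, and a logistic regression classifier combines them into a final decision.

# Toy illustration of approach (1): hand-crafted features per candidate
# sentence, combined by a classifier. scikit-learn is an assumed dependency.
from sklearn.linear_model import LogisticRegression

def overlap_features(question_tokens, sentence_tokens):
    q, s = set(question_tokens), set(sentence_tokens)
    overlap = len(q & s)
    return [
        overlap,                                  # raw word overlap
        overlap / (len(q) or 1),                  # overlap normalised by question length
        int(bool({"who", "when", "where"} & q)),  # crude question-type indicator
    ]

# X: one feature vector per (question, candidate sentence) pair;
# y: 1 if the candidate sentence contains the gold answer span, else 0.
clf = LogisticRegression()
# clf.fit(X_train, y_train)                   # train on labelled candidates
# scores = clf.predict_proba(X_dev)[:, 1]     # rank candidate sentences at test time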

SQuAD:

The famous SQuAD dataset is used to evaluate the performance of your projects. The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1, the previous version of the SQuAD dataset, contains 100,000+ question-answer pairs on 500+ articles. SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowd-workers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.

Since SQuAD 2.0 contains the data from SQuAD 1.1, we only use SQuAD 2.0 for evaluation. It is worth mentioning that users need to upload their results for online evaluation. For the dev set, users can upload their results to the official website and the performance scores are calculated automatically. For the test set, users need to email the organizers. To avoid extra burden on the organizers, you only need to evaluate your models on the dev set and obtain the corresponding scores automatically calculated via the official website.
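For reference, the SQuAD files are plain JSON, and the online evaluation expects a single JSON file mapping each question id to one answer string (the usual convention is an empty string for questions predicted to be unanswerable). The sketch below reads the official dev file (dev-v2.0.json) and writes such a predictions file; my_system_answer is a hypothetical stub standing in for your own model.

import json

def my_system_answer(context: str, question: str) -> str:
    # Hypothetical placeholder: plug in your own model here.
    # Returning "" marks the question as unanswerable.
    return ""

# Load the official SQuAD 2.0 dev set (downloadable from the SQuAD explorer page).
with open("dev-v2.0.json", encoding="utf-8") as f:
    squad = json.load(f)

predictions = {}
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            predictions[qa["id"]] = my_system_answer(context, qa["question"])

# The evaluation expects a JSON object mapping question id -> answer string.
with open("predictions.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False)

The resulting predictions.json can then be submitted on the official website (or scored locally with the official SQuAD 2.0 evaluation script) to obtain Exact Match and F1 scores such as those shown in Figure 1.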

 

Figure 1: An example of evaluation scores automatically calculated via the official website.

You can find more details and download the dataset at

https://rajpurkar.github.io/SQuAD-explorer/.

Evaluation tutorial:

https://worksheets.codalab.org/worksheets/0x8212d84ca41c4150b555a075b19ccc05/

Project Requirements:

You are required to develop a machine reading comprehension system to extract an answer span to a given question.

Input: a document, and a question (query)

Output: an answer (a text span in the document)

Basic Requirements

1.   Your system supports basic word matching techniques using at least conventional features.

2.   Your system selects an answer based on well-encapsulated feature extractors, named entity taggers, and/or classifiers provided by existing tools (e.g., nltk, gensim).
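As one deliberately simple way to meet both basic requirements, the sketch below uses nltk to pick the passage sentence with the highest content-word overlap with the question and then narrows it to the first named entity in that sentence; a real system would refine the span further. The function name and the fallback behaviour are illustrative choices, not a prescribed design.

import nltk

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("maxent_ne_chunker", quiet=True)
nltk.download("words", quiet=True)

from nltk.corpus import stopwords

STOP = set(stopwords.words("english"))

def answer_by_overlap(document: str, question: str) -> str:
    """Pick the sentence with the highest content-word overlap with the
    question, then return the first named entity in that sentence
    (falling back to the whole sentence) as a crude answer span."""
    q_tokens = {w.lower() for w in nltk.word_tokenize(question)} - STOP
    best_sent, best_score = "", -1
    for sent in nltk.sent_tokenize(document):
        s_tokens = {w.lower() for w in nltk.word_tokenize(sent)} - STOP
        score = len(q_tokens & s_tokens)
        if score > best_score:
            best_sent, best_score = sent, score
    # Try to narrow the answer down to a named entity inside the best sentence.
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(best_sent)))
    for subtree in tree:
        if hasattr(subtree, "label"):  # an NE chunk rather than a plain token
            return " ".join(tok for tok, _ in subtree.leaves())
    return best_sent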

Advanced Requirements

1.   Your system extracts an answer based on vector representations, e.g., pretrained word embeddings.

2.   Your system extracts an answer using deep neural network-based approaches, e.g., attention mechanisms, etc.
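A minimal sketch of the vector-representation route is shown below, assuming the gensim downloader and its pre-trained glove-wiki-gigaword-100 vectors: the question and each candidate sentence are represented by averaged word embeddings, and cosine similarity acts as a simple, non-learned relevance score. An attention-based neural reader (e.g., BiDAF or a Transformer encoder) would instead learn such scores end to end and predict the start and end token positions of the answer span.

import numpy as np
import gensim.downloader as api
import nltk

nltk.download("punkt", quiet=True)

# Small pre-trained GloVe vectors from the gensim downloader.
wv = api.load("glove-wiki-gigaword-100")

def embed(tokens):
    vecs = [wv[t.lower()] for t in tokens if t.lower() in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def answer_by_embeddings(document: str, question: str) -> str:
    """Score every candidate sentence by cosine similarity between its
    averaged word vectors and the question's averaged word vectors."""
    q_vec = embed(nltk.word_tokenize(question))
    best_sent, best_sim = "", -1.0
    for sent in nltk.sent_tokenize(document):
        s_vec = embed(nltk.word_tokenize(sent))
        denom = np.linalg.norm(q_vec) * np.linalg.norm(s_vec)
        sim = float(q_vec @ s_vec / denom) if denom else 0.0
        if sim > best_sim:
            best_sent, best_sim = sent, sim
    return best_sent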

Superior Requirements

1.   Your system applies large-scale pre-trained language models, like BERT.

2.   Your system is able to identify unanswerable questions in the dataset.
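As a quick illustration of both superior requirements, the sketch below loads a BERT model fine-tuned on SQuAD 2.0 through the Hugging Face transformers question-answering pipeline. The checkpoint name deepset/bert-base-cased-squad2 is only an assumption (any SQuAD 2.0 checkpoint from the model hub should behave similarly), and handle_impossible_answer=True lets the pipeline return an empty answer for questions it judges unanswerable.

from transformers import pipeline

# Assumed checkpoint: any BERT-style model fine-tuned on SQuAD 2.0 works here.
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")

def answer_with_bert(document: str, question: str) -> str:
    # handle_impossible_answer lets the pipeline return an empty string
    # for questions it predicts to be unanswerable.
    result = qa(
        question=question,
        context=document,
        handle_impossible_answer=True,
    )
    return result["answer"]  # empty string means "unanswerable"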

Additional Exploration

There are some super-large language models, such as the GPT-3.5 series, which usually provide playgrounds for users. You are encouraged to learn, use, and analyze these language models for question answering (e.g., by exploring different prompts and instructions). Please note that making many queries may incur charges, which is unnecessary for this project; exploring a few cases for analysis is sufficient.
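If you do try this, the sketch below shows one possible extractive-QA prompt against the OpenAI API, assuming the 2023-era openai Python package (pre-1.0 ChatCompletion interface) and your own API key; the model name, prompt wording, and "unanswerable" convention are illustrative only, and each call may be billed.

import openai  # pre-1.0 interface of the openai package

openai.api_key = "YOUR_API_KEY"  # placeholder; each call may be billed

def ask_gpt(document: str, question: str) -> str:
    """Prompt a GPT-3.5 model to extract an answer span, or reply
    'unanswerable' when the passage does not contain the answer."""
    prompt = (
        "Answer the question with the shortest span copied from the passage. "
        "If the passage does not contain the answer, reply 'unanswerable'.\n\n"
        f"Passage: {document}\n\nQuestion: {question}\nAnswer:"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"].strip()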

References

Required Datasets:

Know What You Don't Know: Unanswerable Questions for SQuAD

SQuAD: 100,000+ Questions for Machine Comprehension of Text

Existing Approaches (Codes and Pre-trained Models):

Linguistic Features for Document-based QA

Using Pre-trained Word Embeddings

Using BERT-based Embeddings for Document-based QA

Baidu Machine Reading Comprehension Platform PALM

Related Work (Reference Papers):

Machine Reading Comprehension: a Literature Review

A Survey on Neural Machine Reading Comprehension

A Survey on Machine Reading Comprehension Systems

Machine comprehension with syntax, frames, and semantics

Free GPU Resources:

Baidu PaddlePaddle: Course Registration, Tutorial, Usage

Google Colab: Tutorial, Usage

Kaggle Kernel: Tutorial, Usage

FloydHub: Tutorial, Usage

What to Hand In:

1. The Source Code of Your Program.

•   The reported performance must be reproducible by running the code step by step.

•   A readme file that describes: the structure of your program, the steps to run the code to reproduce the reported performance, and the dependency environment (the packages used and their versions).

•   Add a comment at the top of each file to specify its usage, and include necessary comments elsewhere.

•   Put all the files into a directory named “code”.

•   TAs will run the code. If there are any problems, TAs will contact your group for an online demonstration. If the submitted code ultimately cannot reproduce the reported performance, there will be some deduction from your grade.

2. Written Report (at least 6 pages with 12-point font size and single line spacing, excluding the cover page; please submit a PDF file)

•   A cover page with the project topic and group members’ names and IDs;

•   The screenshot (like Figure 1) of the scores automatically calculated via the official website. When submitting your results, please use the specific text “COMP5423_Group_Id” (like “COMP5423_Group_01”) for the description field.

 

•   The role and the contribution proportion of each group member. For example, student A 30%, student B 25%, student C 25%, student D 20%. The scores of group members may vary depending on their contributions.

•   Introduction of the topic, i.e., machine reading comprehension (e.g., the significance and practical value of the topic, etc.);

•   Functions of the (expected) system (including those you have implemented and those you can only design);

•    Flowchart diagram of the system;

•    Approaches and tools used to implement the functions;

•   Results analysis. (The reported accuracy must be achieved using the  submitted code.)

•   Anything else you would like to share with us (like the additional exploration).

3. Demo Video

Less than ten minutes

Try to show the highlights of your project solution, such as the main difficulties and primary achievements of your project.

Pack all the submissions in one zipped file and submit it to Blackboard. As described above, the file should contain the “code” directory, the report PDF, and the video file.

Remark: Please do remember to click the “Submit” button after you upload the file.

Grading Scheme:

We will grade the project according to the following schemes:

1.   System Implementation: 40%

2.   System Performance: 30%

3.   Written Report: 20%

4.   Demo Video: 10%

Note that we will assess your system implementation based on the written report and the code. For each member in a group, we will assign different scores according to the contribution proportion.

Contact Information:

•   Yongqi Li ([email protected])