COMP5423 NATURAL LANGUAGE PROCESSING
Group Project Instruction (Draft 1)
Topic: Machine Reading Comprehension
Group Size: 4-5 Students per Group
Due Date: April 16, 2023 (Sunday, 23:59)
Objectives:
Over the past decades, there has been growing interest in enabling machines to understand human language, and great progress has recently been made in machine reading comprehension (MRC). From a certain perspective, recent tasks labeled MRC can be seen as extensions of question answering (QA). The rapid development of the MRC field has been driven by the various large, realistic datasets released in recent years. Each dataset is usually composed of documents and questions that test document-understanding ability. Answers to the questions can be obtained by searching the given documents, or even external knowledge bases.
Extractive QA vs Abstractive QA vs Multiple-Choice QA:
According to the format of their answers, MRC datasets can be classified into three types: those with extractive answers, those with abstractive answers, and those with multiple-choice answers. In this project, we focus on the first type. Generally, this kind of question can extensively examine one's reasoning skills over a given passage, including simple pattern recognition, clausal inference, and multi-sentence reasoning.
Conventional Features vs Vector Representations vs Deep Neural Networks:
According to their implementation approach, QA systems can be classified into conventional-feature-based, vector-representation-based, and deep-neural-network-based systems. (1) The simplest way is to extract indicative features from both documents and questions based on syntax, frame semantics, and co-reference; these conventional features are then fed into a classifier that produces the final decision. (2) Another way is to select candidate answers based on vector representations, e.g., pre-trained word embeddings. (3) With the development of deep neural networks, attention-based architectures have become a widely adopted learning paradigm. This paradigm learns representations end to end and then selects answers based on the learned representations and attention scores.
SQuAD:
The well-known SQuAD dataset is used to evaluate the performance of your projects. The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question may be unanswerable. SQuAD 1.1, the previous version of the dataset, contains 100,000+ question-answer pairs on 500+ articles. SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowd-workers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
Since SQuAD 2.0 contains the data from SQuAD 1.1, we only use SQuAD 2.0 for evaluation. It is worth mentioning that users need to upload their results for online evaluation. For the dev set, users can upload their results to the official website, and the performance scores are then calculated automatically. For the test set, users need to email the organizers. To avoid extra burden on the organizers, you only need to evaluate your models on the dev set and obtain the scores automatically calculated by the official website.
Figure 1: An example of evaluation scores automatically calculated via the official website.
You can find more details and download the dataset at
https://rajpurkar.github.io/SQuAD-explorer/.
Evaluation tutorial:
https://worksheets.codalab.org/worksheets/0x8212d84ca41c4150b555a075b19ccc05/
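Both the dev set you download and the prediction file you upload are plain JSON. A minimal sketch of loading the SQuAD 2.0 dev file and writing predictions in the expected upload format (a dict mapping each question id to an answer string, with an empty string for "unanswerable") might look like this; the field names follow the published SQuAD 2.0 schema:

```python
import json

def load_squad(path):
    """Flatten a SQuAD 2.0 JSON file into per-question records."""
    with open(path, encoding="utf-8") as f:
        dataset = json.load(f)["data"]
    examples = []
    for article in dataset:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                examples.append({
                    "id": qa["id"],
                    "context": context,
                    "question": qa["question"],
                    "answers": [a["text"] for a in qa["answers"]],
                    "is_impossible": qa.get("is_impossible", False),
                })
    return examples

def save_predictions(preds, path):
    """preds: {question_id: answer_string}, '' for unanswerable questions."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(preds, f)
```

Flattening the nested article/paragraph structure up front makes the rest of the pipeline (feature extraction, batching, evaluation) much simpler to write.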
Project Requirements:
You are required to develop a machine reading comprehension system to extract an answer span to a given question.
Input: a document, and a question (query)
Output: an answer (a text span in the document)
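The input/output contract above can be sketched as a small interface; the names (`AnswerSpan`, `extract_answer`) and the trivial first-sentence placeholder are ours, not part of the requirements:

```python
from dataclasses import dataclass

@dataclass
class AnswerSpan:
    text: str   # the extracted answer string
    start: int  # character offset of the span in the document
    end: int    # character offset one past the span's last character

def extract_answer(document: str, question: str) -> AnswerSpan:
    # Placeholder logic: return the first sentence. A real system
    # would score candidate spans against the question.
    dot = document.find(".")
    end = dot + 1 if dot != -1 else len(document)
    return AnswerSpan(document[:end], 0, end)
```

Keeping the returned span's character offsets alongside its text makes it easy to verify that the answer really is a substring of the document, as the task requires.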
Basic Requirements
1. Your system supports basic word-matching techniques using, at a minimum, conventional features.
2. Your system selects an answer based on well-encapsulated feature extractors, named entity taggers, and/or classifiers provided by existing tools (e.g., nltk, gensim, etc.).
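A minimal sketch of the word-matching idea in basic requirement 1: score each sentence of the document by its stopword-filtered word overlap with the question. Here plain regex tokenization and a hand-picked stopword set stand in for what nltk's tokenizers and taggers would normally provide:

```python
import re

# A tiny hand-picked stopword list; in practice use nltk's stopword corpus.
STOPWORDS = {"the", "a", "an", "is", "was", "of", "in", "on", "to", "and",
             "what", "who", "when", "where", "which"}

def tokens(text):
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]

def best_sentence(document, question):
    """Pick the document sentence with the highest word overlap with the question."""
    q = set(tokens(question))
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return max(sentences, key=lambda s: len(q & set(tokens(s))))
```

A full system would go one step further and extract a span (e.g., a named entity of the expected answer type) from the selected sentence, rather than returning the whole sentence.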
Advanced Requirements
1. Your system extracts an answer based on vector representations, e.g., pretrained word embeddings.
2. Your system extracts an answer using deep neural network-based approaches, e.g., attention mechanisms, etc.
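The representation-based idea in advanced requirement 1 can be sketched as follows: embed the question and each candidate by averaging word vectors, then rank candidates by cosine similarity. The tiny 3-dimensional vectors below are made up for illustration; a real system would load pretrained embeddings such as GloVe or word2vec (e.g., via gensim):

```python
import math

# Toy 3-d vectors standing in for real pretrained embeddings.
EMB = {
    "capital": [0.9, 0.1, 0.0], "city":  [0.8, 0.2, 0.1],
    "paris":   [0.7, 0.3, 0.1], "river": [0.0, 0.9, 0.2],
    "seine":   [0.1, 0.8, 0.3],
}

def avg_vec(words):
    """Average the vectors of the in-vocabulary words (zero vector if none)."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(question_words, candidates):
    """Return the candidate word list closest to the question in embedding space."""
    q = avg_vec(question_words)
    return max(candidates, key=lambda c: cosine(q, avg_vec(c)))
```

Averaging is the simplest composition function; TF-IDF-weighted averages or learned encoders are natural upgrades within the same ranking framework.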
Superior Requirements
1. Your system applies large-scale pre-trained language models, like BERT.
2. Your system has the ability to identify unanswerable questions in the dataset.
Additional Exploration
There are some very large language models, like the GPT-3.5 series, and they usually provide playgrounds for users. You are encouraged to learn, use, and analyze these language models for question answering (e.g., by exploring different prompts and instructions). Please note that issuing too many queries may incur charges, and they are unnecessary for your project; exploring a few cases for analysis is sufficient.
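For the superior requirements, BERT-style extractive models output a start score and an end score per token, and a separate decoding step picks the answer span. The sketch below shows that decoding step in pure Python, including the common SQuAD 2.0 convention of comparing the best span score against the "no answer" score at the [CLS] position (index 0 here) with a tunable threshold; the scores and the threshold default are illustrative, not from any particular model:

```python
def decode_span(start_scores, end_scores, max_len=15, null_threshold=0.0):
    """Pick the best (start, end) token span, or None for 'unanswerable'.

    Index 0 plays the role of the [CLS] position, whose combined
    start+end score is the conventional 'no answer' score.
    """
    null_score = start_scores[0] + end_scores[0]
    best_span, best_score = None, float("-inf")
    for s in range(1, len(start_scores)):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best_span, best_score = (s, e), score
    # Abstain when the null score beats the best span by more than the threshold.
    if null_score - best_score > null_threshold:
        return None
    return best_span
```

Tuning `null_threshold` on the dev set trades answer recall against false answers on unanswerable questions, which directly affects the SQuAD 2.0 scores.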
References
Required Datasets:
Know What You Don't Know: Unanswerable Questions for SQuAD
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Existing Approaches (Codes and Pre-trained Models):
Linguistic Features for Document-based QA
Using Pre-trained Word Embeddings
Using BERT-based Embeddings for Document-based QA
Baidu Machine Reading Comprehension Platform PALM
Related Work (Reference Papers):
Machine Reading Comprehension: a Literature Review
A Survey on Neural Machine Reading Comprehension
A Survey on Machine Reading Comprehension Systems
Machine comprehension with syntax, frames, and semantics
Free GPU Resources:
Baidu PaddlePaddle: Course Registration, Tutorial, Usage
What to Hand In:
1. The Source Code of Your Program.
• The reported performance must be reproducible by running the code step by step.
• A readme file that describes: the structure of your program, the steps to run the code to reproduce the reported performance, and the dependency environment (the packages used and their versions).
• Add a header comment at the top of each file to specify its usage, plus any necessary inline comments.
• Put all the files into a directory named “code”.
• TAs will run the code. If there are problems, TAs will contact your group for an online demonstration. If the submitted code ultimately cannot reproduce the reported performance, your grade will be deducted accordingly.
2. Written Report (at least 6 pages in 12-point font with single line spacing, excluding the cover page; please submit a PDF file)
• A cover page with the project topic and group members’ names and IDs;
• A screenshot (like Figure 1) of the scores automatically calculated by the official website. When submitting your results, please use the specific text “COMP5423_Group_Id” (e.g., “COMP5423_Group_01”) in the description field.
• The role and the contribution proportion of each group member. For example, student A 30%, student B 25%, student C 25%, student D 20%. The scores of group members may vary depending on the contributions.
• Introduction of the topic, i.e., machine reading comprehension (e.g., the significance and practical value of the topic, etc.);
• Functions of the (expected) system (including those you have implemented and those you can only design);
• Flowchart diagram of the system;
• Approaches and tools used to implement the functions;
• Results analysis. (The reported accuracy must be achieved using the submitted code.)
• Anything else you would like to share with us (like the additional exploration).
3. Demo Video
Less than ten minutes. Try to show the highlights of your project solution, such as the main difficulties and primary achievements of your project.
Pack all the submissions in one zipped file and submit it to Blackboard. As described above, the file contains the “code” directory, the report PDF, and the video file.
Remark: Please do remember to click the “Submit” button after you upload the file.
Grading Scheme:
We will grade the project according to the following schemes:
1. System Implementation: 40%
2. System Performance: 30%
3. Written Report: 20%
4. Demo Video: 10%
Note that we will assess your system implementation based on the written report and the code. For each member in a group, we will assign different scores according to the contribution proportion.
Contact Information:
• Yongqi Li ([email protected])
2023-04-14