
School of Natural and Computing Sciences

Department of Computing Science

MSc in Artificial Intelligence

2023 – 2024

Assessment 2 – LLM-based Data-to-Text with Zero-shot and One-shot Prompting, and Evaluation of the Factual Accuracy of the Output Text

Title: CS551H – Natural Language Generation

Note: This assessment accounts for 25% of your total mark for the course.

Learning Outcomes

On successful completion of this component, a student will have demonstrated the ability to:

• Perform data-to-text using zero-shot and one-shot prompting with LLMs.

• Evaluate the factual accuracy of LLM output.

Information on Plagiarism: Your report and test cases may be submitted for a plagiarism check (e.g., Turnitin). Before you start working on the assessment, please refer to the slides available on MyAberdeen for more information about avoiding plagiarism. Please also read the following information provided by the university: https://www.abdn.ac.uk/sls/online-resources/avoiding-plagiarism/

Assessment Tasks

In this assessment, you will work with the LLM available in Microsoft Bing Copilot to carry out data-to-text on the basketball game data you were introduced to in the week 1 lab class. The basketball data comprises 20 data-to-text pairs, indexed from 00 to 19; each pair is made up of (1) a JSON object with the box-score data and (2) its corresponding reference game description.
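For orientation only, the Python sketch below suggests the general shape such a box-score object might take. All field names here are hypothetical illustrations, not the actual schema of the lab data; always work from the real JSON files.

# Hypothetical, simplified shape of a box-score object. These field
# names are illustrative only; the week 1 lab data defines the real schema.
box_score_example = {
    "home_team": {"name": "ExampleTown Hawks", "points": 102},
    "away_team": {"name": "Sampleville Kings", "points": 95},
    "players": [
        {"name": "A. Player", "team": "home",
         "points": 28, "rebounds": 9, "assists": 7},
        # ... one entry per player ...
    ],
}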

Task 1 – Data-to-Text with Zero-shot Prompting: Using the best-practice guidance for prompt engineering, generate and save 20 game descriptions similar to the reference descriptions by applying a zero-shot prompt to Bing Copilot for each of the 20 JSON objects. In your submission, include a single file, all-games-zero-shot.pdf (similar to the all-games.pdf file from the week 1 lab class), to record the outputs from this task. [5 marks]
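As a minimal sketch of how a zero-shot prompt could be assembled, assuming the JSON objects are saved as individual files (the file name, function name, and prompt wording below are illustrative assumptions, not a prescribed template):

import json

def zero_shot_prompt(json_path: str) -> str:
    # Load one saved box-score object and wrap it in a plain instruction.
    with open(json_path, encoding="utf-8") as f:
        box_score = json.load(f)
    return (
        "Write a news-style description of the basketball game "
        "summarised by the following box-score data, mentioning only "
        "facts supported by the data.\n\n"
        + json.dumps(box_score, indent=2)
    )

# Paste the printed prompt into Bing Copilot and save the response.
print(zero_shot_prompt("game-00.json"))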

Task 2 – Data-to-Text with One-shot Prompting: Using the best-practice guidance for prompt engineering, generate and save 20 game descriptions by applying a one-shot prompt, based on a reference description, to Bing Copilot for each of the 20 JSON objects. Create a single file, all-games-one-shot.pdf (similar to the all-games.pdf file from the week 1 lab class), to record the outputs from this task. [5 marks]
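A one-shot prompt differs only in that one worked example (a JSON object together with its reference description) precedes the target data. A hedged sketch along the same lines as above, where the file names and prompt wording are again assumptions:

import json

def one_shot_prompt(example_json: str, example_text: str,
                    target_json: str) -> str:
    # Read the example pair and the target box-score object.
    with open(example_json, encoding="utf-8") as f:
        example = json.load(f)
    with open(example_text, encoding="utf-8") as f:
        reference = f.read().strip()
    with open(target_json, encoding="utf-8") as f:
        target = json.load(f)
    # One example pair first, then the target data in the same format.
    return (
        "Here is an example of box-score data and a matching game "
        "description.\n\nData:\n" + json.dumps(example, indent=2)
        + "\n\nDescription:\n" + reference
        + "\n\nNow write a description in the same style for this data:\n"
        + json.dumps(target, indent=2)
    )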

Task 3 – Evaluating the Factual Accuracy of the Outputs: Randomly select 5 of the outputs generated in Task 1 and a further 5 of the outputs generated in Task 2, and annotate these 5+5 outputs for errors using the methodology of Thomson et al. (2023) (the PDF of this paper is available with this assessment). In your submission, you MUST present your error annotations in a single file, error-annotations.pdf, in which you MUST first list the error annotations for the five outputs selected from Task 1 and then those for the five outputs selected from Task 2. Each error annotation MUST be formatted following the example shown in Figure 2 of Thomson et al. (2023), including the explanations. Your submission document MUST use the index numbers (00 to 19) of the selected outputs from Task 1 and Task 2 so that, during marking, their associated JSON objects can be used for reference. [15 marks]
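One way to keep the random selection honest and reproducible is to draw the index numbers programmatically; a minimal sketch using Python's random module (the seed value is an arbitrary assumption, fixed only for reproducibility):

import random

random.seed(42)  # arbitrary seed, fixed only so the draw can be repeated
task1_picks = sorted(random.sample(range(20), 5))  # 5 distinct indices 00-19
task2_picks = sorted(random.sample(range(20), 5))
print("Task 1 outputs to annotate:", [f"{i:02d}" for i in task1_picks])
print("Task 2 outputs to annotate:", [f"{i:02d}" for i in task2_picks])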

Marking Criteria

CGS D: For Task 1, the zero-shot prompt produces output text similar in limited ways to the reference descriptions. For Task 2, the one-shot prompt produces output text similar in limited ways to the reference descriptions. For Task 3, the error annotations are not exhaustive, largely incorrectly categorized and/or poorly explained.

CGS C: For Task 1, the zero-shot prompt produces output text similar in most ways to the reference descriptions. For Task 2, the one-shot prompt produces output text similar in most ways to the reference descriptions. For Task 3, the error annotations are partially exhaustive, partially correctly categorized and/or partially explained.

CGS B: For Task 1, the zero-shot prompt produces output text as similar to the reference descriptions as possible. For Task 2, the one-shot prompt produces output text as similar to the reference descriptions as possible. For Task 3, the error annotations are nearly exhaustive, nearly correctly categorized and/or reasonably explained.

CGS A: For Task 1, the zero-shot prompt produces output text as similar to the reference descriptions as possible. For Task 2, the one-shot prompt produces output text as similar to the reference descriptions as possible. For Task 3, the error annotations are exhaustive, correctly categorized and/or well explained.

Submission Instructions

This assessment is due at 11:00 on 26/02/2024.

Your submission should be a single zipped folder, firstName-lastName.zip, containing the three files from the three tasks. No other document is expected in the submission.

Please address any questions about any aspect of this assessment to Dr Yaji Sripada ([email protected]).