ETW2800 TEXT ANALYTICS FOR BUSINESS Semester 2, 2023
Department of Econometrics and Business Statistics
Individual Project: Exploring and Categorising Movies
Due date: Monday, 18 September, 11.55 pm (Week 9)
This assignment is 30% of your assessment for this unit. The total number of marks for this assignment is 100.
The objectives of this assignment:
1. Explore the data set using predefined concept rules and custom concept rules in the Concepts node.
2. Explore the terms and their relationships in the Text Parsing node.
3. Perform text categorisation and evaluate its performance.
ASSIGNMENT TASK
Assignment Aim:
Use SAS Visual Text Analytics to explore and categorise consumer complaints in the banking and finance industry.
Introduction to CFPB_COMPLAINTS_CLEAN data set:
With permission, SAS obtained the CFPB_COMPLAINTS_CLEAN data set from the Consumer Financial Protection Bureau (CFPB). The data have been augmented and cleaned for educational purposes. The data set has two columns and 10,000 rows, and it is listed under the "Available" tab in the Choose Data window.
The table below briefly describes the provided variables within the CFPB_COMPLAINTS_CLEAN data set.
| Variable | Type | Length | Description |
|---|---|---|---|
| dispute | binary | 8 | Whether the consumer disputed the company's response |
| complaint | varchar | 2123 | Consumer submitted a description of "what happened" from the complaint. |
Task:
1. Set the role of the variables
Set the role of the dispute variable as category and the complaint variable as text. Display both variables in the results.
2. Explore the contents of the complaint variable:
Use LITI rules in the Concepts node to explore the contents of the complaint variable. Then identify the content that interests you most, and form a problem statement and objective based on it. Exclude the exploration process from your report, because it is only a preliminary step towards forming the problem statement. Include only the resulting problem statement and objective.
3. Build a Concepts node to extract the documents that are related to the problem statement.
Use predefined concepts and custom concepts written with LITI concept rules to extract information, match the documents related to the problem statement, and help achieve the objective. These concepts can then be used to enhance the categorisation in the Categories node.
4. Identify the association among the terms that are related to the problem statement.
Use the term map in the Text Parsing node to explain the relationship among the terms that are related to the problem statement.
5. Build and assess the Categorisation model.
Explain the results of the categorisation model based on the dispute variable. Enhance the category rules using the custom concepts, then re-evaluate the model's performance.
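The evaluate, enhance and re-evaluate cycle in Task 5 can be illustrated outside the software with a minimal Python sketch. It is not SAS Visual Text Analytics output and it does not use LITI syntax: the `make_rule` helper, the term lists, and the five sample complaints are all hypothetical and are not drawn from CFPB_COMPLAINTS_CLEAN. The sketch assumes that dispute is coded 1 for "disputed" and 0 otherwise, and it compares a broad keyword rule with a narrower, concept-style rule using precision, recall and F1.

```python
import re
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def evaluate(actual, predicted):
    """Precision, recall and F1 for the positive class (dispute = 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f1)


def make_rule(terms):
    """Hypothetical helper: flag a text (return 1) if any term occurs as whole words."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return lambda text: 1 if pattern.search(text) else 0


# Hypothetical toy sample: (complaint text, dispute flag). Not real CFPB data.
SAMPLE = [
    ("The bank charged a late fee that I never agreed to.", 1),
    ("I was charged an overdraft fee twice on the same day.", 1),
    ("My credit report still shows an account I closed.", 0),
    ("The lender refused to correct the unauthorized charge.", 1),
    ("I asked about the fee schedule and got a clear answer.", 0),
]

# Baseline: one broad term. Enhanced: narrower multi-word "concepts".
baseline_rule = make_rule(["fee"])
enhanced_rule = make_rule(["late fee", "overdraft fee", "unauthorized charge"])

actual = [label for _, label in SAMPLE]
baseline = evaluate(actual, [baseline_rule(text) for text, _ in SAMPLE])
enhanced = evaluate(actual, [enhanced_rule(text) for text, _ in SAMPLE])

print("baseline:", baseline)
print("enhanced:", enhanced)
```

On this toy sample the broad rule misses one disputed complaint and wrongly flags one undisputed complaint, while the narrower rule matches every label. A sample this small says nothing about the real data set; the point is only the comparison of the same metrics before and after the rule is refined.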
Instructions:
The report must not exceed 2000 words (excluding footnotes, diagrams, tables, references and appendix). The goal of the report is to present the insights in a concise, organised, and meaningful way to facilitate understanding and decision-making.
Your report should include the following key sections:
1. Executive Summary: A brief overview of the entire report, highlighting the key findings, insights, and recommendations. This section is designed to give busy stakeholders a quick understanding of the report's main points.
2. Introduction: This section sets the stage by explaining the purpose and scope of the analysis and by providing context for the reader. Its main components are the problem statement and the objective of the analysis.
3. Data Description: Information about the data set used for analysis. This could include details about the dimensions of the data, the type of texts (social media posts, customer reviews, etc.), and any specific characteristics of the data that might be relevant to the analysis.
4. Methodology: Detailed information about the methods and techniques used for text analysis. This includes the preprocessing steps (such as text cleaning, tokenisation, and stop-word removal), and the algorithms or tools employed for analysis (e.g., LITI concept rules, text categorisation, etc.).
5. Results and Findings: The core of the report, where the outcomes of the text analysis are presented. This section might include insights from keyword and concept extraction, categorisation or any other relevant techniques. Visualisations such as word clouds or term maps can help illustrate the findings.
6. Discussion and Interpretation: An in-depth explanation of the results, discussing the implications of the findings and connecting them to the research objectives. This section might also address any limitations or challenges faced during the analysis.
7. Recommendations: Based on the insights gained from the analysis, this section provides actionable recommendations. These could be suggestions for business strategies, improvements, or areas that require further investigation.
8. Conclusion: A concise summary of the report's main points, emphasising the significance of the findings and reiterating the recommendations.
9. References: Citations to any external sources, tools, or frameworks used in the analysis.
Grading:
Your report will be evaluated based on the following criteria:
1. Executive Summary: 10 marks
2. Introduction: 10 marks
3. Data Description: 5 marks
4. Methodology: 10 marks
5. Results and Findings: 20 marks
6. Discussion and Interpretation: 20 marks
7. Recommendations: 10 marks
8. Conclusion: 10 marks
9. References: 5 marks
Total: 100 marks
Submission Guidelines:
1. Make sure that you regularly make backup copies of your work. Computer, disk, or cloud problems will not be accepted as valid reasons for late submissions or requests for extensions.
2. Any student caught plagiarising or permitting others to plagiarise their work will receive a zero mark for this assignment. Students should be aware of plagiarism and collusion and of the procedure that applies should they be suspected of committing such acts.
3. Proper citation is an essential aspect of academic writing. As you prepare your assignments, please cite all sources used in your work, including direct quotes, paraphrases, and ideas borrowed from other authors or generative AI. Failure to cite sources properly could result in charges of plagiarism, which could negatively affect assignment marks. Use APA citation style. You can refer to this webpage for the APA citation style and generate the APA citation using this APA referencing generator. You can refer to Monash's page and this webpage for citing resources obtained from generative AI and nonrecoverable sources.
4. Students should emphasise the narration and how the results are presented and interpreted. Students should endeavour to ensure that the report is complete and well-composed. Poor presentation, poor command of English writing and/or failure to comply with instructions may result in a mark penalty. You are encouraged to access Studiosity to improve your report writing. The Studiosity link is on Moodle's Assessment page.
*Please note that the services available for you in Studiosity (accessible via Moodle site) are supplementary to this unit. Studiosity is a third-party provider contracted by Monash University to assist you with generic skills such as essay writing, grammar, referencing etc. They do not provide specific comments on unit content or the appropriateness of your answer regarding assessment tasks and learning outcomes. Rather, they address your key skills of argument, structure, expression, and referencing.
Evaluation of your work for assessment purposes is conducted solely by your Monash teachers (chief examiners or tutors). You should use consultation hours provided by Monash teachers if you have concerns or questions about unit content and your understanding of that content or if you have questions specifically about assessment tasks.
a. Your report should not exceed 2000 words (excluding footnotes, references and appendix). Use default format, paragraph and margin settings. (These settings are in default mode whenever you open a new Word document.)
b. Font type: Times New Roman. Font size: 12.
c. Line spacing: 1.2.
d. All diagrams should be inline with the text for ease of reading and not placed in an Appendix at the end of the report.
5. All submissions will be via Moodle. Save your report as
Additional Remarks:
The Week 6, 7 and 8 tutorials will include project clinic sessions. During these sessions, the tutor will provide guidance and answer questions so that students can clear their doubts and steer their projects in the right direction. These sessions are not for obtaining exact answers or solutions, but the assistance they offer will put you in a better position for this project.
2023-08-25