
Department of Econometrics and Business Statistics

ETW2800 TEXT ANALYTICS FOR BUSINESS

Semester 2, 2023

Individual Project: Exploring and Categorising Movies

Due date: Monday, 18 September, 11.55 pm (Week 9)

This assignment is 30% of your assessment for this unit. The total number of marks for this assignment is 100.

The objectives of this assignment:

1.   Exploring the data set using predefined concept rules and custom concept rules in the Concepts node.

2.   Exploring the terms and their relationship in the Text Parsing node.

3.   Performing text categorisation and evaluating its performance.

ASSIGNMENT TASK

Assignment Aim:

Use SAS Visual Text Analytics to explore and categorise consumer complaints in the banking and finance industry.

Introduction to CFPB_COMPLAINTS_CLEAN data set:

With permission, SAS obtained the CFPB_COMPLAINTS_CLEAN data set from the Consumer Financial Protection Bureau (CFPB). The data are augmented and cleaned for education purposes. The data set has two columns and 10,000 rows, and it is listed under the "Available" tab in the Choose Data window.

The table below briefly describes the provided variables within the CFPB_COMPLAINTS_CLEAN data set.

Variable | Type | Length | Description
dispute | binary | 8 | Whether the consumer disputed the company's response
complaint | varchar | 2123 | Consumer-submitted description of "what happened" from the complaint

Task:

1. Set the role of the variables

Set the role of the dispute variable as category and the complaint variable as text. Display both variables in the results.

2. Explore the contents of the complaint variable:

Use the LITI rules in the Concepts node to explore the contents of the complaint variable. Then, identify the content that interests you the most. Form a problem statement and objective based on the content you have identified. Exclude the exploration process from your report, because it is only a preliminary step for forming the problem statement. Include only the problem statement and objective that you have identified.

3. Form a Concepts node to extract the documents that are related to the problem statement.

Use predefined concepts and custom concepts with the LITI concept rules to form concepts that extract the information, match the documents related to the problem statement and help achieve the objective. These concepts can then be used to enhance the categorisation in the Categories node.
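The idea of a custom concept that matches only the documents relevant to a problem statement can be sketched outside SAS. The Python below is not LITI syntax and does not reproduce how the Concepts node works internally; it only mimics a rule that fires when two terms occur in the same sentence. The terms, the sample complaints and the function names are hypothetical and are not taken from CFPB_COMPLAINTS_CLEAN.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def contains_term(sentence: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def matches_same_sentence(text: str, term_a: str, term_b: str) -> bool:
    """True when both terms appear together in at least one sentence of the text."""
    return any(
        contains_term(sentence, term_a) and contains_term(sentence, term_b)
        for sentence in split_sentences(text)
    )


# Hypothetical complaints, invented for illustration only.
complaints = [
    "My credit card was charged twice. The bank refused a refund.",
    "I was charged a late fee on my mortgage.",
    "The refund arrived late. My credit card is fine.",
]

# Keep only the complaints where both terms share a sentence.
matched = [c for c in complaints if matches_same_sentence(c, "credit card", "charged")]
print(matched)
```

Requiring both terms in the same sentence is stricter than a plain keyword search, which is the kind of trade-off between matching too many and too few documents that a custom concept has to balance.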

4. Identify the association among the terms that are related to the problem statement.

Use the term map in the Text Parsing node to explain the relationship among the terms that are related to the problem statement.

5. Build and assess the Categorisation model.

Explain the result of the categorisation model based on the dispute variable. Enhance the category rules using the custom concepts. Then, re-evaluate the model's performance.
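As a hedged illustration of what evaluating and re-evaluating a binary categorisation model can involve, the sketch below computes precision, recall, F1 and accuracy from confusion-matrix counts for the dispute target. The counts are hypothetical placeholders, not results from CFPB_COMPLAINTS_CLEAN, and the metrics reported in the Categories node may be labelled or computed differently, so treat this only as a way to cross-check your reading of the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for the positive class (disputed). Values are hypothetical."""
    tp: int  # predicted disputed, actually disputed
    fp: int  # predicted disputed, actually not disputed
    fn: int  # predicted not disputed, actually disputed
    tn: int  # predicted not disputed, actually not disputed


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of predicted disputes that were real disputes."""
    return safe_div(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of real disputes that the model found."""
    return safe_div(c.tp, c.tp + c.fn)


def f1(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return safe_div(2 * p * r, p + r)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all documents that were classified correctly."""
    return safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts for a model before the category rules are enhanced.
baseline = ConfusionCounts(tp=30, fp=10, fn=20, tn=40)

for name, metric in [("precision", precision), ("recall", recall),
                     ("F1", f1), ("accuracy", accuracy)]:
    print(f"{name}: {metric(baseline):.3f}")
```

Running the same functions on the counts obtained after enhancing the category rules gives a like-for-like comparison of the two models.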

Instructions:

The total word count for this report shouldn't be more than 2000 words (excluding footnotes, diagrams, tables, references and appendix). The goal of this report is to present the insights in a concise, organised, and meaningful way to facilitate understanding and decision-making.

Your report should include the following key sections:

1. Executive Summary: A brief overview of the entire report, highlighting the key findings, insights, and recommendations. This section is designed to give busy stakeholders a quick understanding of the report's main points.

2. Introduction: This section sets the stage by explaining the purpose and scope of the analysis. The main components are the problem statement and the objective of the analysis. It provides context for the reader and outlines the objectives of the report.

3. Data Description: Information about the dataset used for analysis. This could include details about the data dimension, type of texts (social media posts, customer reviews, etc.), and any specific characteristics of the data that might be relevant to the analysis.

4. Methodology: Detailed information about the methods and techniques used for text analysis. This includes the preprocessing steps (such as text cleaning, tokenisation, and stop-word removal), and the algorithms or tools employed for analysis (e.g., LITI concept rules, text categorisation, etc.).

5. Results and Findings: The core of the report, where the outcomes of the text analysis are presented. This section might include insights from keyword and concept extraction, categorisation or any other relevant techniques. Visualisations such as word clouds or term maps can help illustrate the findings.

6. Discussion and Interpretation: An in-depth explanation of the results, discussing the implications of the findings and connecting them to the research objectives. This section might also address any limitations or challenges faced during the analysis.

7. Recommendations: Based on the insights gained from the analysis, this section provides actionable recommendations. These could be suggestions for business strategies, improvements, or areas that require further investigation.

8. Conclusion: A concise summary of the report's main points, emphasising the significance of the findings and reiterating the recommendations.

9. References: Citations to any external sources, tools, or frameworks used in the analysis.

Grading:

Your report will be evaluated based on the following criteria:

1.   Executive Summary: 10 marks

2.   Introduction: 10 marks

3.   Data Description: 5 marks

4.   Methodology: 10 marks

5.   Results and Findings: 20 marks

6.   Discussion and Interpretation: 20 marks

7.   Recommendations: 10 marks

8.   Conclusion: 10 marks

9.   References: 5 marks

Total Points: 100

Submission Guidelines:

1.   Make sure that you regularly make backup copies of your work. Computer, disk, or cloud problems will not be accepted as valid reasons for late submissions or requests for extensions.

2.   Any students caught plagiarising or permitting others to plagiarise their work will receive a zero mark on this assignment. Students should be aware of plagiarism and collusion and the procedure should one be suspected of committing such acts.

3.   Proper citation is an essential aspect of academic writing. As you prepare your assignments, please cite all sources used in your work, including direct quotes, paraphrases, and ideas borrowed from other authors or generative AI. Failure to cite sources properly could result in charges of plagiarism, which could negatively affect assignment marks. Use APA citation style. You can refer to this webpage for the APA citation style and generate the APA citation using this APA referencing generator. You can refer to Monash's page and this webpage for citing resources obtained from generative AI and nonrecoverable sources.

4.   Students should emphasise the narration and how the results are presented and interpreted. Students should endeavour to ensure that the report is complete and well-composed. Poor presentation, poor command of English writing and/or failure to comply with instructions may result in a mark penalty. You are encouraged to access Studiosity to improve your report writing. The Studiosity link is on Moodle's Assessment page.

*Please note that the services available for you in Studiosity (accessible via Moodle site) are supplementary to this unit. Studiosity is a third-party provider contracted by Monash University to assist you with generic skills such as essay writing, grammar, referencing etc. They do not provide specific comments on unit content or the appropriateness of your answer regarding assessment tasks and learning outcomes. Rather, they address your key skills of argument, structure, expression, and referencing.

Evaluation of your work for assessment purposes is conducted solely by your Monash teachers (chief examiners or tutors). You should use consultation hours provided by Monash teachers if you have concerns or questions about unit content and your understanding of that content or if you have questions specifically about assessment tasks.

a.   Your report should not exceed 2000 words (excluding footnotes, references and appendix). Use default format, paragraph and margin settings. (These settings are in default mode whenever you open a new Word document.)

b.   Font type: Times New Roman. Font size: 12.

c.   Line spacing: 1.2 lines.

d. All diagrams should be inline with the text for ease of reading and not placed in an Appendix at the end of the report.

5.   All submissions will be via Moodle. Save your report as <name>_ETW2800_Individual_Project in PDF format. Replace <name> with your full name.

Additional Remarks:

The Week 6, 7 and 8 tutorials will include project clinic sessions. In these sessions, the tutor will provide guidance and Q&A to help students clear their doubts and steer their projects in the right direction. These sessions are not for you to get exact answers or solutions; however, the assistance will put you in a better position for this project.