
COMP390

2022/23

Complex data analysis of consumption behavior of urban catering users

Project Description

With the continuous development of the Internet and information technology, the Internet catering model represented by online group purchases and takeaways has risen rapidly and generated a large amount of multi-dimensional, heterogeneous data. This data reflects the characteristics of catering consumption behavior in different cities from various aspects. The main purpose of this project is to conduct visual analysis of online group purchase data, using machine learning algorithms, keyword extraction, topic analysis, community division and other techniques to mine catering data from group purchase and takeaway websites. The project will analyze urban catering consumption behavior from two perspectives, group-oriented and individual-oriented, and will design and implement a visual analysis system for urban catering consumption behavior. Visualization technology is used to overcome the difficulties that traditional data analysis techniques face with large, heterogeneous data, and so to achieve the goal of analyzing consumer behavior.

Purpose and Objectives

The main purpose of this project is to establish an analysis network. Because takeaway and group buying websites hold many different types of food and beverage consumption data, a multi-view visual analysis method is used to analyze consumers' geographical choices, emotional tendencies and hot topics in urban food and beverage consumption behavior. Because catering consumption behavior data is large in scale and complex in its dimensions, the project will attempt to design a data filtering method based on the spatial and temporal dimensions, and will design and implement a visual analysis system for urban catering consumption behavior. Through the analysis of catering consumption data for a chosen city from a group buying website, the project aims to show that the system can effectively identify the characteristics of urban catering consumption behavior through interactive user analysis.
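A data filtering method based on the spatial and temporal dimensions could take many forms; the following is only a minimal sketch, assuming each review carries a latitude, a longitude and a timestamp. The `Review` fields, the `filter_reviews` function and the sample values are hypothetical and are not taken from any platform's data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Review:
    """Hypothetical record layout; real platform fields may differ."""
    shop_id: str
    lat: float
    lon: float
    posted_at: datetime
    text: str


def filter_reviews(reviews, south, west, north, east, start, end):
    """Keep reviews inside a lat/lon bounding box and a half-open time window."""
    return [
        r for r in reviews
        if south <= r.lat <= north
        and west <= r.lon <= east
        and start <= r.posted_at < end
    ]


# Toy data (assumed values, for illustration only).
reviews = [
    Review("s1", 31.23, 121.47, datetime(2022, 11, 5, 12, 30), "great noodles"),
    Review("s2", 31.30, 121.60, datetime(2022, 11, 5, 19, 0), "slow delivery"),
    Review("s1", 31.23, 121.47, datetime(2022, 12, 1, 12, 0), "too salty"),
]

selected = filter_reviews(
    reviews,
    south=31.20, west=121.40, north=31.25, east=121.50,
    start=datetime(2022, 11, 1), end=datetime(2022, 12, 1),
)
```

A bounding box and a half-open time window are the simplest possible filters; a real system might instead filter by administrative district or by a user-drawn map region.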

Primary Literature and Background Reading

At present, point of interest (POI) data is widely used in urban computing and other fields. Yuan et al. [1] used POI data and population flow data to identify the functional areas of cities, providing a strong basis for urban planning. Karamshuk et al. [2] provided suggestions for the location selection of urban businesses by analyzing a large amount of user check-in data. Fu et al. [3] proposed a ranking method called ClusRanking, which uses POI data, road structure and population flow to rank house prices. References [4-6] combined POI data, road network data and social media check-in data to jointly analyze the noise pollution index in different regions at different time granularities. Existing research on POI data from group buying websites mainly obtains sentiment information from user comment data and visualizes it. Because of differences in consumers' cultural backgrounds, the ambiguity of vocabulary and structure in human language, and the lack of clear boundaries between positive and negative opinions, it is difficult for computer systems to determine the exact meaning of words.

Wu et al. [7] proposed an uncertainty model of emotion to address the above problems and designed a visualization tool based on scatterplots and radial visualization to analyze users' emotions. The research of Alper et al. [8] provides a more fine-grained graphical summary than [7], which can gradually reveal the details of the text from various reviews. It aims to let users make decisions as quickly and efficiently as possible by converting large amounts of review text into interactive visualizations, so that users do not need to browse through all the reviews. To reduce the workload of reading comments, Chen et al. [9] used the LDA model and sentiment analysis to judge the sentiment polarity of review data at the sentence level.

In their visualization, nodes represent different emotional words, colors represent different emotional polarities, and users can filter emotional words by topic. This visualization method is simple and easy to implement, but its disadvantage is that nodes overlap when the number of emotional words is too large. Hennig et al. [10] proposed a cluster heat map to analyze how sentiment changes across time intervals.

At the same time, to quickly identify patterns between related topics, a ranking algorithm based on dimensionality reduction was applied to the cluster heat map. The y-axis shows the related topics, the x-axis represents different time intervals, and a color is assigned to each combination of topic group and time interval to indicate its sentiment; however, the system does not currently allow users to define their own time intervals. Wang et al. [11] designed a visualization method named SentiCompass for sentiment analysis of time-varying Twitter data, based on time tunnels and ring patterns. This method solves the problem that the system of Hennig et al. [10] does not allow users to set the time interval, and it is more effective at visualizing emotional changes from a spatial perspective. Wang et al. [12] used tag clouds to display popular topics on Weibo, and combined geographic information and sentiment to draw the geographic distribution of sentiment for a topic.

Although much of this research has produced relevant results, the large volume and diverse data types of group buying website data make it difficult for traditional data analysis techniques to analyze the data effectively. Visual analysis techniques, by contrast, can process arbitrary data types, arbitrary data characteristics, and combinations of heterogeneous data. Therefore, how to apply visual analysis technology to the analysis of catering data on group buying websites and explore urban catering consumption behavior is a novel and worthwhile research topic. From the perspective of analyzing urban catering consumption behavior, this project will design and implement a visual analysis system for urban catering consumption behavior, using a multi-view, multi-granularity visual analysis method based on group buying website data, and will demonstrate the effectiveness of the system.

Development and Implementation Summary

This project will approach the visual analysis of urban catering consumption behavior in the following steps: first, data collection, preprocessing and feature extraction for urban catering consumption behavior; second, data mining; third, the design of a visual analysis scheme based on multiple views and multiple granularities; and finally, the implementation of a visual analysis system for urban catering consumption behavior, which is expected to be based on the Django framework.

The data analysis work mainly includes data selection, acquisition and preprocessing, and the extraction of catering consumption behavior characteristics for groups and for individuals in a city. This project compares the catering consumption data attributes of the major group buying websites and selects the platform with the most balanced and comprehensive data as the basic data source for urban catering consumption behavior analysis. This overcomes the shortcomings of traditional survey-based consumption behavior analysis, such as a large workload, too many human interference factors, a survey process that is difficult to control, and poor data authenticity. Based on the selected platform data, consumer satisfaction, topic information, keywords and similar consumer groups are mined. The consumer satisfaction analysis is expected to use the SVM machine learning algorithm to classify the sentiment of reviews, and then to apply a consumer satisfaction calculation method to extract consumer satisfaction information. The topic analysis is expected to use the LDA topic model to extract the consumption topics of consumers at various types of shops. Consumption keywords are extracted from consumers' comment information.
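The proposal does not fix a keyword extraction method. As one possible illustration only, the sketch below scores terms with plain TF-IDF using the Python standard library; the choice of TF-IDF, the `tfidf_keywords` function and the toy reviews are all assumptions, and real Chinese review text would first need word segmentation.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy, pre-tokenized reviews (hypothetical data).
docs = [
    "spicy hotpot spicy broth fresh".split(),
    "fresh sushi fresh fish".split(),
    "hotpot queue long queue".split(),
]
keywords = tfidf_keywords(docs, top_k=2)
```

Terms that appear in many reviews (here "fresh" and "hotpot") are down-weighted, so the selected keywords tend to be the ones that distinguish a shop or consumer from the rest.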

In the visual analysis scheme design, this project will design a sentiment analysis bubble chart that extends the traditional bubble chart through space expansion and color mapping, to help identify negative emotions. It will also design a location-enhanced visual analysis scheme based on maps and bubble charts, which can reveal consumers' location preferences and activity trajectories and support analysis of the types of shops they visit in their regions. Based on the platform, the project explores the regional consumption characteristics, consumption keywords, consumption themes and other characteristics of group catering consumption, as well as the taste preferences, time series characteristics, consumption satisfaction and other characteristics of individual catering consumption.
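How the color mapping for the sentiment bubble chart will work is not yet specified. The sketch below shows one simple possibility, assuming a sentiment score in [-1, 1] per shop and a review count that drives bubble size; the function names, the red-to-green scale and the sample shops are hypothetical.

```python
import math


def sentiment_to_rgb(score):
    """Map a sentiment score in [-1, 1] to an RGB tuple from red to green."""
    s = max(-1.0, min(1.0, score))   # clamp out-of-range scores
    t = (s + 1.0) / 2.0              # 0.0 = most negative, 1.0 = most positive
    return (round(255 * (1 - t)), round(255 * t), 0)


def bubble_radius(count, max_count, max_radius=40.0):
    """Scale radius with sqrt(count) so that bubble area is proportional to count."""
    if max_count <= 0:
        return 0.0
    return max_radius * math.sqrt(count / max_count)


# Toy shop summaries (assumed values).
shops = [
    {"name": "A", "reviews": 400, "sentiment": 1.0},
    {"name": "B", "reviews": 100, "sentiment": -1.0},
]
max_reviews = max(s["reviews"] for s in shops)
bubbles = [
    {
        "name": s["name"],
        "radius": bubble_radius(s["reviews"], max_reviews),
        "color": sentiment_to_rgb(s["sentiment"]),
    }
    for s in shops
]
```

Scaling the radius by the square root of the count keeps bubble area, rather than diameter, proportional to the number of reviews, so that large shops are not visually exaggerated; strongly negative shops appear in red.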

Data Sources

In terms of data acquisition, the project will try to integrate the public data of major food websites to enlarge the sample for data analysis and make the results of the food and beverage analysis more reliable. In the group-oriented visual analysis of consumer behavior, the behavior of consumers will be analyzed using more fine-grained location information to discover the patterns of consumer behavior that are specific to different regions. In the analysis of individual consumption behavior, a further mining step examines the relationships between consumer groups, to find out whether different groups are related to each other and whether consumers within the same group are related. Finally, a consumption profile is constructed for each consumer, and food recommendations are made according to their consumption habits.
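One simple way to represent a consumption profile and to look for relationships between consumers is sketched below, assuming a profile is a count of visits per shop category and that similarity is measured with cosine similarity. Both choices, along with the function names and the sample visits, are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_profiles(visits):
    """Count visits per shop category for each consumer."""
    profiles = defaultdict(Counter)
    for consumer, category in visits:
        profiles[consumer][category] += 1
    return profiles


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy (consumer, shop category) visit records (hypothetical data).
visits = [
    ("u1", "hotpot"), ("u1", "hotpot"), ("u1", "bbq"),
    ("u2", "hotpot"), ("u2", "hotpot"), ("u2", "bbq"),
    ("u3", "dessert"),
]
profiles = build_profiles(visits)
sim_12 = cosine_similarity(profiles["u1"], profiles["u2"])  # identical habits
sim_13 = cosine_similarity(profiles["u1"], profiles["u3"])  # no shared category
```

Pairs with high similarity could then be linked in a graph as input to community division, and a consumer's most similar neighbors could supply candidate shops for recommendation.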

The analysis will mainly use public domain data, which is completely open and transparent. Any sensitive data will be obfuscated to ensure security.

Test and Evaluation

For testing and evaluation, two methods are expected to be adopted. One is to run the model on historical data and compare the results it generates with the real results, so as to judge the difference between them. The other considers differences in psychological expectations.
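The comparison of model results against real results could, for a sentiment classifier, be expressed with standard accuracy, precision and recall. The sketch below assumes string labels "pos" and "neg"; the `evaluate` function and the sample labels are hypothetical.

```python
def evaluate(predicted, actual, positive="pos"):
    """Compare predicted labels with real labels and return basic metrics."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    correct = sum(1 for p, a in pairs if p == a)
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels (assumed values): model output versus held-out real labels.
predicted = ["pos", "pos", "neg", "neg"]
actual = ["pos", "neg", "neg", "pos"]
metrics = evaluate(predicted, actual)
```

Reporting precision and recall alongside accuracy matters when negative reviews are rare, since a model that labels everything positive can still score a high accuracy.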

Ethical Considerations

Regarding ethical considerations, I have read the Code of Ethics and undertake that the project will fully comply with it. The project will mainly use public domain data, which is completely open and transparent. For any questionnaires that may be conducted and analyzed qualitatively or quantitatively, the respondents' information will be kept completely confidential and the data will be obfuscated to ensure information security.

BCS Program Standards

Some of the data processing methods used in this project come from degree courses. The inspiration and some of the methods come from modules on databases, data analysis and related topics, including building databases and performing functional operations on sampled data.

This project attempts to use new dimensions to extract, analyze and visualize data. For details, see the literature and development sections above.

This project brings together the difficulties that the Internet catering industry currently faces in the market, conducts data analysis on them, and discusses solutions. In the future, the results may serve as a reference for the reform and promotion of Internet catering in the market.

Completing the project, including data collection, analysis, sorting and the application of some new programming languages, will be a major test that requires the ability to self-manage important work. Throughout the project, it will be necessary to continuously improve and upgrade the work, and to evaluate and reflect on one's own progress.

UI/UX Mockup

This project does not involve the development of a specific web page or program. It only provides the application and visualization of data analysis. It will use scatter plots, heat maps, bubble charts, map distributions and other forms to visualize the data.

Project Plan

| Task | Duration | Start date | End date |
| --- | --- | --- | --- |
| Literature reading | 5 days | 2022/10/4 | 2022/10/10 |
| Determine the topic | 4 days | 2022/10/11 | 2022/10/14 |
| Detailed Proposal | 17 days | 2022/10/16 | 2022/11/4 |
| Project Description | 3 days | 2022/10/16 | 2022/10/18 |
| Aims & Objectives | 3 days | 2022/10/18 | 2022/10/20 |
| Key Literature & Background Reading | 4 days | 2022/10/20 | 2022/10/25 |
| Development & Implementation Summary | 4 days | 2022/10/26 | 2022/10/29 |
| Organize and modify | 6 days | 2022/10/29 | 2022/11/4 |
| First draft of the project | 137 days | 2022/11/6 | 2023/5/12 |
| Data collection | 19 days | 2022/11/6 | 2022/11/30 |
| Data analysis | 16 days | 2022/12/5 | 2022/12/26 |
| Programming analysis | 21 days | 2023/1/2 | 2023/1/30 |
| Project finishing | 43 days | 2023/2/1 | 2023/3/31 |
| First draft completed and video preparation | 16 days | 2023/4/1 | 2023/4/21 |
| Final thesis | 1 day | 2023/5/12 | 2023/5/12 |

Risk and Contingency Planning

| Risk | Contingency plan | Likelihood | Impact |
| --- | --- | --- | --- |
| Loss of data backups | Save and back up each piece of data and upload it to a network drive, so that data will not be lost if the hardware is damaged. | Low | Loss of backups would greatly slow down the research process and could even interrupt the research. |
| Difficulty in obtaining data, or data errors | Research and screen the relevant data extraction methods in advance, although this may take a lot of time, and maintain regular communication with the project mentor. | High | Would slow down the project, or even cause the project to fail. |
| System crash causing submission failure | Make file backups in advance and communicate with instructors. | Medium | Failure to submit on time, resulting in project failure. |

References:

1 YUAN J, ZHENG Y, XIE X. Discovering regions of different functions in a city using human mobility and POIs[C]// ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012: 186-194.

2 KARAMSHUK D, NOULAS A, SCELLATO S, et al. Geo-spotting: mining online location-based services for optimal retail store placement[C]. KDD, Chicago, IL, USA, 2013.

3 FU Y, XIONG H, GE Y, et al. Exploiting geographic dependencies for real estate appraisal: a mutual perspective of ranking and clustering[C]// ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014: 1047-1056.

4 ZHENG Y, LIU T, WANG Y, et al. Diagnosing New York City's noises with ubiquitous data[C]// ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2014: 715-725.

5 WANG Y, ZHENG Y, LIU T. A noise map of New York City[C]// Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. ACM, 2014: 275-278.

6 LIU T, ZHENG Y, LIU L, et al. Methods for Sensing Urban Noises[J]. Choose, 2014.

7 WU Y, WEI F, LIU S, et al. OpinionSeer: Interactive Visualization of Hotel Customer Feedback[J]. IEEE Transactions on Visualization & Computer Graphics, 2010, 16(6): 1109-1118.

8 ALPER B, YANG H, HABER E, et al. OpinionBlocks: Visualizing Consumer Reviews[C]// IEEE VisWeek Workshop on Interactive Text Analytics for Decision Making, 2011.

9 CHEN Y S, CHEN L H, TAKAMA Y. Proposal of LDA-based Sentiment Visualization of Hotel Reviews[C]// Data Mining Workshop (ICDMW), IEEE International Conference on Data Mining Workshop, 2016: 687-693.

10 HENNIG P, BERGER P, BREHMY M, et al. Hot Spot Detection - an Interactive Cluster Heat Map for Sentiment Analysis[C]// Data Science and Advanced Analytics (DSAA), 2015. 36678.

11 WANG F Y, SALLABERRY A, KLEIN K, et al. SentiCompass: Interactive visualization for exploring and comparing the sentiments of time-varying twitter data[C]// IEEE Pacific Visualization Symposium (PacificVis), 2015: 129-133.

12 WANG Z, YU Z, CHEN L, et al. Sentiment Detection and Visualization of Chinese Micro-blog[C]// Data Science and Advanced Analytics (DSAA), International Conference on Data Science and Advanced Analytics, IEEE, 2015: 251-257.