CISC7107 Data Mining and Decision Support Systems

Assignment 3.0

Due Date: Tuesday 16 May 2023

Time-series Forecasting

Visual Mining

Text Mining

Objectives:

Students will gain experience in using several prediction software programs for time-series forecasting, visual mining and text mining. They will learn how to analyze real-life data with different techniques for each of these tasks, and how to interpret the results.

Tasks to do:

1. Time-series Forecasting

Warm up task (no need to submit)

Find one or more time-series datasets of your choice. Multiple time series are needed for multiple regression and/or correlation analysis.

Based on your chosen time-series dataset, your task is to find the most reliable forecasting model for doing your prediction. You may forecast up to n steps ahead (where n can be anything meaningful).

Run forecasting by using at least two software tools, such as Weka and Crystal Ball. (Alternatively, if Crystal Ball is not working on your computer, try others such as Miner3D, RapidMiner or Orange.)

Try forecasting with different popular algorithms of your choice provided by the software programs.

Record the ‘fitting errors’ of your model, such as MAE, MSE and RMSE.

Produce forecasting charts based on each software tool you tried.

Tabulate the forecasting performances, and state your findings (which models are most accurate in terms of lowest errors).

Copy and paste your forecasting graphs into your report, together with the table of forecasting performances.

Analyse and draw your own conclusion, discussing in particular the differences in the results obtained by different forecasting methods.
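The fitting errors named in the list above can also be computed by hand to cross-check what a tool reports. The following is a minimal Python sketch, assuming the actual and predicted values have been exported as two equal-length lists; the function name `forecast_errors` and the numbers are illustrative and not part of the assignment.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)   # mean absolute error
    mse = sum(r * r for r in residuals) / len(residuals)    # mean squared error
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up numbers, for illustration only.
actual = [10.0, 12.0, 13.0, 15.0]
predicted = [11.0, 12.0, 12.0, 17.0]
errors = forecast_errors(actual, predicted)
print(errors)
```

Because RMSE squares the residuals before averaging, it penalises a few large misses more heavily than MAE does, so two models can rank differently under the two measures.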

Sample tables of forecasting performances (examples only!)

What will happen to Chengdu after 365 units of time?

Data source: 1900-2015, m>=3, Common Events

Algorithm: Double Moving Average, Linear Regression, Multilayer Perceptron, SMO Regression, Random Forest

Mag.: 4.952156667, 6.7158, 14.3258, 5.6106, 5.15

Error (MAD): 0.4023, 1.5373, 0.4091, 0.4215

What will happen to Chengdu after 183 units of time?

Data source: 1900-2015, m>=5, Rare Events

Algorithm               Mag.               Error (MAD)
Double Moving Average   5.70               0.259237162
Linear Regression       fill in yourself   fill in yourself
Multilayer Perceptron   fill in yourself   fill in yourself
SMO Regression          fill in yourself   fill in yourself
Random Forest           fill in yourself   fill in yourself
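Double Moving Average, the first algorithm in the sample tables, is simple enough to sketch directly. The Python below follows the standard textbook form of the method (a moving average of a moving average, used to estimate a level and a linear trend); it is not necessarily the exact variant a given tool implements, and the function names, window size and series are illustrative assumptions.

```python
def moving_average(values, window):
    """Simple moving averages; the result is shorter by window - 1 items."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def double_moving_average_forecast(series, window, steps):
    """Forecast `steps` periods ahead with the double moving average method."""
    if window < 2:
        raise ValueError("window must be at least 2")
    first = moving_average(series, window)    # moving average of the raw series
    second = moving_average(first, window)    # moving average of the first one
    if not second:
        raise ValueError("series too short for this window")
    level = 2 * first[-1] - second[-1]                    # lag-corrected level
    trend = 2 * (first[-1] - second[-1]) / (window - 1)   # slope per period
    return [level + trend * m for m in range(1, steps + 1)]


# A perfectly linear toy series, so the forecast should continue the line.
series = [float(i) for i in range(1, 11)]
forecast = double_moving_average_forecast(series, window=3, steps=3)
print(forecast)
```

On real data the window size is a tuning choice: a longer window smooths more noise but reacts more slowly to a change in trend.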

Task (need to submit)

Find a time-series dataset that interests you. You are encouraged to try something ‘significant’, e.g. predicting world economic bubbles, major events, pandemics, important disasters, etc. Repeat the process used in the warm-up task above.

In addition to forecasting the future values, try to find the association rules and/or correlations, if any.
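As one possible starting point for the correlation part, the sketch below computes the Pearson correlation between two series, optionally with one series shifted by a lag. It assumes both series are sampled at the same time steps; the function names and the toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Toy data: y follows x one step later, scaled by 2.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0, 2.0, 6.0, 4.0, 10.0, 8.0]
r_lag1 = lagged_correlation(x, y, 1)
print(lagged_correlation(x, y, 0), r_lag1)
```

Scanning several lags in this way can reveal whether one series tends to lead another, although a high correlation alone does not establish a causal link.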

2. Visual Mining

Visual mining is about using computer tools to visualize your data in order to reveal special patterns and interesting observations, so that you can make sense of massive data.

Data can be either structured or unstructured, but it needs to be reasonably large in volume.

There is no limit on the tools, algorithms or methods you use, as long as you can extract interesting visual patterns from the data.

This task can be combined with data stream mining (in which you can visualize the input data streams and the output results).

In your report, document the source of the data, a brief description of the data, what you are looking for, how you found it, what you think the results mean, and any explanations and limitations. Both graphics and written results should be documented properly in the report, although there is no specific required writing format.

3. Text Mining

Text mining is the task of extracting relevant information from natural language text and searching for interesting relationships between the extracted entities. Text classification is one of the basic techniques in the area of text mining. It is one of the more difficult data-mining problems, since it deals with very high-dimensional data sets with arbitrary patterns of sparse data.

First, familiarize yourself with the examples demonstrated in class: Classifying Different Language Texts, Classifying Moods from Online News, and Classifying News of Different Topics. Open the sample files that were demonstrated in class in Weka and, in the Preprocess tab, apply the filter called StringToWordVector to the data. You will notice how the STRING values are converted into a set of attributes that represent the frequency of each word in the strings.

The following boxes show an example of string conversion implemented by this filter.
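As a rough illustration of the same idea outside Weka, the Python sketch below turns a list of strings into one numeric attribute per word. It only mimics the general behaviour described here: Weka's filter has its own tokenizer and options, which this sketch does not reproduce, and the function name and sample sentences are made up.

```python
import re
from collections import Counter


def string_to_word_vector(texts):
    """Turn a list of strings into (vocabulary, rows of word counts)."""
    # Lower-case each text and split it into alphabetic tokens.
    tokenised = [re.findall(r"[a-z']+", text.lower()) for text in texts]
    # One attribute per distinct word, in alphabetical order.
    vocabulary = sorted({word for words in tokenised for word in words})
    rows = []
    for words in tokenised:
        counts = Counter(words)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


texts = ["the market is up", "the market is down down"]
vocabulary, rows = string_to_word_vector(texts)
print(vocabulary)
for row in rows:
    print(row)
```

Each row has as many columns as there are distinct words in the whole collection, which is why text datasets become high-dimensional and sparse so quickly.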

Similar to the dataset mood.arff, create your own dataset with a minimum of 100 records from other online text sources such as news websites, Facebook comments, blogs, Twitter, etc. Classify the records into some meaningful groups, e.g. emotions, good or bad, female or male, young or old, or news categories (local, world, finance, sports, entertainment, etc.). You are free to choose any text sources and any meaningful groups, but for this training dataset, classify the records according to your own judgment. Give the dataset a new name of your choice.

After applying the conversion filter StringToWordVector to the dataset, run it under the J48 classification algorithm (or others of your choice). Try to optimize the parameters, try with and without text transformation, and record the performance results of each run in your report. Discuss what you observe.
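One way to assemble such a dataset is to collect (text, label) pairs and write them out in ARFF form with a small script. The sketch below assumes a layout with one string attribute and one nominal class attribute; the attribute names `text` and `class`, the relation name and the sample records are hypothetical, and the actual layout of mood.arff may differ.

```python
def quote(value):
    """Single-quote a string for ARFF, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    escaped = escaped.replace("\n", " ")   # keep each record on one line
    return "'" + escaped + "'"


def to_arff(relation, records, labels):
    """Build ARFF text from (text, label) pairs and the allowed labels."""
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",                         # assumed name
        "@attribute class {" + ",".join(labels) + "}",    # assumed name
        "",
        "@data",
    ]
    for text, label in records:
        if label not in labels:
            raise ValueError("unknown label: " + label)
        lines.append(quote(text) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical records; a real dataset needs at least 100 of them.
records = [("I'm thrilled", "happy"), ("what a bad day", "sad")]
arff = to_arff("my_moods", records, ["happy", "sad"])
print(arff)
```

Saving the returned text to a file with an `.arff` extension gives something that can be opened in Weka before the conversion filter is applied.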

Submission:

Submit your experiment report in Excel and all the materials (both datasets in ARFF or any other format + performance results, charts, report and any other file in Excel) as a single zipped file to UMMOODLE by the due date.

Additional Options:

The tasks listed above are the fundamental requirements for passing this assignment. If you want to score a very high mark, consider doing the following tasks, which are more challenging and require more time.

Challenge 1: Try applying some “dimensionality reduction” techniques or “attribute transformation techniques” which are available in Weka. The aim is to improve the accuracy of the text classification model by reducing the attributes, removing bad records or both.

Challenge 2: For your time-series forecasting datasets, how do “data stream mining algorithms” perform in comparison to the traditional data mining algorithms in terms of accuracy?

Challenge 3: Is there any relation or correlation that you observe between time-series forecasting and text mining? How might you use text mining to improve or confirm the forecasts from time-series forecasting?

Challenge 4: Quite often in data mining, a single algorithm or technique may not give you complete or good-quality results. Is there any possibility that you can combine two or three of the methods (time-series forecasting, visual mining and text mining, e.g. sentiment analysis of headline news) together, so as to generate better results than using any one method alone?
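For Challenge 1, one common way to reduce attributes is to rank words by information gain and keep only the top few. The Python sketch below shows the calculation on a toy dataset; it does not call Weka's own attribute selection tools, and the function names, documents and labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(has_word, labels):
    """Entropy reduction from splitting the records on word presence."""
    gain = entropy(labels)
    for flag in (True, False):
        subset = [label for h, label in zip(has_word, labels) if h == flag]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def rank_words(documents, labels):
    """Return (gain, word) pairs, highest gain first, ties alphabetical."""
    vocabulary = sorted({word for doc in documents for word in doc})
    scored = [(information_gain([word in doc for doc in documents], labels), word)
              for word in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Toy data: each document is a set of words, with one label per document.
documents = [{"great", "day"}, {"great", "game"}, {"bad", "day"}, {"bad", "game"}]
labels = ["pos", "pos", "neg", "neg"]
ranking = rank_words(documents, labels)
for gain, word in ranking:
    print(word, round(gain, 3))
```

Here the words that separate the two classes score highest and the topic words score zero, so dropping low-ranked words would shrink the attribute set without losing the signal in this toy case.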

2023-05-18
