
EESM5720 Final Project Description

The goal of the final project is to give you some experience applying one or more of the techniques we have learned (or will learn) about to real-world data. Your primary objective is to demonstrate that you have achieved a solid understanding of how design choices made when applying a pattern recognition technique may affect classifier performance, and of how those choices must be made with reference to the characteristics of the pattern classification task and the data to be classified.

In order to do this, you should

1. Define a pattern classification task you wish to implement.

2. Identify the data to be used to train/test your pattern classifier.

3. Choose a pattern recognition technique we have studied in class from among the following: Bayesian classifier, linear classifier, neural network, support vector machine, or boosting.

4. Look at the choices (e.g. parameter values, activation nonlinearity, kernel function) that must be made in applying this technique. Make a hypothesis about which choice will affect the final classification performance the most, and perform a study on how changing this choice actually affects the performance.
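As a hypothetical illustration (not a required approach), a study of one design choice, say the SVM kernel function, might look like the following sketch in Python with scikit-learn. Synthetic data stands in here for a real downloaded database.

```python
# Hypothetical sketch: how changing one design choice (the SVM kernel)
# affects test accuracy. Synthetic data stands in for a real database.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Vary only the kernel, holding everything else fixed, so any
# performance difference can be attributed to that one choice.
for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f"{kernel}: test accuracy = {clf.score(X_test, y_test):.3f}")
```

The key point of the sketch is the controlled comparison: one choice varies while all other settings stay fixed.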

Note that this project is not intended to be an extensive programming project. You are free to use software such as the libraries available in Python or MATLAB, or any other machine learning software available over the internet, in your project. However, you must describe exactly what you did and what you downloaded from the internet. If you claim to have written any code, you must provide this code with your submission.

For data, you must use a database you find online. Examples of sites with large numbers of databases available online are

Kaggle: https://www.kaggle.com/

UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/

Many others are available by searching Google. For example, there are often machine learning competitions that publish data for training and testing.

Small databases with fewer than 500 data points, such as the commonly used "Iris" database, are not very easy to use for this project. The problem is that many pattern recognition techniques quickly result in nearly perfect performance, making it difficult to do comparative experiments. Thus, you should not choose a database with fewer than 500 samples. This is the main reason we do not recommend collecting data yourself.

You are free to work alone or in a group of two. However, groups of two should have more complex projects, and the responsibilities of each person should be clearly identified. For example, if you are comparing two different techniques, each person could work on implementing one technique. Or, if not much preprocessing has been done on the data, one person could work on pre-processing and the other on classification. If you are doing a joint project, you should explain how the work will be split among group members. Also, each person should hand in a final report. The introductory part (e.g. describing the task) can be the same, but the main results and conclusions should be different, reflecting the different work done by the two team members.

Proposal

You should submit a short (~1 paragraph) proposal via CANVAS. See CANVAS for the deadline. This proposal is not graded; rather, it is an opportunity for you to firm up your plan. It should describe

1. What task you have defined (i.e. what the inputs and outputs are).

2. What database you are using.

3. What method(s) you plan to use.

4. What design choice you will be studying the effect of.

Evaluation

You will be evaluated based on two deliverables; see CANVAS for deadlines.

1. An oral presentation. Each student is allowed 3 minutes to present; a two-person project has 6 minutes. Since you will be using a technique we have discussed in class, do not bother explaining it. Assume that everyone is fully familiar with every technique studied in the class. Get directly to the point.

2. A written report, which should be handed in through the turnitin website (www.turnitin.com). The report should not contain any code. Any code you have written should be submitted separately to CANVAS. Typically, reports are 6-10 pages long, single column, including figures and tables reporting results.

Key questions you must address in the presentation/report are:

1. What did you do? How did you separate the data into training, validation and testing data sets? How did you use each set? You should describe your procedure in enough detail that one of your fellow students could repeat the same experiments you did just from reading your report.

2. What differences in performance do you observe? Are these expected or unexpected? Can you conjecture about the reasons for the differences in performance, and perhaps run additional experiments to evaluate your conjectures?

3. What tradeoffs do your results give insight into (e.g. computational complexity versus classification accuracy, training time versus overfitting)?

4. To what extent have you taken into account the specific characteristics of the task or data you are using? What are the possible choices you could have made, and why are the ones you have made particularly well suited for the data you have chosen?
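For the data-splitting step asked about in question 1, one common procedure is a fixed-fraction, reproducible three-way split. The sketch below is one hedged example using scikit-learn with made-up sizes (60/20/20), not a prescribed recipe:

```python
# Hypothetical sketch of a reproducible 60/20/20 train/validation/test split.
# Synthetic data stands in for a real downloaded database.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First hold out 20% of the data as the final test set.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remainder 75/25, giving 60/20 of the original data
# for training and validation respectively.
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)
```

Fixing `random_state` makes the split repeatable, which is exactly the kind of detail a fellow student would need in order to reproduce your experiments.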

Each of these questions is weighted equally in the evaluation.

Evaluation Rubric

 

F: The project is non-existent, does not address the objectives outlined above in any way, or contains material that has been copied from others without proper acknowledgement. Note that students found copying material will also be subject to the university's policy on academic integrity: http://www.ust.hk/vpaao/integrity/

C: The project defines the task and the techniques investigated. The results presented give some evidence that the stated technique has been applied to the stated task and has produced results consistent with what one would expect to see. The report clearly states what work was done by the author and what external resources (e.g. code, computer programs) were used in producing the results.

B: The project results include an investigation of the effect of altering some aspect of the technique. Some explanation of the expected effect of the alteration is given. The expected and produced results are compared. The available data has been divided into training, validation and testing data sets, and the report clearly indicates that the student(s) understand the use of these different data sets.

A: The project results include a thorough and complete analysis of altering some aspect of the task or technique. The report demonstrates a solid understanding of the effect of this alteration. Results presented are comparative, and clearly illustrate the advantages, disadvantages, and tradeoffs which others can use to make well-informed decisions about solving a similar task using similar techniques. The final choices are linked to the specific characteristics of the task or data set. The available data has been divided into training, validation and testing data sets, and the report clearly indicates that the student(s) understand the use of these different data sets.