Project - 2024

This document describes the final project for our course. You will work in teams of 5 to 6 students, but every team member should understand and contribute to the project. Use the IDE of your choice and develop in Python. The final project is worth 40%. The main objectives of the project are the following:

• Apply everything we learnt about machine learning and data science;

• Get experience in Python programming;

• Develop intuition about the algorithms and the results.

For the final project, you can earn up to 60 pts: 40 pts for the mandatory parts (40%) plus 20 pts for the optional second part (a 20% bonus). Be aware, however, that the second part is challenging!

PART I – Using Scikit-Learn – 30 pts

Part I is mandatory to pass the project. You will once again use the RFID dataset. The steps you need to complete are listed below. You must provide code that enables me to repeat each step. For example, you can program a menu or a small graphical user interface that lets me select the various options, or you can provide me with separate files to execute.

1. On the RFID dataset, you need to test feature selection to improve the dataset. This includes testing feature extraction with PolynomialFeatures, with Linear/Quadratic Discriminant Analysis, and with at least one type of PCA.

2. You need to use the Pipeline from Scikit-Learn in at least one place.

3. You need to test the RFID dataset with Perceptron, Logistic Regression, SVM, Decision Tree, Random Forest, Boosting, and Multilayer Perceptron.

4. Use GridSearch on one of the algorithms from step 3 to find a good combination of hyperparameters and preprocessing that maximizes the results.
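
As a starting point, steps 1, 2, and 4 could be combined roughly as in the sketch below. It is only a minimal illustration, not a reference solution. It assumes a synthetic dataset from make_classification as a stand-in for the RFID data, Logistic Regression as the tuned algorithm, and arbitrary example values in the parameter grid; the step names ("scale", "features", "clf") are also just illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# ASSUMPTION: synthetic stand-in data. Replace X and y with the RFID
# features and labels loaded from your own copy of the dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One pipeline: scaling, a swappable feature-extraction step, a classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("features", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Each dictionary swaps in a different extractor for the "features" step
# and lists the hyperparameters to try with it (example values only).
param_grid = [
    {
        "features": [PCA()],
        "features__n_components": [2, 4, 6],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "features": [PolynomialFeatures(include_bias=False)],
        "features__degree": [1, 2],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "features": [LinearDiscriminantAnalysis()],
        "clf__C": [0.1, 1.0, 10.0],
    },
]

# 5-fold cross-validated search over preprocessing and hyperparameters.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Putting the extractor inside the pipeline means it is refit on each training fold, so the cross-validated score is not contaminated by the validation data. The same grid structure can be reused with any other classifier from step 3 by changing the "clf" step and its parameter names.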

PART II – Updating the multilayer perceptron – 20 pts

This part is NOT mandatory and offers more of a challenge. For this part, you will need to use the Multilayer Perceptron code I have shared with you. Take your time to understand it and to execute it in debug mode. You will not succeed if you do not carefully check the sizes of the variables in debug mode.

1. Add an option in the MLP to select ReLU activation rather than Sigmoid. The activation should be customizable per layer.

2. Implement Partial_Fit(X, y, epoch), a function that starts from a trained MLP and keeps training it for an additional number of epochs (hint: it is similar to fit, but it keeps the existing weights instead of initializing them).

3. Transform the MLP to support an arbitrary number of layers (warning: with more than 5 layers, it will take a long time to train and require a good computer).

4. Try your final MLP on the RFID dataset. Warning! This may take a long time to train, so you can try it with a small percentage of the dataset (20-40%).
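
To make the ideas in steps 1 to 3 concrete, here is a minimal NumPy sketch. It is NOT the Multilayer Perceptron code shared in the course: the class TinyMLP, its attributes, the squared-error loss, the full-batch updates, and the toy data are all assumptions made for illustration, and the method is written as partial_fit rather than Partial_Fit. Your own version has to fit the structure of the shared code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


# name -> (activation, derivative written in terms of the activation output a)
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda a: a * (1.0 - a)),
    "relu": (relu, lambda a: (a > 0).astype(float)),
}


class TinyMLP:
    """Hypothetical minimal MLP (squared-error loss, full-batch gradient descent)."""

    def __init__(self, layer_sizes, activations, lr=0.1, seed=0):
        # layer_sizes = [n_inputs, hidden sizes..., n_outputs];
        # activations holds one name per weight layer.
        assert len(activations) == len(layer_sizes) - 1
        self.layer_sizes = layer_sizes
        self.activations = activations
        self.lr = lr
        self.rng = np.random.default_rng(seed)
        self.weights = None

    def _init_weights(self):
        sizes = self.layer_sizes
        # The extra row in each matrix holds the bias terms.
        self.weights = [
            self.rng.normal(0.0, np.sqrt(2.0 / m), size=(m + 1, n))
            for m, n in zip(sizes[:-1], sizes[1:])
        ]

    @staticmethod
    def _with_bias(a):
        return np.hstack([a, np.ones((a.shape[0], 1))])

    def _forward(self, X):
        outputs = [X]
        for W, name in zip(self.weights, self.activations):
            outputs.append(ACTIVATIONS[name][0](self._with_bias(outputs[-1]) @ W))
        return outputs

    def partial_fit(self, X, y, epochs):
        # Keep the current weights; initialize only if none exist yet.
        if self.weights is None:
            self._init_weights()
        for _ in range(epochs):
            outputs = self._forward(X)
            d_out = ACTIVATIONS[self.activations[-1]][1]
            delta = (outputs[-1] - y) * d_out(outputs[-1])
            for i in reversed(range(len(self.weights))):
                grad = self._with_bias(outputs[i]).T @ delta / X.shape[0]
                if i > 0:
                    # Back-propagate with the old weights, skipping the bias row.
                    d_hidden = ACTIVATIONS[self.activations[i - 1]][1]
                    delta = (delta @ self.weights[i][:-1].T) * d_hidden(outputs[i])
                self.weights[i] -= self.lr * grad
        return self

    def fit(self, X, y, epochs):
        self._init_weights()  # fit always starts from fresh weights
        return self.partial_fit(X, y, epochs)

    def predict_proba(self, X):
        return self._forward(X)[-1]


# Toy data (ASSUMPTION: a stand-in for the RFID dataset).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo.sum(axis=1, keepdims=True) > 0).astype(float)

model = TinyMLP([2, 4, 4, 1], ["relu", "relu", "sigmoid"], lr=0.5)
model.fit(X_demo, y_demo, epochs=100)
loss_first = np.mean((model.predict_proba(X_demo) - y_demo) ** 2)
model.partial_fit(X_demo, y_demo, epochs=100)  # continues from stored weights
loss_second = np.mean((model.predict_proba(X_demo) - y_demo) ** 2)
print("MSE after fit:", loss_first, "| after partial_fit:", loss_second)
```

Printing the shape of every entry in weights and outputs while stepping through one epoch is a quick way to check the matrix sizes, which is where most errors appear once the number of layers becomes arbitrary.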

PART III – Questions and report about the project – 10 pts

The third part is also mandatory to pass the project. You need to write a short report and answer a few questions about your project.

1. The first page should include the first name, surname, and student number of every team member.

2. Then, you should summarize the steps that you were able to accomplish regarding Part I and Part II. You should include images of the results you obtained.

3. Then, you need to answer the following questions:

a. Which algorithms from Scikit-Learn worked best on the RFID dataset and what accuracy did you get?

b. Does it work better or worse with feature selection? Which method was the best?

c. Were you able to train our own multilayer perceptron on the RFID dataset? Did it perform the same as the version in Scikit-Learn?

What you should give me and when

• The complete source code needed to run your project, but without the dataset (I will use my own copy to confirm the results);

• A final report of your work;

• Answers to questions about what you did.

Use WeTransfer to send me everything. You have until March 31st to send your final project. Late projects will receive 0%.

Good luck and happy learning!