
ST455 projects

You are expected to work in a group of 2 (feel free to form your own teams) to conduct a project on reinforcement learning and write a report. Please state each member's percentage contribution to the project at the end of the report (e.g., member A 45%, member B 55%). Project marks are allocated in proportion to these contributions. If no percentages are provided, I will assume that both of you contributed equally to the project.

Project marking rubrics:

You can find them in the repo.

Project topics

Your project should demonstrate an understanding of concepts, methods and models in the area of Reinforcement Learning and how to apply them using software frameworks in Python. This should involve using knowledge acquired in the course and building on it through independent study to go more deeply into a specific subject. Your project may cover different aspects of reinforcement learning methods, demonstrating an understanding of their suitability for a given class of problems and of how to implement and validate them using synthetic and/or real datasets. The project can also comprise case studies involving specialised and/or complex scenarios in which you can experiment with different actions, states, rewards and policies. You may consider establishing some theoretical properties of RL methods or proposing your own methods. Developing new theory or methodology is not required, but you will get a high mark if such developments contain enough novelty.
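As a rough illustration of what experimenting with states, actions, rewards and policies can look like at the smallest scale, here is a minimal sketch of tabular Q-learning. The 5-state chain environment, the function names and all hyperparameter values are hypothetical choices made for this example only; they are not part of the project requirements.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])
                # exploit, breaking ties between equal values at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target; no future value after the terminal state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

On this toy problem the learned greedy policy should choose action 1 (move right) in every non-terminal state. Changing the reward, the discount factor or the exploration rate is a cheap way to see how each of them affects the learned policy.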

Your report would typically take the form of a Jupyter notebook containing code, along with Markdown text explaining the different parts. You may want to provide a tutorial-like exposition for certain subjects covered in your project. For example, this may include instructions for loading the specific libraries and/or packages you have used, the preprocessing tasks you have applied to your datasets and any specific requirements for running your models. Your report and any other project-related material should be submitted to the GitHub Classroom repo that will be assigned to your project.
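One possible way to document the libraries a notebook depends on is to print the installed versions in an early cell. The sketch below uses only the Python standard library. The helper name report_versions and the package names in the usage example are assumptions made for illustration, so substitute whatever your project actually imports.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example package names only; list the ones your project uses.
    print("Python", platform.python_version())
    for name, version in report_versions(["numpy", "matplotlib", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Recording versions this way makes it easier for a reader to reproduce the environment before running the rest of the notebook.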

Your report is expected to be presented to a high professional standard. This means that it has to be well structured, neat and polished. Your report should have a title, abstract, main content body, conclusion, and a list of references. In the abstract, please make sure to briefly describe what problem your project addresses, why it is a problem, what your solution is and why you have chosen that solution. The abstract should be short: a single paragraph of 5-10 sentences.

You may use visualizations in your report, for example using Matplotlib, TensorBoard and other Python libraries.

Your report may describe a working prototype application. In this case, your report should contain a clear and full description of the steps that one needs to follow in order to run your application.

Your report should cite any references that you use (including the reference for the code). You may also discuss and cite any previously proposed alternative solutions to your problem. The conclusion section should briefly summarise the results of your project, highlight your main findings, and briefly discuss any interesting avenues for future research.

You might want to discuss your project topic with Chengchun. A list of candidate projects is given below to give you some idea of potential project topics. You may, but are not expected to, take one of the project topics listed below.

Candidate project topics

Here you'll find references to various resources such as research papers and blogs that may inspire your choice of the course project. You may also check references provided in the lecture materials.

You're welcome to propose a topic that is not included in the list below.

Causal Reinforcement Learning

· Li et al., Causal Reinforcement Learning: An Instrumental Variable Approach

· Shi et al., A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

· Tennenholtz et al., Off-Policy Evaluation in Partially Observable Environments

· Xu et al., An Instrumental Variable Approach to Confounded Off-Policy Evaluation

· Zhang et al., Markov Decision Processes with Unobserved Confounders: A Causal Approach

Games

· Atari zoo

· Model based reinforcement learning for Atari

Optimization

· Zhu et al., Causal Discovery with Reinforcement Learning

· TracIn: A Simple Method to Estimate the Training Data Influence

· Microsoft MARO (Multi-Agent Resource Optimization)

· Kool et al., Attention, Learn to Solve Routing Problems!

· Mao et al., Resource management with deep reinforcement learning, HotNets 2016

· Mirhoseini et al., Device placement optimization with reinforcement learning, ICML 2017

· Mirhoseini et al., A hierarchical model for device placement, ICLR 2018

· Bello et al., Neural combinatorial optimization with reinforcement learning, ICLR 2017

Medical application

· Gao et al., Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment

· Komorowski et al., The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care

· Li et al., Testing Stationarity and Change Point Detection in Reinforcement Learning

· Luckett et al., Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning

· Shi et al., Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making

· Zhou et al., Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning

Ridesharing

· Xu et al., Large Scale Fleet Management: A Planning and Learning Approach

· Shi et al., Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

· Shi et al., Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

· Tang et al., A Deep Value-network Based Approach for Multi-Driver Order Dispatching

· Wan et al., Pattern Transfer Learning for Reinforcement Learning in Order Dispatching

Protein structure prediction

· Senior et al., AlphaFold: Improved protein structure prediction using potentials from deep learning

Finance

· Deng et al., Deep direct reinforcement learning for financial signal representation and trading, IEEE Trans. on Neural Networks and Learning Systems, 2016

Some past project topics

· Creating a conversational ChatBot using deep Q-network

· Fairness or efficiency: strategy analysis for coronavirus medical treatment using RL

· Financial portfolio management using deep RL

· Deep direct recurrent reinforcement learning for algorithmic trading

· Reinforcement learning for trade execution with Alpha and risk aversion

· Solving ATT48 by deep reinforcement learning

· Stock trading by deep reinforcement learning