
332:515 FINAL TERM PAPER Fall 2023

The final term paper (worth 30% of the course grade) will be due on the last day of the final exams. The topic of the term paper may be selected by the students so that the paper is beneficial to your studies and your future research. Each graduate student must prepare an individual report; however, students may work on the same topic for their term papers. The undergraduate students will work in teams of 3-4 students and submit one report per team.

1) The final term paper will be based on a journal paper (or papers) related to the material covered in class. The list of potential topics is presented at the end of this document.

2) The term paper may also be based on any chapter from the Sutton and Barto textbook, Part II, Chapters 8-13, the chapters that we did not cover in the course (this might be appropriate for the computer science students).

3) The paper may be based on an overview RL paper (or papers) that were briefly mentioned in the lectures. These papers are posted on Canvas and listed at the end of this document (this might be appropriate for the computer science students or ECE undergraduate students).

4) The term paper may be based on any topic of your interest on RL material related to the topics covered in this class, assuming it is discussed and approved by Professor Gajic.

Special office hours for final project discussion/selection will be held on Monday, Dec. 4, 2023, 3-5 pm, in Prof. Gajic's office, EE 222, or possibly in the EE 240 conference room. You may also discuss the selection of the final term paper during the regular office hours on Tuesday, Dec. 5, 2023. Of course, you may also discuss your Exam 2 during these office hours.

You may contact Prof. Gajic via email at any time and, if needed, arrange a Webex meeting (see the course syllabus for my Webex homeroom and my email address).

In general, the final term paper should be based on a reinforcement learning topic relevant to this course; a topic that is not in the list given below must be approved by Professor Gajic. The topics are expected to go beyond the material covered in the course. The paper should be typed and its PDF file uploaded to Canvas by the last day of the final exams, Dec. 23, 2023.

The paper must contain all parts of a standard conference/journal paper:

-     Abstract;

-     Introduction including the relationship to the material covered in this course;

-     Methods and/or algorithms developed;

-     Discussion of analytical results obtained;

-     Discussion of numerical results obtained (if any);

-     Conclusions;

-     References.

Potential Topics for the Final Term Paper

1) Applications of Nash Differential Games to Aerospace. Following the theory of policy iterations for Nash differential games, consider the linear-quadratic (LQ) Nash differential game problem for attitude takeover control of a failed spacecraft, as presented in the paper:

Y. Chai, J. Luo, N. Han, and J. Xie, “Linear differential game approach for attitude takeover control of failed spacecraft,” Acta Astronautica, Vol. 175, 142-154, 2020.

Provide a detailed review of the paper with emphasis on the policy iterations for the N-agent LQ Nash differential game problem.

2) Policy Iterations in Affine Nonlinear Nash Differential Games. Using the knowledge that we gained in this course about approximate dynamic programming for affine nonlinear systems, present in detail policy iterations for affine nonlinear Nash games, mostly following the paper:

K. Vamvoudakis and F. Lewis, “Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations,” Automatica, Vol. 47, 1556-1569, 2011.

3) Nash Games with Unknown Dynamics. Interesting papers on solving Nash games online when the system model is not available are:

D. Vrabie and F. Lewis, “Integral Reinforcement Learning for Finding Online the Feedback Nash Equilibrium of Nonzero-Sum Differential Games,” 313-330, Chapter 17 in Advances in Reinforcement Learning, A. Mellouk (ed.), InTech, 2012.

K. Vamvoudakis, “Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems,” Automatica, Vol. 61, 274-281, 2015.

4) Online RL for Affine Nonlinear Systems. In Lecture 20 we presented RL for partially and completely model-free systems (learning online, learning from data while interacting with the system) using the linear-quadratic optimal control problem formulation. An extension to nonlinear (affine) systems is presented in the Vrabie and Lewis (2010) book chapter cited below, which can be used as a term paper topic; a minimal model-based sketch of the underlying policy iteration is given after the references.

D. Vrabie and F. Lewis, “Online Adaptive Optimal Control Based on Reinforcement Learning,” in Optimization and Optimal Control, A. Chinchuluun et al. (eds.), Springer, 2010.

J. Murray, C. Cox, G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 32, 140-153, 2002.
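As a point of reference for this topic, a minimal, hedged sketch of the model-based (Kleinman) policy iteration for the continuous-time LQ problem is given below; the online/integral RL schemes in the papers above evaluate essentially the same Lyapunov step from measured trajectory data rather than from the model matrices A and B. The example matrices and the Python/NumPy/SciPy setting are illustrative assumptions, not taken from the cited papers.

# Hedged sketch: model-based Kleinman policy iteration for the LQ problem
# (illustrative matrices; not the data-driven algorithms of the cited papers).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate policy evaluation and improvement; K0 must be stabilizing."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K                                   # closed-loop matrix
        # Policy evaluation: Acl' P + P Acl + Q + K' R K = 0 (a Lyapunov equation)
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Illustrative second-order example (made-up numbers)
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[0.0, 5.0]])                               # a stabilizing initial gain
P, K = kleinman_policy_iteration(A, B, Q, R, K0)
print("P =", P, "K =", K)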

5) RL for Graphical Games. Several papers by Lewis and his coworkers considered RL for graphical games (interactions among the agents are constrained to a fixed strongly connected graph, with the agent dynamics represented by independent linear systems):

F. Lewis, H. Zhang, K. Hengster-Movric, and A. Das, “Graphical Games: Distributed Multiplayer Games on Graphs,” pages 181-217, in Cooperative Control of Multi-Agent Systems: Optimal and Adaptive Design Approaches, Springer, 2014.

M. Abouheaf, F. Lewis, K. Vamvoudakis, S. Haesaert, and R. Babuska, “Multi-agent discrete time graphical games and reinforcement learning solutions,” Automatica, Vol. 50, 3038-3053, 2014.

M.I. Abouheaf, F.L. Lewis, M. S. Mahmoud, and D. G. Mikulski, “Discrete-time dynamic graphical games: model-free reinforcement learning solution,” Control Theory and Technology, Vol. 13, 55–69, 2015.

6) Discrete-Time Zero-Sum Games. In Lecture 22, we presented the continuous-time zero-sum games. RL for discrete-time LQ zero-sum games was considered in the paper:

A. Al-Tamimi, F. Lewis, and M. Abu-Khalaf, “Model-free Q-learning for linear discrete-time zero-sum games with applications to H-infinity control,” Automatica, Vol. 43, 473-481, 2007.

An aircraft example is presented in this paper.

7) Discrete-Time Nash Games. In Lecture 23, we presented the continuous-time Nash games. RL for discrete-time LQ Nash games was considered in the paper:

Z. Zhang, J. Xu, and M. Fu, “Q-learning for feedback Nash strategy of finite-horizon nonzero-sum difference games,” IEEE Transactions on Cybernetics, in press, 2021.

8) RL for Electric Cars (PEM Fuel Cells). The following two recent papers present the use of RL for air-feed sensor control of PEM (proton exchange membrane) fuel cells used in electric cars:

M. Gheisarnejad, J. Boudjadar, and M. Khooban, “A new adaptive type-II fuzzy-based deep reinforcement learning control: Fuel cell and air-feed sensors control,” IEEE Sensors Journal, Vol. 19, 9081-9089, 2019.

J. Li and T. Yu, “A new adaptive controller based on distributed reinforcement learning for PEMFC air supply system,” Energy Reports, 1267-1279, 2021.

9)   RL for Affine Nonlinear Zero-Sum Differential Games. We studied in Lectures 18 and 21 the LQ zero-sum games. The RL for nonlinear (affine) zero-sum games was presented in:

H. Zhang, Q. Wei, and D. Liu, “An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum games,” Automatica, Vol. 47, 207-214, 2011.

Y. Zhu, D. Zhao, and X. Li, “Iterative adaptive dynamic programming for solving  unknown nonlinear zero-sum game based on online data,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, 714-725, 2017.

10) RL for Kalman Filtering. Among the first attempts to connect RL and the Kalman filter (linear dynamic stochastic estimator) with potential applications to neuroscience was the paper:

I. Szita and A. Lorincz, “Kalman filter control embedded into the reinforcement learning framework,” Neural Computation, Vol. 16, 491-499, 2004.

C. Tripp and R. Shachter, “Approximate Kalman filter Q-learning for continuous-state-space MDPs,” Cornell University Archives, 2013.

X. Gao, H. Luo, B. Ning, F. Zhao, L. Bao, Y. Gong, Y. Xiao, and J. Jiang, “RL-AKF: An adaptive Kalman filter navigation algorithm based on reinforcement learning for ground vehicles,” Remote Sensing, Vol. 12, 1704, doi:10.3390/rs12111704, 2020.

The paper by Szita and Lorincz (2004) takes the computer-science approach to RL (temporal difference, cost-to-go) and presents a SARSA algorithm; a generic tabular SARSA sketch is given below for orientation.
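The following is a minimal, hedged sketch of a generic tabular SARSA update, the on-policy temporal-difference scheme referred to above. It is the textbook version in Python/NumPy, not the Kalman-filter-embedded variant developed by Szita and Lorincz (2004), and the toy chain environment is made up purely for illustration.

# Hedged sketch: generic tabular SARSA (textbook version, illustrative environment).
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    """Random action with probability eps, otherwise a greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa(step, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """step(s, a) -> (next_state, reward, done) is an assumed environment interface."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False                     # assumption: each episode starts in state 0
        a = epsilon_greedy(Q, s, eps, rng)
        while not done:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, eps, rng)
            # On-policy TD target uses the action actually taken next
            target = r + (0.0 if done else gamma * Q[s_next, a_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q

def chain_step(s, a, n=5):
    """Toy 5-state chain with a cost-to-go flavor: each step costs -1; episode ends at the right end."""
    s_next = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
    return s_next, -1.0, s_next == n - 1

print(sarsa(chain_step, n_states=5, n_actions=2))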

11) Comparison of the LQ Zero-Sum Game Algorithms. Compare analytically and/or numerically the algorithms from Lectures 21a), c), and d): the modified Anderson et al. (2010) sequential algorithm derived by Vrabie and Lewis, the simultaneous update algorithm of Wu and Luo, and Li and Gajic's algorithm; all three target the same game algebraic Riccati equation, recalled below.
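For orientation, a hedged reminder of the continuous-time LQ zero-sum game algebraic Riccati equation in standard notation (the symbols below are assumed here, not copied from the lecture notes):

% Assumed setting: dynamics \dot{x} = Ax + Bu + Dw,
% cost \int_0^{\infty} (x^{\top}Qx + u^{\top}Ru - \gamma^{2} w^{\top}w)\,dt.
\[
A^{\top}P + PA + Q - PBR^{-1}B^{\top}P + \frac{1}{\gamma^{2}}\,PDD^{\top}P = 0,
\qquad u^{*} = -R^{-1}B^{\top}Px, \qquad w^{*} = \frac{1}{\gamma^{2}}D^{\top}Px .
\]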

12) Reinforcement Learning for Markov Jump Linear Systems. A recent paper is a good starting point to learn about this topic:

S. He, M. Zhang, H. Fang, F. Liu, X. Luan, and Z. Ding, “Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information,” Neural Computing and Applications, Vol. 32, 14311-14320, 2020. [The paper was published within the topical collection “Extreme Learning Machine and Deep Learning Networks”.]

13) ADP for Weakly Coupled Nonlinear Systems. The following paper presents reinforcement learning for so-called weakly coupled systems, which were extensively researched by Professor Gajic and his graduate students and colleagues:

L. Carrillo, K. Vamvoudakis, and J. Hespanha, “Approximate optimal adaptive control of weakly coupled nonlinear systems: A neuro-inspired approach,” International Journal of Adaptive Control and Signal Processing, Vol. 30, 1494-1522, 2016.

14) RL for Output Feedback Control Systems. This is an important topic for applications of RL to real physical engineering systems. In general, only in rare cases are all state variables available for feedback; the system provides only its output, a certain combination of the state variables, say y(t) = Cx(t), with the rank of the matrix C (the number of linearly independent rows in C) much smaller than the number of state variables. How to implement reinforcement learning in this case is discussed in the next paper, which can be a topic of a final term paper (a small numerical illustration of the output feedback setting follows the reference):

F. Lewis and K. Vamvoudakis, “Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 41, 14-25, 2011.
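The short NumPy illustration below (made-up dimensions and numbers, not taken from the paper) shows the output feedback setting described above: only y = Cx with rank(C) < n is available, so any implementable gain must act on y rather than on the full state x.

# Hedged illustration of output feedback vs. full-state feedback (made-up data).
import numpy as np

n, m, p = 4, 1, 2                           # states, inputs, measured outputs (p < n)
rng = np.random.default_rng(1)
x = rng.standard_normal(n)                  # true state: not fully measurable
C = rng.standard_normal((p, n))             # p linearly independent rows: rank(C) = p < n
y = C @ x                                   # only y, not x, is available for feedback

K_y = rng.standard_normal((m, p))           # an output-feedback gain acts on y ...
u_output = -K_y @ y                         # ... equivalently u = -(K_y C) x
K_x = rng.standard_normal((m, n))           # a full-state gain would need all of x,
u_state = -K_x @ x                          # which is not implementable here

print("rank(C) =", np.linalg.matrix_rank(C), "<", n)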

15) Gradient-Based Learning and Differential Games. In a very recent paper, Berkeley researchers connected continuous-time differential games to the concept of gradient-based reinforcement learning. In Section 5, they consider the problem within the framework of LQ Nash games and the algorithm of Li and Gajic (1995). The paper is quite mathematical, but readable.

E. Mazumdar, L. Ratliff, and S. Sastry, “On gradient-based learning in continuous games,” SIAM Journal on Mathematics of Data Science, Vol. 2, 103-131, 2020.

16) RL for Pareto (Cooperative) Games. When controllers (agents) cooperate in order to improve their performance criteria over the trajectories of a dynamic system, we have Pareto differential games. A recent paper is a good reference for RL and Pareto differential (dynamic) games:

V. Lopez and F. Lewis, “Dynamic multi-objective control for continuous-time systems using reinforcement learning,” IEEE Transactions on Automatic Control, Vol. 64, 2869-2874, 2019.

17) RL for Stackelberg Games. We did not have time to cover Stackelberg differential games in class (differential games with conflict of interest and sequential decision making). You may learn about them from the corresponding course on game theory that I taught many years ago (uploaded on Sakai). A good paper on reinforcement learning and Stackelberg games is:

K. Vamvoudakis, F. Lewis, and W. Dixon, “Open-loop learning for hierarchical control problems,” International Journal of Adaptive Control and Signal Processing, Vol. 33, 285-299, 2017.

SURVEY/OVERVIEW PAPERS

[1]  L. Kaelbling, M. Littman, and A. Moore, “Reinforcement learning: A survey,” Journal of Artificial Intelligence Research, 237-285, 1996.

[2]    F-Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Computational Intelligence Magazine, 39-47, May 2009.

[3]    F. Lewis and D. Vrabie, “Reinforcement learning and feedback control,” IEEE Circuits and Systems Magazine, 32-50, Third Quarter, 2009.

[4]    D. Bertsekas, “Approximate policy iteration: a survey and some new results,” Journal of Control Theory and Applications, Vol. 9, 310-335, 2011.

[5]  F. Lewis, D. Vrabie, and K. Vamvoudakis, “Reinforcement learning and feedback control,” IEEE Control Systems Magazine, 76-105, Dec. 2012.

[6]    Z-P.  Jiang  and  Y.  Jiang,  “Robust  adaptive  dynamic  programming  for  linear  and  nonlinear  systems:  An overview,” European Journal of Control, Vol. 19, 417-425, 2013.

[7]    K. Vamvoudakis, H. Modares, B. Kiumarsi, and F. Lewis, “Game theory-based control system algorithms with real-time reinforcement learning: How to solve multiplayer games on line,” IEEE Control Systems Magazine, 33-52, 2017.

[8]  B. Kiumarsi, K. Vamvoudakis, H. Modares, and F. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, 2042-2062, 2018.

All the papers cited in this document can be downloaded from the Rutgers University Libraries website. In particular, through the New Brunswick libraries, students may search standard databases such as IEEE Xplore, ScienceDirect (the Elsevier book and journal database), Web of Science, and Wiley Online Library. Prof. Gajic will make an effort to upload all the papers cited in this document to the final term paper file.

SEVERAL ADDITIONAL TOPICS WILL BE PROVIDED FOR THE UNDERGRADUATE STUDENTS OVER THE WEEKEND