A Primer for Deep Learning 2020
1. Preliminary Information
The testing environment is a MacBook Pro (macOS 10.14) with an Intel Iris Plus Graphics 655 GPU (1536 MB). This hardware is likely below the specs of most of your own computers. Please search (Google/Baidu) for alternative solutions if your own system runs Windows or Linux.
I will stick to macOS, which is very similar to Linux. Windows users, please try to resolve issues by searching online. I do, however, encourage you to start learning Linux now; a good starting point is to use Docker or a virtual machine.
As for control theory, please take this course to learn what it is about.
About Python, a good starting point can be found at:
https://github.com/Mallcock1/2016-01-11_Sheffield_Notes
2. Case I: Pole-Cart Problem
2.1. Installation
(i) Install Python 2, Python 3, and pip.
(ii) Anaconda may be an easier way.
(iii) Create and activate a virtual environment with ‘python3 -m venv plaidml-venv’ and ‘source plaidml-venv/bin/activate’; or with conda: ‘conda create -n py26’.
(iv) In the virtual environment, install TensorFlow 1.14.0 (by: python -m pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.14.0-py3-none-any.whl; numpy shall be installed separately) and Keras 2.2.4. Hint: do NOT install the most recent versions, because the code packages would be incompatible.
(v) Install plaidml-keras to enable cross-platform hardware support (read “plaidml/install.rst ... pdf”).
(vi) Set up plaidml by typing ‘plaidml-setup’. Hint: choose your own GPU card option.
(vii) Install OpenAI Gym (https://github.com/openai/gym).
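After installation it is worth confirming that the pinned versions are actually the ones in your virtual environment. The helper below is a small sketch of such a check; in practice you would pass in tensorflow.__version__ and keras.__version__ (the function names here are my own, not part of any library).

```python
def version_tuple(v):
    """Parse a dotted version string like '1.14.0' into a tuple of ints."""
    return tuple(int(part) for part in v.split('.')[:3])

def is_pinned(installed, pinned):
    """Return True only if the installed version exactly matches the pin."""
    return version_tuple(installed) == version_tuple(pinned)

# The handout pins TensorFlow 1.14.0 and Keras 2.2.4:
print(is_pinned('1.14.0', '1.14.0'))  # → True
print(is_pinned('2.3.1', '2.2.4'))    # → False
```

An exact-match check is deliberate here: the hint above warns that newer releases break the example code, so a "greater or equal" test would defeat the purpose.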
2.2. Test
Open Python 3 and go through the code in ‘Pole-cart_Xun.py’ (which uses the Gym environment ’CartPole-v0’).
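The heart of any Gym script like this one is the reset/step interaction loop. The sketch below shows that loop against a tiny stub environment so it runs anywhere; with Gym installed you would replace the stub with env = gym.make('CartPole-v0'), whose episodes are capped at 200 steps rather than the stub's 10.

```python
import random

class StubCartPole:
    """Hypothetical stand-in for gym.make('CartPole-v0'): same
    reset()/step() interface, but the episode is capped at 10 steps."""
    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]        # [x, x_dot, theta, theta_dot]
    def step(self, action):
        self.t += 1
        done = self.t >= 10                 # the real env caps at 200 steps
        return [0.0] * 4, 1.0, done, {}     # obs, reward, done, info

env = StubCartPole()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])          # random policy: push left / right
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # 10.0 with the stub's 10-step cap
```

A random policy like this usually balances the real pole for only a handful of steps, which is exactly the baseline the RL agent in the next subsection improves on.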
2.3. Try the RL agent
Install keras-rl (read: Using Keras Reinforcement Learning API with OpenAI Gym).
Open Python 3 and go through the code in ‘Pole-cart_Xun2.py’ (which uses the Gym environment ’CartPole-v1’).
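What the DQN agent in this script learns can be stated in one line: the network's output Q(s, a) is trained toward the Bellman target r + γ · max over a' of Q(s', a'). A minimal, tabular-style sketch of that target (my own helper, not keras-rl API):

```python
def dqn_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target for one transition.
    Terminal states do not bootstrap from the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

print(dqn_target(1.0, [0.5, 2.0], gamma=0.9))             # 1.0 + 0.9 * 2.0 = 2.8
print(dqn_target(1.0, [0.5, 2.0], gamma=0.9, done=True))  # 1.0
```

Keeping this formula in mind makes the fit/backward machinery of keras-rl much easier to follow when you read its source later.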
2.4. Report
Email me a report by next Wednesday, answering the following questions:
(i) Make the model work.
(ii) What does DQN mean?
(iii) Why is the best reward 500?
(iv) What does ‘SARSAAgent’ mean?
(v) Modify the network size and compare the resulting performance.
(vi) (Optional): Explain the agent and the policy (EpsGreedyQPolicy).
(vii) (Optional and advanced topic): Explain the whole coding structure inside the agent: what are forward, backward, and step, and how are they connected?
I will announce the best report and invite its author to present the work next Friday.
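As a hint toward question (vi): the core idea of an epsilon-greedy policy can be written in a few lines. This is a simplified sketch of what keras-rl's EpsGreedyQPolicy does internally (the function name here is my own): with probability eps pick a random action (explore), otherwise pick the action with the highest Q value (exploit).

```python
import random

def eps_greedy(q_values, eps, rng=random):
    """Select an action index from a list of Q values, epsilon-greedily."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))   # explore: uniform random action
    # exploit: argmax over the Q values
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With eps = 0 the policy is purely greedy:
print(eps_greedy([0.1, 0.9], eps=0.0))  # → 1
```

During training eps is typically annealed from near 1 toward a small value, so the agent explores early and exploits what it has learned later.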
3. Follow up study of the CartPole case
3.1. Introduction
In the previous section, you learned how to set up a deep-learning environment and saw some of its possible applications to practical problems. However, all the code is highly encapsulated: the implementation details are hidden inside classes, which prevents a deeper understanding of the specific concepts. Hence, I suggest some further reading and tests here.
3.2. Steps
The understanding can be deepened by following the steps below.
(i) I always use Keras because of its simplicity and cross-platform support.
(ii) Go through and understand every line of the code on:
https://keras.io/examples/rl/actor_critic_cartpole/.
(iii) A much more complicated codebase can be found at keras-rl: https://github.com/keras-rl/keras-rl. It consists of many routines, functions, and examples.
(iv) First install it, then focus on the DQN example: dqn_cartpole.py.
(v) Debug this code (by ‘import pdb’ and ‘pdb.set_trace()’, stepping through the whole code) and write down the calling connections between classes and functions.
(vi) Start modifying the DQN code, dqn.py, which is generic and cumbersome, to finally understand the basic concepts.
(vii) Google every question you have about unfamiliar Python grammar or Keras issues.
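Besides pdb, a quick way to record the calling connections asked for in step (v) is Python's sys.settrace hook, which fires on every function call. The toy sketch below uses hypothetical forward/backward names standing in for the agent's methods; applied to dqn.py, the same tracer would print the real call chain for you.

```python
import sys

calls = []

def tracer(frame, event, arg):
    """Record the name of every function that gets called."""
    if event == 'call':
        calls.append(frame.f_code.co_name)
    return None  # no per-line tracing needed

def backward(reward):
    return reward * 0.99

def forward(observation):
    return backward(sum(observation))

sys.settrace(tracer)
forward([1.0, 2.0])
sys.settrace(None)   # always switch the tracer off again
print(calls)  # → ['forward', 'backward']
```

pdb remains better for inspecting values interactively ('n' steps over, 's' steps into, 'w' prints the stack); the tracer is better for getting the overall call map in one run.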
3.3. Test
Here I pose a new question to test whether you really understand the material and know how to solve a problem of your own:
Q: Modify the cart-pole case. The current one only controls the attitude of the pole (holding it at θ = 0 deg). The new task is to control both the attitude of the pole and the position of the cart, i.e., θ = 0 and x = 0.
Explanation: physically, during the launch of a rocket, its attitude shall be normal to the ground and its x position shall be fixed too. In DQN terms, the new question suggests that the neural network model shall output two Q values, for both θ and x.
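One possible way to approach the new task (a sketch of one design, not the only answer) is to reshape the reward: instead of CartPole's flat +1 per surviving step, reward the agent for keeping both |x| and |θ| small. The limits below are assumptions taken from the standard CartPole termination thresholds (|x| < 2.4 m, |θ| < 12 deg ≈ 0.2095 rad); the function name is my own.

```python
def shaped_reward(x, theta, x_limit=2.4, theta_limit=0.2095):
    """Reward in [0, 1]: 1 when the cart is centered and the pole upright,
    falling off linearly as |x| or |theta| approaches its failure limit."""
    x_term = max(0.0, 1.0 - abs(x) / x_limit)
    theta_term = max(0.0, 1.0 - abs(theta) / theta_limit)
    return 0.5 * (x_term + theta_term)

print(shaped_reward(0.0, 0.0))   # → 1.0 (centered and upright)
print(shaped_reward(2.4, 0.0))   # → 0.5 (upright, but at the position limit)
```

With this reward substituted into the environment's step function, the same DQN agent is pushed toward x = 0 as well as θ = 0, which is the behavior the rocket analogy above asks for.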
2022-07-14