HW10: Reinforcement Learning

1 Getting started

Download the starter code from Canvas. It consists of two files: Q-Learning.py and tests.py. You can create and activate a virtual environment with the following commands:

python3 -m venv /path/to/new/virtual/environment

source /path/to/new/virtual/environment/bin/activate

Once you have sourced your environment, you can run the following commands to install the necessary dependencies:

pip install --upgrade pip

pip install torch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0

pip install gym==0.23.1 pygame==2.1.2

You should now have a virtual environment that is fully compatible with the skeleton code. Set up this same virtual environment on an instructional machine to do your final testing.

2 Q-Learning

For the Q-learning portion of HW10, we will be using the environment FrozenLake-v1 from OpenAI gym. This is a discrete environment where the agent can move in the cardinal directions, but is not guaranteed to move in the direction it chooses. The agent gets a reward of 1 when it reaches the tile marked G, and a reward of 0 in all other cases. You can read more about FrozenLake-v1 (it is the same as FrozenLake-v0) here: https://www.gymlibrary.dev/environments/toy_text/frozen_lake/. You will not need to change any code outside of the area marked TODO, but you are free to change the hyper-parameters if you want to. For each sampled tuple (s, a, r, s', done), the update rule for Q-learning is:

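\[
Q(s, a) =
\begin{cases}
Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right) & \text{if !done} \\
Q(s, a) + \alpha \left( r - Q(s, a) \right) & \text{if done}
\end{cases}
\]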

The agent should act according to an epsilon-greedy policy as defined in the Reinforcement Learning 1 slides. In this equation, α is the learning rate hyper-parameter, and γ is the discount factor hyper-parameter.
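The update rule and the epsilon-greedy policy can be sketched as follows. This is a minimal illustration, assuming the standard tabular Q-learning update and a Q-table stored as a dictionary keyed by (state, action). The starter code may store its table differently, and the function names `epsilon_greedy` and `q_update` are hypothetical, not taken from the skeleton.

```python
import random
from collections import defaultdict


def epsilon_greedy(q_table, state, n_actions, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q_table[(state, a)] for a in range(n_actions)]
    return values.index(max(values))  # ties resolve to the lowest action index


def q_update(q_table, s, a, r, s_next, done, n_actions, alpha, gamma):
    """Apply one tabular Q-learning update for the tuple (s, a, r, s', done)."""
    if done:
        target = r  # terminal transition: no bootstrapping from s'
    else:
        target = r + gamma * max(q_table[(s_next, b)] for b in range(n_actions))
    q_table[(s, a)] += alpha * (target - q_table[(s, a)])


# Assumed layout: unseen (state, action) pairs default to a value of 0.0.
q = defaultdict(float)
q_update(q, s=14, a=2, r=1.0, s_next=15, done=True,
         n_actions=4, alpha=0.5, gamma=0.9)
q_update(q, s=13, a=2, r=0.0, s_next=14, done=False,
         n_actions=4, alpha=0.5, gamma=0.9)
```

The first call moves Q(14, 2) halfway toward the terminal reward. The second call then bootstraps from that value, which shows how reward information propagates backward one transition at a time.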

• HINT: tests.py is worth looking at to gain an understanding of how to use the OpenAI gym env.

• Files to Submit: For this section, you should submit the files Q_learning.py and Q_TABLE.pkl.

3 OpenAI gym Environment

You will need to use several OpenAI gym functions in order to operate your gym environment for reinforcement learning. As noted in the hint above, tests.py has a lot of the function calls you need. Several important functions are as follows:

env.step(action)

Given that the environment is in state s, step takes an integer specifying the chosen action and returns a tuple of the form (s', r, done, info), where s' is the next state and r is the reward. 'done' specifies whether or not s' is the final state for that particular episode, and 'info' is unused in this assignment.

env.reset()

Resets the environment to its initial state, and returns that state.

env.action_space.sample()

Samples an integer corresponding to a random choice of action in the environment’s action space.

env.action_space.n

In the setting of the environments we will be working with for these assignments, this is an integer corresponding to the number of possible actions in the environment's action space.

You can read more about OpenAI gym here: https://www.gymlibrary.dev.
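The calls described in this section fit together in an episode loop like the one sketched below. To keep the example runnable without gym installed, it uses a hypothetical stand-in environment (`ToyEnv` and `ToyActionSpace` are illustrative names, not part of gym) that mimics the reset/step/action_space interface described above. With the pinned gym version, the real environment would presumably be created with `gym.make("FrozenLake-v1")` instead.

```python
import random


class ToyActionSpace:
    """Hypothetical stand-in for env.action_space (not part of gym)."""

    def __init__(self, n, seed=0):
        self.n = n  # number of possible actions
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class ToyEnv:
    """Hypothetical 1-D corridor that mimics the reset/step interface."""

    def __init__(self, length=4):
        self.length = length
        self.action_space = ToyActionSpace(2)  # 0 = left, 1 = right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1  # rightmost cell is the goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def run_episode(env, max_steps=100):
    """Roll out one episode with random actions; return (total reward, steps)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = env.action_space.sample()
        state, reward, done, info = env.step(action)  # info is unused here
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


total, steps = run_episode(ToyEnv())
```

In a learning loop, the random `sample()` call would be replaced by the agent's own action choice, and each (s, a, r, s', done) tuple would feed the Q-table update.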

4 Submission Format

You can test your learned policies by calling python3 ./tests.py. Make sure to test your saved Q-tables using tests.py on the instructional machines with a virtual environment set up as specified above. This is the same program that we will use to test your Q-tables and Q-network, so you will have a good idea of how many points you will receive for the automated tests portion of the grade. Your submission should contain the following files:

Required: Q_learning.py, Q_TABLE.pkl

Please submit these files in a zipped folder titled <yournetid>.zip, where 'yournetid' is your net ID. Please make sure that there is not a folder inside the zipped folder, and that the submitted files are at the top level of the zipped folder.
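One way to build an archive with the files at the top level is sketched below using Python's standard zipfile module. The helper name `make_flat_zip` is hypothetical, the file names are assumed to use underscores, and the demo writes placeholder files to a temporary directory rather than touching real submission files.

```python
import os
import tempfile
import zipfile


def make_flat_zip(zip_path, file_paths):
    """Write each file at the top level of the archive (no enclosing folder)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname drops the directory part so no folder ends up in the zip
            zf.write(path, arcname=os.path.basename(path))
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


# Demo with placeholder files; replace with the real paths and net ID.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("Q_learning.py", "Q_TABLE.pkl"):
        p = os.path.join(tmp, name)
        with open(p, "wb") as f:
            f.write(b"placeholder")
        paths.append(p)
    names = make_flat_zip(os.path.join(tmp, "yournetid.zip"), paths)
```

Listing the archive contents afterward, as the helper does, is a quick check that no entry name contains a folder prefix.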

The assignment is due Dec 13 at 23:59 central time. We are not accepting late submissions for this assignment. Regrades will only be accepted if they are due to an error in tests.py.