1 Introduction
The goal of this assignment is to experiment with the deep Q-network (DQN), which combines the advantages of Q-learning and neural networks. In classical Q-learning methods, the action-value function Q becomes intractable as the state space and action space grow. DQN brings the success of deep learning to this setting and has achieved super-human performance in Atari games. Your task is to implement the DQN algorithm and one of its improved variants, and to experiment with them in classical RL control scenarios.
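To make the contrast concrete, here is a minimal sketch of the classical tabular Q-learning update, which stores one value per (state, action) pair and therefore grows with the size of both spaces. The function name `q_learning_update` and the hyperparameter values are illustrative assumptions, not part of the assignment.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped target: no future value once the episode has ended.
    if done:
        best_next = 0.0
    else:
        best_next = max(q_table.get((next_state, a), 0.0) for a in range(n_actions))
    target = reward + gamma * best_next
    old_value = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old_value + alpha * (target - old_value)
    return q_table[(state, action)]


# The table only holds entries for pairs that have been updated;
# unseen pairs are treated as 0.0.
q_table = {}
new_value = q_learning_update(q_table, state=0, action=1, reward=-1.0,
                              next_state=1, done=False, n_actions=3)
```

With a continuous state such as the one in MountainCar, a table like this would need discretization, which is the gap a neural-network approximator is meant to fill.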
2 Deep Q-learning
Algorithm 1: deep Q-learning with experience replay.
Initialize replay memory $D$ to capacity $N$
Initialize action-value function $Q$ with random weights $\theta$
Initialize target action-value function $\hat{Q}$ with weights $\theta^- = \theta$
For episode $= 1, M$ do
    Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
    For $t = 1, T$ do
        With probability $\varepsilon$ select a random action $a_t$
        otherwise select $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$
        Execute action $a_t$ in emulator and observe reward $r_t$ and image $x_{t+1}$
        Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
        Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
        Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$
        Set $y_j = r_j$ if episode terminates at step $j+1$, otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta^-)$
        Perform a gradient descent step on $(y_j - Q(\phi_j, a_j; \theta))^2$ with respect to the network parameters $\theta$
        Every $C$ steps reset $\hat{Q} = Q$
    End For
End For
Figure 1: Deep Q-learning with experience replay
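The replay memory and the action-selection rule in Algorithm 1 can be sketched in plain Python as follows. The names `ReplayMemory` and `epsilon_greedy` are hypothetical, and storing an explicit `done` flag with each transition is an assumption made here so that terminal steps can be recognized later.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity buffer D; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch (sampled without replacement here).
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.store(state=step, action=0, reward=-1.0, next_state=step + 1, done=False)
greedy_action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0)
```

In this usage the buffer keeps only the three most recent transitions, and with `epsilon=0.0` the selection is purely greedy.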
You can refer to the original paper for the details of DQN: Mnih, V., et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
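The target y_j and the squared error minimized in Algorithm 1 can be written out directly. The sketch below works on plain Python lists so that it stays independent of any deep learning framework; the function names `dqn_targets` and `squared_td_errors` are hypothetical, and in a real implementation the Q-values would come from the target and online networks.

```python
def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Compute y_j for each sampled transition.

    next_q_target[j] holds the target network's Q-values for every action
    in the next state of transition j.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_target, dones):
        if done:
            targets.append(reward)                        # terminal: y_j = r_j
        else:
            targets.append(reward + gamma * max(q_next))  # r_j + gamma * max_a' Q_hat
    return targets


def squared_td_errors(targets, q_taken):
    """Per-sample loss (y_j - Q(phi_j, a_j; theta))^2."""
    return [(y - q) ** 2 for y, q in zip(targets, q_taken)]


targets = dqn_targets(rewards=[-1.0, 0.0],
                      next_q_target=[[0.5, 2.0, 1.0], [3.0, 4.0, 5.0]],
                      dones=[False, True],
                      gamma=0.9)
losses = squared_td_errors(targets, q_taken=[0.0, 1.0])
```

The second transition is terminal, so its target ignores the next-state values entirely.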
3 Experiment Description
• Programming language: Python 3
• You should compare the performance of DQN with that of one improved DQN variant, testing both in a classical RL control environment, MountainCar.
OpenAI Gym provides this environment, implemented in Python (https://gym.openai.com/envs/MountainCar-v0/). Gym also provides more complex environments, such as Atari games and MuJoCo.
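Interaction with the environment follows a reset/step loop. Because the exact return values of Gym's `reset` and `step` differ between Gym versions, the sketch below uses a small stand-in class, `MountainCarSketch` (a hypothetical name), that mimics the commonly documented MountainCar-v0 dynamics: three actions, a reward of -1 per step, and a goal at position 0.5. All constants are assumptions taken from that public description; use the real environment from `gym.make("MountainCar-v0")` for the actual experiments.

```python
import math
import random


class MountainCarSketch:
    """Stand-in for MountainCar-v0; constants are assumed, not authoritative."""

    MIN_POS, MAX_POS, MAX_SPEED, GOAL_POS = -1.2, 0.6, 0.07, 0.5
    FORCE, GRAVITY, MAX_STEPS = 0.001, 0.0025, 200

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = None
        self.steps = 0

    def reset(self):
        # Start near the valley bottom with zero velocity.
        self.state = (self.rng.uniform(-0.6, -0.4), 0.0)
        self.steps = 0
        return self.state

    def step(self, action):
        # Actions: 0 = push left, 1 = no push, 2 = push right.
        position, velocity = self.state
        velocity += (action - 1) * self.FORCE - self.GRAVITY * math.cos(3 * position)
        velocity = max(-self.MAX_SPEED, min(self.MAX_SPEED, velocity))
        position = max(self.MIN_POS, min(self.MAX_POS, position + velocity))
        if position == self.MIN_POS and velocity < 0:
            velocity = 0.0  # the car stops when it hits the left wall
        self.state = (position, velocity)
        self.steps += 1
        done = position >= self.GOAL_POS or self.steps >= self.MAX_STEPS
        return self.state, -1.0, done


env = MountainCarSketch(seed=0)
state = env.reset()
episode_return, done = 0.0, False
while not done:
    action = random.randrange(3)  # random policy as a placeholder for the agent
    state, reward, done = env.step(action)
    episode_return += reward
```

A DQN agent would replace the random action choice and store each transition for training.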
Since the state is a low-dimensional vector (the car's position and velocity) rather than an image, convolutional layers are not necessary in this experiment. To get started with OpenAI Gym, refer to the documentation (https://gym.openai.com/docs/). It is suggested that you implement your neural network in TensorFlow or PyTorch.
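For the comparison with an improved DQN, one commonly used variant is Double DQN, which selects the next action with the online network but evaluates it with the target network. The assignment does not say which improvement to use, so this is only one possible choice. The function names below are hypothetical, and the Q-values are passed in as plain lists rather than produced by a TensorFlow or PyTorch model.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates."""
    return reward if done else reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


next_q_online = [1.0, 3.0, 2.0]   # online network prefers action 1
next_q_target = [0.5, 1.5, 4.0]   # target network's estimates
y_dqn = dqn_target(-1.0, next_q_target, done=False, gamma=0.9)
y_double = double_dqn_target(-1.0, next_q_online, next_q_target,
                             done=False, gamma=0.9)
```

In this example the Double DQN target is lower than the standard one because the action chosen by the online network is not the one the target network rates highest.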
4 Report and Submission
• Your report and source code should be compressed into a single archive named "studentID+name+assignment4".