Description
In problem 1, I used a random choice function to select actions during policy evaluation.
In problem 2, I used a policy evaluation function that accounts for the noise probability, so the agent does not always move in the intended direction.
In problem 3, I only needed to choose the direction that returns the maximum reward. I spent 10 hours in total.