Syllabus - OpenAI Project

OpenAI Scholars 2018 Reinforcement Learning Syllabus

Topics: Dynamic Programming (Value iteration, Policy iteration, and Q-learning)

Sutton Chapter 3: Markov Decision Processes and Chapter 4: Dynamic Programming
Deep RL Bootcamp Core Lecture 1 Intro to MDPs and Exact Solution Methods — Pieter Abbeel Video | Slides
Deep RL Bootcamp Core Lecture 2 Sample-based Approximations and Fitted Learning — Rocky Duan Video | Slides
Deep RL Bootcamp Lab 1: Markov Decision Processes. You will implement value iteration, policy iteration, and tabular Q-learning and apply these algorithms to simple environments including tabular maze navigation (FrozenLake) and controlling a simple crawler robot.
CS294 Reinforcement learning introduction — Sergey Levine Video | Slides
CS294 Value functions introduction — Sergey Levine Video | Slides
Introduction to Reinforcement Learning — Joshua Achiam Slides
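
As a rough illustration of the value iteration exercise described for Lab 1, here is a minimal sketch on a hand-built three-state chain rather than FrozenLake. The transition table `P[s][a] = [(prob, next_state, reward, done), ...]` is an assumed layout chosen for this example, and the toy MDP, function name, and parameters are all hypothetical, not taken from the lab code.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, s_next, reward, done in P[s][a]:
                    # Terminal transitions contribute no bootstrapped value.
                    bootstrap = 0.0 if done else gamma * V[s_next]
                    Q[s, a] += prob * (reward + bootstrap)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final action values.
    return V, Q.argmax(axis=1)

# Hypothetical toy MDP: states 0 and 1 are ordinary, state 2 is a terminal goal.
# Action 0 moves left, action 1 moves right; reaching the goal pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy = value_iteration(P, n_states=3, n_actions=2)
print("V =", V)
print("greedy policy =", policy)
```

Policy iteration could reuse the same backup by alternating policy evaluation with greedy improvement; the sketch above covers only value iteration.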

Topics: Monte Carlo (MC) methods: use Blackjack to implement first-visit or every-visit MC prediction
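
A minimal sketch of first-visit MC prediction follows. To stay self-contained it swaps Blackjack for a hypothetical five-state random walk (states 0 and 4 terminal, reward 1 for reaching state 4), so the environment, function names, and episode format here are assumptions for illustration only. The averaging logic is the part that should carry over to Blackjack.

```python
import random
from collections import defaultdict

def first_visit_mc_prediction(generate_episode, n_episodes, gamma=1.0):
    """Estimate V(s) by averaging the return that follows the first visit to s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(n_episodes):
        # An episode is a list of (state, reward received after leaving state).
        episode = generate_episode()
        # Record the earliest time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}

def random_walk_episode(rng):
    """Hypothetical stand-in environment: start at 2, step left or right uniformly."""
    state, episode = 2, []
    while state not in (0, 4):
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == 4 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode

rng = random.Random(0)
V_est = first_visit_mc_prediction(lambda: random_walk_episode(rng), n_episodes=20_000)
for s in sorted(V_est):
    print(s, round(V_est[s], 3))
```

For this walk the true values under the uniform random policy are 0.25, 0.5, and 0.75 for states 1, 2, and 3, which gives a quick sanity check. An every-visit variant would drop the `first_visit` check and average over every occurrence of a state.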

Topics: Temporal-Difference (TD) learning; Policy Gradients (use CartPole and Humanoid)

Sutton Chapter 6: Temporal-Difference Learning
Deep RL Bootcamp Core Lecture 4a Policy Gradients and Actor Critic — Pieter Abbeel Video | Slides
Deep RL Bootcamp Core Lecture 4b Pong from Pixels — Andrej Karpathy Video | Slides
CS294 Policy gradients introduction — Sergey Levine Video | Slides | Policy Gradients Project
Policy Gradient Algorithms — Lilian Weng Blog
Sutton Chapter 13.5: Actor-Critic Methods
CS294 Actor-critic introduction — Sergey Levine Video | Slides

Sutton Chapter 16.5: DQN
Deep RL Bootcamp Core Lecture 3 DQN + Variants — Vlad Mnih Video | Slides
Deep RL Bootcamp Lab 3: Deep Q-Learning. You will implement the DQN algorithm and apply it to Atari games.
CS294 Neural networks review (Achiam) Video | Slides
CS294 Advanced Q-learning algorithms — Sergey Levine Video | Slides | DQN Project

Deep RL Bootcamp Core Lecture 9 Model-based RL — Chelsea Finn Video | Slides
CS294 Learning dynamical systems from data — Sergey Levine Video | Slides
CS294 Learning policies by imitating optimal controllers — Sergey Levine Video | Slides
CS294 Advanced model learning and images — Chelsea Finn Video | Slides
CS294 Connection between inference and control — Sergey Levine Video | Slides
CS294 Model Based RL Project

Topics: Advanced Policy Gradients: Natural Policy Gradient, PPO (use Roboschool to avoid needing a MuJoCo license)

Deep RL Bootcamp Core Lecture 5 Natural Policy Gradients, TRPO, and PPO — John Schulman Video | Slides
Deep RL Bootcamp Lab 4: Policy Optimization Algorithms. You will implement various policy optimization algorithms, including policy gradient, natural policy gradient, trust-region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). You will apply these algorithms to classic control tasks, Atari games, and Roboschool locomotion environments.
CS294 Learning policies by imitating optimal controllers — Sergey Levine Video | Slides

Topics: GAIL