OpenAI Scholars 2018 Reinforcement Learning Syllabus
Primary Resources
- Reinforcement Learning: An Introduction, Sutton and Barto
- Algorithms for Reinforcement Learning, Csaba Szepesvári
- Deep RL Bootcamp 2017
- CS294 Fall 2017 UC Berkeley
Week 1, Jun 4: Markov Decision Processes
Topics: Dynamic Programming (value iteration and policy iteration) and tabular Q-learning
- Sutton Chapter 3: Markov Decision Processes and Chapter 4: Dynamic Programming
- Deep RL Bootcamp Core Lecture 1 Intro to MDPs and Exact Solution Methods — Pieter Abbeel Video | Slides
- Deep RL Bootcamp Core Lecture 2 Sample-based Approximations and Fitted Learning — Rocky Duan Video | Slides
- Deep RL Bootcamp Lab 1: Markov Decision Processes. You will implement value iteration, policy iteration, and tabular Q-learning and apply these algorithms to simple environments including tabular maze navigation (FrozenLake) and controlling a simple crawler robot.
- CS294 Reinforcement learning introduction — Sergey Levine Video | Slides
- CS294 Value functions introduction — Sergey Levine Video | Slides
- Introduction to Reinforcement Learning — Joshua Achiam Slides
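As a rough illustration of the value iteration algorithm named in this week's topics, the sketch below applies repeated Bellman optimality backups to a tiny hand-built MDP. The two-state MDP, the layout of the transition table `P`, and the function name `value_iteration` are assumptions made for this example; they are not taken from the Bootcamp lab code.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a finite MDP.

    P[s][a] is a list of (probability, next_state, reward) tuples
    (a hypothetical layout chosen for this sketch).
    """
    def q_value(V, s, a):
        # Expected one-step return of taking action a in state s.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place Bellman optimality backup
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Toy MDP (assumed): action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays reward 1; every other transition pays 0.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}
V, policy = value_iteration(P, n_states=2, n_actions=2, gamma=0.9)
print(V, policy)
```

With a discount of 0.9, staying in state 1 forever is worth 1 / (1 - 0.9) = 10, so the greedy policy moves from state 0 to state 1 and then stays.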
Week 2, Jun 11: Monte Carlo Methods
Topics: Use Blackjack to implement first-visit or every-visit MC prediction
- Sutton Chapter 5: Monte Carlo Methods (Section 5.3)
- CS294 Optimal control and planning — Sergey Levine Video | Slides
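The first-visit MC prediction exercise in this week's topics can be sketched as follows. This is a minimal version under stated assumptions: episodes are supplied as plain lists of (state, reward) pairs, and the toy states "A" and "B" stand in for Blackjack states. The function name `first_visit_mc_prediction` and the episode format are hypothetical choices for illustration.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns.

    episodes: list of episodes, each a list of (state, reward) pairs,
    where reward is the reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        # Record the time step at which each state first appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:  # count only the first visit
                returns_sum[state] += G
                returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Toy episodes (assumed data, not real Blackjack hands).
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("A", 0.0), ("B", -1.0)],
    [("B", 1.0)],
]
print(first_visit_mc_prediction(episodes))
```

Switching to every-visit MC would only require dropping the `first_visit` check so that every occurrence of a state contributes its return.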
Week 3, Jun 18: Imitation Learning with MuJoCo
- CS294 Supervised learning and imitation — Sergey Levine Video | Slides
- CS294 Imitation Learning Project
Week 4, Jun 25: Policy Gradients
Topics: Temporal-difference (TD) learning; use CartPole and Humanoid for policy gradients
- Sutton Chapter 6: Temporal-Difference Learning
- Deep RL Bootcamp Core Lecture 4a Policy Gradients and Actor Critic — Pieter Abbeel Video | Slides
- Deep RL Bootcamp Core Lecture 4b Pong from Pixels — Andrej Karpathy Video | Slides
- CS294 Policy gradients introduction — Sergey Levine Video | Slides | Policy Gradients Project
- Policy Gradient Algorithms — Lilian Weng Blog
- Sutton Chapter 13.5: Actor-Critic Methods
- CS294 Actor-critic introduction — Sergey Levine Video | Slides
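To make the policy gradient idea from this week's material concrete, here is a minimal REINFORCE sketch on a two-armed bandit with a softmax policy. It is far simpler than CartPole or Humanoid and uses no baseline or critic. The bandit setup, the function name `reinforce_bandit`, and the hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.5, seed=0):
    """Run REINFORCE on a bandit whose arm k pays the fixed reward rewards[k].

    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one preference per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        ret = rewards[action]  # one-step episode: return equals reward
        # For a softmax policy, d/d theta[k] of log pi(action)
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log
    return softmax(theta)


final_probs = reinforce_bandit([0.0, 1.0])
print(final_probs)
```

Because only the second arm pays a reward, each update shifts probability toward it. An actor-critic method would replace the raw return `ret` with an advantage estimate from a learned value function.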
Week 5, Jul 2: Deep Q-Learning, DQN, Rainbow
- Sutton Chapter 16.5: DQN
- Deep RL Bootcamp Core Lecture 3 DQN + Variants — Vlad Mnih Video | Slides
- Deep RL Bootcamp Lab 3: Deep Q-Learning. You will implement the DQN algorithm and apply it to Atari games.
- CS294 Neural networks review — Joshua Achiam Video | Slides
- CS294 Advanced Q-learning algorithms — Sergey Levine Video | Slides | DQN Project
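Two building blocks of DQN that the lab and lectures above cover are an experience replay buffer and bootstrapped targets computed from a separate target network. The sketch below shows both in plain Python, without a neural network. The class name `ReplayBuffer`, the function `dqn_targets`, and the lookup-table stand-in for the target network are assumptions for illustration only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest items drop out first

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def dqn_targets(batch, target_q, gamma=0.99):
    """Compute y = r + gamma * max_a Q_target(s', a), or y = r if terminal.

    target_q(state) must return a list of action values; here it stands
    in for a frozen copy of the Q-network.
    """
    targets = []
    for _, _, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


# Hypothetical target "network": a lookup table of action values.
table = {0: [0.0, 1.0], 1: [3.0, 5.0]}
batch = [(0, 1, 1.0, 1, False), (1, 0, 2.0, 0, True)]
print(dqn_targets(batch, lambda s: table[s], gamma=0.5))
```

In a full DQN the online network would be regressed toward these targets, and the target network would be refreshed from the online weights only occasionally.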
Week 6, Jul 9: Model-based RL
- Deep RL Bootcamp Core Lecture 9 Model-based RL — Chelsea Finn Video | Slides
- CS294 Learning dynamical systems from data — Sergey Levine Video | Slides
- CS294 Learning policies by imitating optimal controllers — Sergey Levine Video | Slides
- CS294 Advanced model learning and images — Chelsea Finn Video | Slides
- CS294 Connection between inference and control — Sergey Levine Video | Slides
- CS294 Model Based RL Project
Week 7, Jul 16: Advanced Policy Gradients
Topics: Natural policy gradient and PPO (use Roboschool in place of MuJoCo, which requires a license)
- Deep RL Bootcamp Core Lecture 5 Natural Policy Gradients, TRPO, and PPO — John Schulman Video | Slides
- Deep RL Bootcamp Lab 4: Policy Optimization Algorithms. You will implement various policy optimization algorithms, including policy gradient, natural policy gradient, trust-region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). You will apply these algorithms to classic control tasks, Atari games, and Roboschool locomotion environments.
- CS294 Learning policies by imitating optimal controllers — Sergey Levine Video | Slides
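The PPO topic for this week centers on a clipped surrogate objective. The sketch below computes that objective for a small batch, assuming the probability ratios and advantage estimates have already been calculated elsewhere. The function name `ppo_clip_objective` and the sample numbers are assumptions for illustration.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i);
    advantages[i] is the advantage estimate for the same sample.
    """
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


# Assumed sample values: one ratio above the clip range, one below, one at 1.
print(ppo_clip_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0], clip_eps=0.2))
```

Taking the minimum removes any incentive to push the ratio outside the clip range, which is how PPO approximates the trust-region constraint used by TRPO.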
Week 8, Jul 23: Inverse RL
Topics: Generative Adversarial Imitation Learning (GAIL)
- CS294 Inverse reinforcement learning — Sergey Levine Video | Slides
- Algorithms for Inverse Reinforcement Learning PDF
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning PDF
- Maximum Entropy Inverse Reinforcement Learning PDF
- Maximum Entropy Deep Inverse Reinforcement Learning PDF
- Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization PDF
- Generative Adversarial Imitation Learning PDF