OpenAI Scholars 2018 Reinforcement Learning Syllabus

Submitted by hollygrimm on Fri, 06/08/2018 - 07:41

 

Primary Resources

Reinforcement Learning: An Introduction, Sutton and Barto

Algorithms for Reinforcement Learning, Csaba Szepesvári

Deep RL Bootcamp 2017

CS294 Fall 2017 UC Berkeley

Week 1, Jun 4: Markov Decision Processes

Topics: Dynamic Programming (value iteration and policy iteration) and tabular Q-learning

Sutton Chapter 3: Markov Decision Processes and Chapter 4: Dynamic Programming

Deep RL Bootcamp Core Lecture 1 Intro to MDPs and Exact Solution Methods -- Pieter Abbeel Video | Slides

Deep RL Bootcamp Core Lecture 2 Sample-based Approximations and Fitted Learning -- Rocky Duan Video | Slides

Deep RL Bootcamp Lab 1: Markov Decision Processes. You will implement value iteration, policy iteration, and tabular Q-learning and apply these algorithms to simple environments including tabular maze navigation (FrozenLake) and controlling a simple crawler robot.

CS294 Reinforcement learning introduction -- Sergey Levine Video | Slides

CS294 Value functions introduction -- Sergey Levine Video | Slides

Introduction to Reinforcement Learning -- Joshua Achiam Slides
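As a rough illustration of the value iteration covered this week, here is a minimal sketch on a made-up three-state chain. The transition table layout `P[state][action] = [(prob, next_state, reward, done), ...]` is an assumption chosen to resemble the tabular format used by FrozenLake-style environments; the toy MDP, the function names, and the default `gamma` and `tol` values are hypothetical and not taken from the lab code.

```python
def action_values(P, V, s, gamma):
    """One-step lookahead: expected return of each action from state s."""
    return [
        sum(p * (r + (0.0 if done else gamma * V[s2])) for p, s2, r, done in P[s][a])
        for a in sorted(P[s])
    ]


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply Bellman optimality backups in place until values stop changing."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            best = max(action_values(P, V, s, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (ties go to the lowest action index).
    policy = []
    for s in range(len(P)):
        q = action_values(P, V, s, gamma)
        policy.append(q.index(max(q)))
    return V, policy


# Hypothetical toy chain: states 0, 1, 2; action 0 = left, action 1 = right.
# Entering state 2 yields reward 1 and ends the episode.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy = value_iteration(P)
print(V, policy)
```

Policy iteration would reuse the same one-step lookahead but alternate full policy evaluation with greedy improvement, while tabular Q-learning replaces the known transition table with sampled transitions.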

Week 2, Jun 11: Monte Carlo Methods

Topics: Use Blackjack to implement first-visit or every-visit MC prediction

Sutton, Chapter 5.3: Monte Carlo Methods

CS294 Optimal control and planning -- Sergey Levine Video | Slides
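The first-visit MC prediction named in this week's topics can be sketched roughly as follows. This is only an illustration under stated assumptions: episodes are represented as lists of `(state, reward)` pairs, where the reward is the one received after leaving that state, and the hand-written episodes stand in for rollouts that would normally come from a Blackjack environment. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the earliest time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episodes: state "A" is visited twice in the second one,
# but only its first visit contributes to the estimate.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("A", 0.0), ("B", 3.0)],
]
values = first_visit_mc_prediction(episodes, gamma=0.5)
print(values)
```

An every-visit variant would drop the `first_visit` check and average the return from every occurrence of a state.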

Week 3, Jun 18: Imitation Learning with MuJoCo

CS294 Supervised learning and imitation -- Sergey Levine Video | Slides

CS294 Imitation Learning Project

Week 4, Jun 25: Policy Gradients

Topics: Temporal-Difference (TD) learning; use CartPole and Humanoid for policy gradients

Sutton Chapter 6: Temporal-Difference Learning

Deep RL Bootcamp Core Lecture 4a Policy Gradients and Actor Critic -- Pieter Abbeel Video | Slides

Deep RL Bootcamp Core Lecture 4b Pong from Pixels -- Andrej Karpathy Video | Slides

CS294 Policy gradients introduction -- Sergey Levine Video | Slides | Policy Gradients Project

Policy Gradient Algorithms -- Lilian Weng Blog

Sutton Chapter 13.5: Actor-Critic Methods

CS294 Actor-critic introduction -- Sergey Levine Video | Slides
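The temporal-difference idea from this week's Sutton reading can be illustrated with a single tabular TD(0) update. This is a minimal sketch, assuming a dictionary-backed value table and hypothetical state names; the step size and discount are arbitrary. The TD error it returns is the same quantity that one-step actor-critic methods commonly use as an advantage estimate.

```python
def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Move V[state] toward the one-step bootstrapped target; return the TD error."""
    bootstrap = 0.0 if done else gamma * V.get(next_state, 0.0)
    td_error = reward + bootstrap - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return td_error


# Hypothetical transitions on a two-state episode: s0 -> s1 -> terminal.
V = {}
e1 = td0_update(V, "s0", 1.0, "s1", False, alpha=0.5, gamma=0.9)
e2 = td0_update(V, "s1", 2.0, "end", True, alpha=0.5, gamma=0.9)
e3 = td0_update(V, "s0", 0.0, "s1", False, alpha=0.5, gamma=0.9)
print(V, e1, e2, e3)
```

Unlike the Monte Carlo estimate, the update happens after every step and does not wait for the episode to finish.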

Week 5, Jul 2: Deep Q Learning, DQN, Rainbow

Sutton Chapter 16.5: DQN

Deep RL Bootcamp Core Lecture 3 DQN + Variants -- Vlad Mnih Video | Slides

Deep RL Bootcamp Lab 3: Deep Q-Learning. You will implement the DQN algorithm and apply it to Atari games.

CS294 Neural networks review -- Joshua Achiam Video | Slides

CS294 Advanced Q-learning algorithms -- Sergey Levine Video | Slides | DQN Project
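One small piece of the DQN algorithm from this week, the regression target for a batch of transitions, can be sketched in plain Python. This is an illustration only: the function name and list-based inputs are hypothetical, and `next_q_values` is assumed to hold the per-action outputs of a target network evaluated on the next states. A real implementation would use a tensor library and sample the batch from a replay buffer.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a Q_target(s', a), with no bootstrap at terminal states."""
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Hypothetical batch of two transitions with two actions each.
targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
print(targets)
```

The online network is then trained to bring its Q-value for the action actually taken closer to these targets.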

Week 6, Jul 9: Model-based RL

Deep RL Bootcamp Core Lecture 9 Model-based RL -- Chelsea Finn Video | Slides

CS294 Learning dynamical systems from data -- Sergey Levine Video | Slides

CS294 Learning policies by imitating optimal controllers -- Sergey Levine Video | Slides

CS294 Advanced model learning and images -- Chelsea Finn Video | Slides

CS294 Connection between inference and control -- Sergey Levine Video | Slides

CS294 Model Based RL Project

Week 7, Jul 16: Advanced Policy Gradients

Topics: Natural Policy Gradients, PPO (use Roboschool instead of MuJoCo, which requires a license)

Deep RL Bootcamp Core Lecture 5 Natural Policy Gradients, TRPO, and PPO -- John Schulman Video | Slides

Deep RL Bootcamp Lab 4: Policy Optimization Algorithms. You will implement various policy optimization algorithms, including policy gradient, natural policy gradient, trust-region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). You will apply these algorithms to classic control tasks, Atari games, and roboschool locomotion environments.

CS294 Learning policies by imitating optimal controllers -- Sergey Levine Video | Slides
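The PPO topic for this week centers on a clipped surrogate objective, which can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the function name, the list-based inputs, and the example numbers are hypothetical. `ratios` are assumed to be the probability ratios between the new and old policies for the sampled actions, and `advantages` the corresponding advantage estimates.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Taking the minimum removes any incentive to push the ratio outside the clip range.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


# Hypothetical batch: one ratio above the clip range, one below it, one inside it.
objective = ppo_clip_objective(
    ratios=[1.5, 0.5, 1.0],
    advantages=[1.0, -1.0, 2.0],
    eps=0.2,
)
print(objective)
```

In training, this quantity is maximized with respect to the policy parameters, usually by minimizing its negative with a gradient-based optimizer.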

Week 8, Jul 23: Inverse RL

Topics: GAIL

CS294 Inverse reinforcement learning -- Sergey Levine Video | Slides

Algorithms for Inverse Reinforcement Learning PDF

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning PDF

Maximum Entropy Inverse Reinforcement Learning PDF

Maximum Entropy Deep Inverse Reinforcement Learning PDF

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization PDF

Generative Adversarial Imitation Learning PDF