Week 10 - OpenAI Finetuning ResNet50 for Art Composition Attributes

Submitted by hollygrimm on Sun, 08/12/2018 - 19:43

My Deep CNN for learning painting composition attributes is based on the paper Learning Photography Aesthetics with Deep CNNs by Malu et al. For photography, they train on the Aesthetics and Attributes Database (AADB), which has the following attributes: Balancing Element, Content, Color Harmony, Depth of Field, Light, Object Emphasis, Rule of Thirds, and Vivid Color.
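As a rough sketch of what fine-tuning reduces to, here is a toy version with the backbone frozen and only a multi-attribute regression head trained. Everything here is a stand-in: the random "features" play the role of frozen ResNet50 activations, and the 8 outputs stand in for the attribute scores; none of it is the actual Week 10 code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real pipeline: "features" plays the role of frozen
# ResNet50 activations, and we regress 8 attribute scores from them.
n_samples, feat_dim, n_attrs = 200, 32, 8
features = rng.normal(size=(n_samples, feat_dim))
true_head = rng.normal(size=(feat_dim, n_attrs))
scores = features @ true_head + 0.01 * rng.normal(size=(n_samples, n_attrs))

# Fine-tuning at its simplest: the backbone is frozen, so only the
# regression head W is updated, by gradient descent on the MSE.
W = np.zeros((feat_dim, n_attrs))
lr = 0.05
for _ in range(500):
    pred = features @ W
    grad = features.T @ (pred - scores) / n_samples  # dMSE/dW (up to a factor of 2)
    W -= lr * grad

mse = np.mean((features @ W - scores) ** 2)
```

In the real setup the head would sit on top of ResNet50's pooled features and some of the later convolutional layers could be unfrozen as well, but the training loop has the same shape.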

OpenAI Scholars Final Project Proposal and Dataset Attributes - Generative Art with Domain Knowledge

Submitted by hollygrimm on Mon, 08/06/2018 - 09:08


As a painter and software developer interested in Machine Learning, I have trained several models on my paintings, creating complex and interesting artistic outputs from CycleGAN and Mixture Density Networks (modeling distributions of data). For my final project, I would like to expand on my generative art projects by incorporating domain knowledge based on the rules of art evaluation.

Week 8: Generative Adversarial Imitation Learning (GAIL)

Submitted by hollygrimm on Mon, 07/30/2018 - 07:57
Imitation Learning, or learning from expert trajectories, can be implemented in two different ways:
  • Behavioral Cloning
    • a supervised learning problem that learns a policy mapping states to expert actions
    • requires a large number of expert trajectories
    • copies actions exactly, even if they are of no importance to the task
  • Inverse RL
    • learns the reward function from expert trajectories, then derives the optimal policy
    • expensive to run
    • indirectly learns optimal policy from the reward function
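The behavioral-cloning side of that comparison can be sketched in a few lines: with expert state/action pairs in hand, fitting the policy is just supervised regression. The linear "expert" and the data below are invented for illustration, not taken from the homework.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expert: a fixed linear controller, action = state @ K_expert.
state_dim, act_dim, n_demos = 4, 2, 500
K_expert = rng.normal(size=(state_dim, act_dim))
states = rng.normal(size=(n_demos, state_dim))
actions = states @ K_expert  # expert trajectories collapsed to (s, a) pairs

# Behavioral cloning: supervised fit of a policy mapping states to actions.
K_clone, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The clone copies the expert exactly on the demonstrated states...
max_err = np.max(np.abs(states @ K_clone - actions))
```

...which is exactly the weakness listed above: it copies the expert's actions faithfully, whether or not they matter to the task, and it needs many demonstrations to cover the state distribution.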

Week 7: Natural Policy Gradients, TRPO, A2C

Submitted by hollygrimm on Fri, 07/20/2018 - 16:28

Most of the recent successes in reinforcement learning come from applying more sophisticated optimization to policy gradients. This week I learned about advanced policy gradient techniques: Natural Policy Gradients, TRPO, and A2C.

I implemented Lab 4 provided by the Deep RL Bootcamp [1] [3]. My code is here [GitHub Source]
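A small sketch of one ingredient these methods share: discounted reward-to-go, and an A2C-style advantage computed against a state-value baseline. The rewards and baseline values below are made-up numbers, not output from the lab code.

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Reward-to-go: G_t = r_t + gamma * G_{t+1}, computed backwards."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)  # [2.71, 1.9, 1.0]

# A2C-style advantage: subtract a learned state-value baseline V(s_t);
# the baseline values here are placeholders for a trained critic.
values = np.array([2.5, 2.0, 0.5])
advantages = returns - values
```

Natural Policy Gradients and TRPO then go further by constraining how far each update may move the policy (a KL-divergence trust region) rather than taking a plain gradient step.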

Week 6: Model-based RL

Submitted by hollygrimm on Fri, 07/13/2018 - 14:06

Model Predictive Control and HalfCheetah

This week I learned about Model-based RL, where a model of the dynamics of the environment is used to make predictions. The previous algorithms I've studied have been model-free, directly optimizing a policy or value function. Model-based RL instead learns to predict how the environment evolves, and the resulting dynamics model is independent of the task you are trying to achieve. The dynamics model can be implemented using a Gaussian Process, a Neural Network, or other methods.
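To make the Model Predictive Control idea concrete, here is a toy random-shooting planner over a hand-written 1-D dynamics model. The dynamics, cost, and horizon are invented for illustration; nothing here comes from the HalfCheetah code, where the dynamics model would be a learned neural network rather than a known function.

```python
import numpy as np

def dynamics(x, a):
    """Toy known model standing in for a learned dynamics model: x' = x + a."""
    return x + a

def mpc_action(x0, goal, horizon=5, n_candidates=1000, seed=0):
    """Random shooting MPC: sample random action sequences, roll the model
    forward, and return the first action of the cheapest sequence."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon))
    costs = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        x = x0
        for a in seq:
            x = dynamics(x, a)
            costs[i] += abs(x - goal)  # distance-to-goal cost at each step
    return candidates[np.argmin(costs)][0]

a0 = mpc_action(x0=0.0, goal=5.0)  # the chosen action pushes toward the goal
```

In true MPC only this first action is executed; the environment then returns a new state and the whole optimization is re-run, which is what makes the controller robust to an imperfect model.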

Week 5: Deep Q Networks and Rainbow Algorithm

Submitted by hollygrimm on Mon, 07/09/2018 - 09:13

The first part of this week was spent working on homework 3 for CS294 "Using Q-Learning with convolutional neural networks" [4] for playing Atari games, also known as Deep Q Networks (DQN). (Source on GitHub)

Like last week, training was done on Atari Pong. With DQN I was able to improve on the +6 score I had reached with Policy Gradients, receiving a +20 reward after 5 million games:
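The heart of the DQN update can be sketched numerically: each sampled transition gets the Bellman target y = r + γ·max_a' Q_target(s', a'), with the bootstrap term dropped when the episode has ended. The rewards and target-network Q-values below are placeholder numbers, not values from the homework.

```python
import numpy as np

def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """Bellman targets for a batch: y = r + gamma * max_a' Q_target(s', a'),
    zeroing the bootstrap term on terminal transitions."""
    return rewards + gamma * np.max(next_q, axis=1) * (1.0 - dones)

rewards = np.array([1.0, 1.0])
next_q = np.array([[2.0, 1.5],   # max over actions is 2.0
                   [0.3, 0.7]])
dones = np.array([0.0, 1.0])     # second transition ends the episode
targets = dqn_targets(rewards, next_q, dones)  # [1 + 0.99 * 2.0, 1.0]
```

The convolutional Q-network is then regressed toward these targets, with the target network's weights updated only periodically to keep the regression targets stable.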

Week 4: Policy Gradients on Atari Pong and Mujoco

Submitted by hollygrimm on Sat, 06/30/2018 - 09:50

The first part of my week was spent working on the 2nd homework for CS294, Policy Gradients[1]. Source code: https://github.com/hollygrimm/cs294-homework/tree/master/hw2

The Policy Gradients algorithm optimizes the policy directly, parameterizing it with a Neural Network (NN), instead of deriving it from a value function or action-value function.

Policy Gradient training was done on OpenAI’s Gym Environments: CartPole-v0 and InvertedPendulum-v0.
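The estimator at the core of Policy Gradients can be sketched for a small linear-softmax policy: the update direction is ∇θ log π(a|s), scaled by the return that followed the action. The dimensions, sampled action, and return below are illustrative made-up values, not from the homework.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
state_dim, n_actions = 3, 4
theta = rng.normal(size=(state_dim, n_actions))  # policy parameters

s = rng.normal(size=state_dim)
probs = softmax(s @ theta)  # pi(. | s)
a = 1                       # an action, fixed here instead of sampled
G = 2.5                     # return observed after taking the action

# Score function for a linear-softmax policy:
# grad log pi(a|s) = outer(s, onehot(a) - probs)
onehot = np.eye(n_actions)[a]
grad_logpi = np.outer(s, onehot - probs)

# REINFORCE ascent step: theta += lr * G * grad log pi(a|s)
theta = theta + 0.01 * G * grad_logpi
```

Averaging this quantity over many sampled trajectories gives the gradient estimate that CartPole-v0 and InvertedPendulum-v0 are trained with, usually with the return replaced by reward-to-go minus a baseline to reduce variance.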

Reinforcement Learning - Imitation Learning and Mujoco

Submitted by hollygrimm on Fri, 06/22/2018 - 16:16

This week I worked on Homework 1: Imitation Learning from the Fall 2017 CS294 course at Berkeley. Professor Levine is an amazing lecturer and the information he covers in one lecture is quite dense.

Imitation Learning is a form of supervised machine learning applied to behavior. For this exercise, we were supplied with expert policies for six different OpenAI Gym Mujoco environments. Each environment has different observation and action spaces:
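Whatever the per-environment dimensions turn out to be, a cloned policy's shapes have to match them, which in Gym means reading them off `env.observation_space` and `env.action_space`. A tiny shape-matching sketch, with a hypothetical dimension table standing in for the real environments:

```python
import numpy as np

# Hypothetical (obs_dim, act_dim) table for illustration only; in practice
# these come from each env's observation_space / action_space shapes.
env_spaces = {
    "Hopper":      (11, 3),
    "HalfCheetah": (17, 6),
}

def make_linear_policy(env_name, rng):
    """Build a (randomly initialized) linear policy whose input and output
    dimensions match the named environment's spaces."""
    obs_dim, act_dim = env_spaces[env_name]
    W = rng.normal(size=(obs_dim, act_dim))
    return lambda obs: obs @ W

rng = np.random.default_rng(0)
policy = make_linear_policy("Hopper", rng)
action = policy(np.zeros(env_spaces["Hopper"][0]))  # action has shape (act_dim,)
```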