My Deep CNN for learning painting composition attributes is based on the paper Learning Photography Aesthetics with Deep CNNs by Malu et al. For photography, they train on the Aesthetics and Attributes Database (AADB), which includes the following attributes: Balancing Element, Content, Color Harmony, Depth of Field, Light, Object Emphasis, Rule of Thirds, and Vivid Color.
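To make the multi-attribute setup concrete, here is a minimal sketch of how attribute scores could be predicted from a shared CNN feature vector: one linear output per attribute, trained with a mean squared error averaged across attributes. The feature vector, weights, and function names are hypothetical stand-ins (a real model would compute the features with a CNN backbone), and this is not Malu et al.'s exact architecture.

```python
# Attribute names follow the AADB list above; the identifiers are hypothetical.
ATTRIBUTES = ["balancing_element", "content", "color_harmony", "depth_of_field",
              "light", "object_emphasis", "rule_of_thirds", "vivid_color"]

def predict_attributes(features, weights, biases):
    """One linear regression head per attribute on a shared feature vector."""
    return {name: sum(w * f for w, f in zip(weights[name], features)) + biases[name]
            for name in ATTRIBUTES}

def multi_attribute_mse(predictions, targets):
    """Mean squared error averaged over all attributes."""
    return sum((predictions[n] - targets[n]) ** 2 for n in ATTRIBUTES) / len(ATTRIBUTES)

# Hypothetical 2-D "CNN features" and head parameters, for illustration only.
features = [1.0, 0.5]
weights = {name: [0.2, 0.4] for name in ATTRIBUTES}
biases = {name: 0.0 for name in ATTRIBUTES}
targets = {name: 0.4 for name in ATTRIBUTES}
targets["light"] = 0.0  # one attribute disagrees with the prediction

preds = predict_attributes(features, weights, biases)
loss = multi_attribute_mse(preds, targets)
print(round(loss, 6))  # 0.02
```

Sharing one feature extractor across the attribute heads treats the attributes as related tasks; in a real model the heads would sit on top of a pretrained CNN and be trained jointly.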
As a painter and software developer interested in Machine Learning, I have trained several models on my paintings, producing complex and interesting artistic outputs with CycleGAN and Mixture Density Networks (which model probability distributions over data). For my final project, I would like to expand on these generative art projects by incorporating domain knowledge from the rules of art evaluation.
- Behavioral Cloning
  - a supervised learning problem that learns a policy mapping states to actions from expert state/action pairs
  - requires a large number of expert trajectories
  - copies expert actions exactly, even ones that are unimportant to the task
- Inverse RL
  - learns the reward function from expert trajectories, then derives the optimal policy from it
  - expensive to run
  - learns the optimal policy only indirectly, through the recovered reward function
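Behavioral cloning reduces to ordinary supervised regression on the expert's state/action pairs. A minimal sketch, assuming a hypothetical 1-D state, a continuous action, and a linear policy fit by closed-form least squares (a real setup would use a neural network and many trajectories):

```python
def fit_linear_policy(states, actions):
    """Behavioral cloning as least-squares regression of actions on states: a ~ w*s + b."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b

# Hypothetical expert demonstrations: the expert always plays a = 2*s + 1.
expert_states = [-2.0, -1.0, 0.0, 1.0, 2.0]
expert_actions = [2.0 * s + 1.0 for s in expert_states]

w, b = fit_linear_policy(expert_states, expert_actions)

def cloned_policy(state):
    # The learned policy imitates the expert, including any quirks in its actions.
    return w * state + b

print(cloned_policy(3.0))  # 7.0
```

Because the regression only sees states the expert visited, small errors can push the cloned policy into unfamiliar states, which is one reason it needs many expert trajectories.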
Many of the recent successes in reinforcement learning come from applying more sophisticated optimization to policy gradients. This week I learned about advanced policy gradient techniques, including Natural Policy Gradients, TRPO, and A2C.
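Natural Policy Gradients and TRPO replace the plain gradient step with one preconditioned by the Fisher information matrix F, scaled so that a quadratic approximation of the KL divergence between the old and new policy equals a chosen limit δ: step = sqrt(2δ / (gᵀF⁻¹g)) · F⁻¹g. The sketch below computes that step for a hypothetical 2-parameter policy with a hand-picked F and gradient g; real implementations approximate F⁻¹g with conjugate gradient and follow the step with a line search.

```python
import math

def solve_2x2(F, g):
    """Solve F x = g for a 2x2 matrix F (stands in for conjugate gradient)."""
    (a, b), (c, d) = F
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]

def natural_gradient_step(F, g, max_kl):
    """Step along F^-1 g, scaled so that 0.5 * step^T F step == max_kl."""
    x = solve_2x2(F, g)                  # natural gradient direction F^-1 g
    g_dot_x = g[0] * x[0] + g[1] * x[1]  # g^T F^-1 g
    beta = math.sqrt(2.0 * max_kl / g_dot_x)
    return [beta * xi for xi in x]

def quadratic_kl(F, step):
    """Second-order approximation of the KL divergence: 0.5 * step^T F step."""
    Fs = [F[0][0] * step[0] + F[0][1] * step[1],
          F[1][0] * step[0] + F[1][1] * step[1]]
    return 0.5 * (step[0] * Fs[0] + step[1] * Fs[1])

# Hypothetical Fisher matrix: parameter 0 is "sensitive", parameter 1 is not.
F = [[2.0, 0.0], [0.0, 0.5]]
g = [1.0, 1.0]
step = natural_gradient_step(F, g, max_kl=0.01)
print(step, quadratic_kl(F, step))
```

Even though the raw gradient treats both parameters equally, the natural step moves the low-curvature parameter four times further, because changing it alters the policy's action distribution less.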
I implemented Lab 4 from the Deep RL Bootcamp. My code is here: [GitHub Source]
Model Predictive Control and HalfCheetah
This week I learned about Model-based RL, where a model of the environment's dynamics is used to make predictions. The algorithms I studied previously were model-free: they optimize a policy or value function directly. Model-based RL instead learns to predict how the environment responds to actions (the next state, given the current state and action), and because this model describes the environment rather than the goal, it is independent of the task you are trying to achieve. The dynamics model can be implemented with a Gaussian Process, a Neural Network, or other methods.
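A common way to act with a learned dynamics model is random-shooting Model Predictive Control: sample many candidate action sequences, roll each one out through the model, execute only the first action of the cheapest sequence, then replan at the next step. The sketch below assumes a hypothetical 1-D system whose "learned" model is simply s' = s + a and a cost that penalizes distance from 0; for HalfCheetah, the model would be a neural network and the cost would typically reward forward progress.

```python
import random

def dynamics_model(state, action):
    # Stand-in for a learned model f(s, a) -> s'; hypothetical 1-D system.
    return state + action

def cost(state):
    # Hypothetical task: drive the state to 0.
    return state ** 2

def mpc_action(state, rng, horizon=5, num_sequences=200):
    """Random shooting: return the first action of the lowest-cost sampled sequence."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(num_sequences):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)  # imagined rollout through the model
            total += cost(s)
        if total < best_cost:
            best_cost, best_first = total, actions[0]
    return best_first

rng = random.Random(0)
state = 5.0
for _ in range(10):
    a = mpc_action(state, rng)
    # In practice this step would query the real environment, not the model.
    state = dynamics_model(state, a)
print(state)
```

Replanning at every step lets MPC correct for errors in the learned model, since only one action from each plan is ever executed.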
I spent the first part of this week on Homework 3 for CS294, "Using Q-Learning with convolutional neural networks" to play Atari games, an approach known as Deep Q-Networks (DQN). (Source on GitHub)
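At the core of DQN is a regression toward a bootstrapped target, y = r + γ · max over a' of Q_target(s', a'), with no bootstrap term when the episode has ended. A minimal sketch of the target and squared TD loss for a hypothetical batch of two transitions, where `next_q_values` stands in for the target network's outputs on the next observations:

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bootstrapped DQN targets: r + gamma * max_a' Q_target(s', a'), cut off at episode end."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

def td_loss(q_taken, targets):
    """Mean squared error between Q(s, a) for the actions taken and the fixed targets."""
    return sum((t - q) ** 2 for q, t in zip(q_taken, targets)) / len(targets)

# Hypothetical batch of two transitions.
rewards = [1.0, 0.0]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]  # target-network Q-values for each next state
dones = [False, True]
q_taken = [1.5, 0.5]                      # online-network Q(s, a) for the actions taken

targets = td_targets(rewards, next_q_values, dones, gamma=0.5)
print(targets)                    # [2.0, 0.0]
print(td_loss(q_taken, targets))  # 0.25
```

Keeping the targets computed from a separate, slowly updated network keeps the regression target from shifting with every gradient step.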
Like last week, training was done on Atari Pong. With DQN, I improved on the +6 score I had reached with Policy Gradients, receiving a +20 reward after 5 million games.
I spent the first part of my week on the second homework for CS294, Policy Gradients. Source code: https://github.com/hollygrimm/cs294-homework/tree/master/hw2
The Policy Gradients algorithm optimizes a parameterized policy, represented by a Neural Network (NN), directly by gradient ascent on the expected return, instead of deriving the policy from a value function or action-value function.
Policy Gradient training was done on the OpenAI Gym environments CartPole-v0 and InvertedPendulum-v0.
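The core policy gradient update moves the policy parameters along the gradient of log π(a), weighted by the return. A minimal REINFORCE sketch on a hypothetical two-armed bandit (one-step episodes, so the return is just the reward), using a softmax policy over two logits in place of a neural network:

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(rewards, steps=1000, lr=0.5, seed=0):
    """REINFORCE with a softmax policy; rewards[k] is the hypothetical payoff of arm k."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs)[0]  # sample an action
        r = rewards[a]
        # Gradient of log pi(a) w.r.t. theta_k is 1[k == a] - pi_k.
        for k in range(len(theta)):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)

probs = reinforce_bandit([0.0, 1.0])
print(probs)  # probability mass concentrates on arm 1
```

In environments like CartPole the same update is applied per time step with the discounted return in place of `r`, usually after subtracting a baseline to reduce variance.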
This week I worked on Homework 1: Imitation Learning from the Fall 2017 CS294 course at Berkeley. Professor Levine is an amazing lecturer, and the material he covers in a single lecture is quite dense.
Imitation Learning is a form of supervised machine learning for behavior. For this exercise, we were supplied with expert policies for six different OpenAI Gym MuJoCo environments. Each environment has different observation and action spaces.
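The supervised dataset for imitation learning comes from rolling out the expert policy and recording each observation together with the action the expert chose. The sketch below assumes a hypothetical toy environment with a 3-dimensional observation and a 2-dimensional action in place of a MuJoCo task; the environment functions and the linear expert are illustrative stand-ins, not the homework's API.

```python
import random

OBS_DIM, ACT_DIM = 3, 2  # hypothetical sizes; each MuJoCo task has its own

def reset(rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)]

def step(obs, action):
    # Hypothetical dynamics: the action nudges the first two observation components.
    next_obs = [obs[0] + 0.1 * action[0], obs[1] + 0.1 * action[1], obs[2]]
    done = abs(next_obs[0]) > 10.0
    return next_obs, done

def expert_policy(obs):
    # Hypothetical expert: push the first two components back toward zero.
    return [-obs[0], -obs[1]]

def collect_expert_data(num_rollouts, max_steps, seed=0):
    """Roll out the expert and record (observation, action) pairs for supervised learning."""
    rng = random.Random(seed)
    observations, actions = [], []
    for _ in range(num_rollouts):
        obs = reset(rng)
        for _ in range(max_steps):
            action = expert_policy(obs)
            observations.append(obs)
            actions.append(action)
            obs, done = step(obs, action)
            if done:
                break
    return observations, actions

observations, actions = collect_expert_data(num_rollouts=5, max_steps=20)
print(len(observations), len(observations[0]), len(actions[0]))  # 100 3 2
```

The resulting (observation, action) pairs are the training set for a behavioral cloning policy, whose input and output sizes must match each environment's spaces.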
This week I learned about the family of Reinforcement Learning algorithms called Monte Carlo (MC) methods. Most of what I learned came from Chapter 5 of Reinforcement Learning: An Introduction by Sutton and Barto.
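Monte Carlo prediction estimates a state's value by averaging the returns observed after visiting it in complete episodes, without a model of the environment. A minimal first-visit MC sketch, assuming each episode is a hypothetical list of (state, reward) pairs where the reward is received after leaving that state:

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) as the average return following the first visit to s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        G = 0.0
        first_visit_return = {}
        # Walk backward so G accumulates the discounted return from each step onward;
        # later overwrites leave the return from the earliest (first) visit.
        for state, reward in reversed(episode):
            G = gamma * G + reward
            first_visit_return[state] = G
        for state, g in first_visit_return.items():
            returns[state].append(g)
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}

episodes = [
    [("A", 0.0), ("B", 1.0)],  # A -> B -> terminal, reward 1 on the final transition
    [("A", 2.0)],              # A -> terminal, reward 2
]
V = first_visit_mc(episodes)
print(V["A"], V["B"])  # 1.5 1.0
```

Because values come only from complete episodes, MC methods must wait until an episode ends before updating, unlike the bootstrapped updates used by temporal-difference methods.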