OpenAI Scholars Final Project
Holly Grimm email@example.com
August 31, 2018
I’m training on eight art attributes. Six of them have numerical values between 1 and 10; the other two are categorical: primary color, with 13 classes, and color harmony, with 6 classes. Mean squared error is used as the loss for the numerical attributes, and categorical cross-entropy for the categorical ones.
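As a rough illustration of how the two loss types combine, here is a minimal NumPy sketch (not the code from the repository) of the total loss for one batch. It assumes every output has a loss weight of 1, which is how Keras sums the per-output losses of a multi-output model by default; the function names and example labels are hypothetical.

```python
import numpy as np

# Hypothetical layout matching the text: six numerical attributes (scores 1-10)
# and two categorical ones (primary color: 13 classes, color harmony: 6 classes).


def mse(y_true, y_pred):
    """Mean squared error over the batch for one numerical attribute."""
    return float(np.mean((y_true - y_pred) ** 2))


def categorical_crossentropy(y_true_onehot, y_prob, eps=1e-7):
    """Mean categorical cross-entropy over the batch for one categorical attribute."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_prob), axis=1)))


def total_loss(numeric_true, numeric_pred, cat_true, cat_pred):
    """Unweighted sum of the per-output losses (assumes every loss weight is 1)."""
    loss = sum(mse(t, p) for t, p in zip(numeric_true, numeric_pred))
    loss += sum(categorical_crossentropy(t, p) for t, p in zip(cat_true, cat_pred))
    return loss


# Example batch of 4 paintings with random placeholder labels.
rng = np.random.default_rng(0)
numeric_true = [rng.uniform(1, 10, size=4) for _ in range(6)]
numeric_pred = [t + 0.5 for t in numeric_true]  # every score off by 0.5
cat_true = [np.eye(13)[[0, 3, 3, 12]], np.eye(6)[[1, 1, 5, 0]]]
cat_pred = [t.copy() for t in cat_true]  # fully confident, correct classes
print(total_loss(numeric_true, numeric_pred, cat_true, cat_pred))  # about 1.5
```

Because the per-output losses are simply added, a regression head with large errors on the 1 to 10 scale can dominate the cross-entropy terms; Keras's loss_weights argument to compile() is the usual knob for rebalancing them.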
As detailed last week, I’m fine-tuning ResNet50 for art attributes. Here is the Keras code: https://github.com/hollygrimm/art-composition-cnn.
The most difficult part was creating a Keras data generator that could supply target (label) data for multiple outputs to the model.fit_generator() call. The Keras documentation describes the targets for such a model as a “list of Numpy arrays (if the model has multiple outputs)”, so each batch the generator yields must pair the images with one label array per output.
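A minimal sketch of such a generator, assuming the labels are already loaded into NumPy arrays. The function name, argument names, and output order are hypothetical; the only real requirement is that the order of the target list matches the order of the model's outputs. In Keras 2, a generator like this can be passed as model.fit_generator(gen, steps_per_epoch=...).

```python
import numpy as np


def multi_output_batches(images, numeric_labels, primary_color, harmony,
                         batch_size=32, num_colors=13, num_harmonies=6):
    """Hypothetical generator yielding (x, y) where y has one array per model output.

    images:         float array, shape (N, H, W, 3)
    numeric_labels: float array, shape (N, 6), scores between 1 and 10
    primary_color:  int array, shape (N,), class ids in [0, num_colors)
    harmony:        int array, shape (N,), class ids in [0, num_harmonies)
    """
    n = len(images)
    while True:  # Keras expects generators to loop indefinitely
        order = np.random.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x = images[idx]
            # One (batch, 1) array for each numerical output head...
            y = [numeric_labels[idx, i:i + 1] for i in range(numeric_labels.shape[1])]
            # ...followed by one-hot arrays for the two categorical heads.
            y.append(np.eye(num_colors)[primary_color[idx]])
            y.append(np.eye(num_harmonies)[harmony[idx]])
            yield x, y
```

Subclassing keras.utils.Sequence is an alternative that makes multiprocessing safer, but the shape of each (x, y) batch would be the same.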
My deep CNN for learning painting composition attributes is based on the paper Learning Photography Aesthetics with Deep CNNs by Malu et al. For photography, they train on the Aesthetics and Attributes Database (AADB), which includes the following attributes: Balancing Element, Content, Color Harmony, Depth of Field, Light, Object Emphasis, Rule of Thirds, and Vivid Color.
As a painter and software developer interested in machine learning, I have trained several models on my paintings, creating complex and interesting artistic outputs with CycleGAN and Mixture Density Networks (which model probability distributions over data). For my final project, I would like to expand on my generative art projects by incorporating domain knowledge based on the rules of art evaluation.
Most of the recent successes in reinforcement learning come from treating the policy gradient update as a more sophisticated optimization problem. This week I learned about advanced policy gradient algorithms such as Natural Policy Gradients, TRPO, and A2C.
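To make the natural gradient idea concrete, here is a small NumPy sketch of the step shared by Natural Policy Gradients and TRPO (before TRPO's line search): solve F x = g with conjugate gradient, where F is the Fisher information matrix and g is the policy gradient, then scale x so that the quadratic KL estimate equals a chosen limit. This is not my Lab 4 code; the Fisher-vector-product callable fvp and the max_kl value are assumptions for illustration.

```python
import numpy as np


def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Approximately solve F x = g using only Fisher-vector products fvp(v) = F v."""
    x = np.zeros_like(g)
    r = g.copy()  # residual g - F x (x starts at 0)
    p = r.copy()
    rs_old = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def natural_gradient_step(fvp, g, max_kl=0.01):
    """Step beta * x with x = F^{-1} g and beta = sqrt(2 * max_kl / (x^T F x)).

    This is the largest step along the natural gradient whose quadratic KL
    estimate 0.5 * step^T F step equals max_kl.
    """
    x = conjugate_gradient(fvp, g)
    beta = np.sqrt(2.0 * max_kl / (x @ fvp(x)))
    return beta * x


# Toy check with an explicit 2x2 Fisher matrix (a real implementation computes
# F v from the KL divergence between old and new policies without forming F).
F = np.array([[2.0, 0.0], [0.0, 4.0]])
g = np.array([2.0, 4.0])
step = natural_gradient_step(lambda v: F @ v, g, max_kl=0.01)
print(step, 0.5 * step @ F @ step)  # the KL estimate matches max_kl
```

Only Fisher-vector products are needed, which is why conjugate gradient is used: for a neural network policy, F has one row per parameter and is never built explicitly.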
I implemented Lab 4 from the Deep RL Bootcamp. My code is available on GitHub.
This week I learned about model-based RL, in which a model of the environment's dynamics is used to make predictions. The algorithms I had studied previously were model-free: they optimize a policy or value function directly. Model-based RL instead learns to predict how the environment will respond to actions, and the resulting model is independent of the task you are trying to achieve. The dynamics model can be implemented using a Gaussian Process, a neural network, or other methods.
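As a simple stand-in for a Gaussian Process or neural network, the sketch below fits a linear dynamics model s' ≈ A s + B a + c by least squares. Linear regression is a swapped-in technique chosen for brevity, and the function names are hypothetical. The fit uses only observed transitions, never rewards, which is what makes the learned model independent of the task.

```python
import numpy as np


def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s' ~ A s + B a + c from observed transitions.

    states: (N, ds), actions: (N, da), next_states: (N, ds). No rewards needed.
    """
    n = len(states)
    X = np.hstack([states, actions, np.ones((n, 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # W: (ds + da + 1, ds)
    ds, da = states.shape[1], actions.shape[1]
    A = W[:ds].T
    B = W[ds:ds + da].T
    c = W[-1]
    return A, B, c


def predict(model, state, action):
    """One-step prediction of the next state with the learned model."""
    A, B, c = model
    return A @ state + B @ action + c


# Hypothetical noiseless system to check that the fit recovers it.
rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
c_true = np.array([0.0, -0.05])
S = rng.normal(size=(50, 2))
U = rng.normal(size=(50, 1))
S_next = S @ A_true.T + U @ B_true.T + c_true
model = fit_linear_dynamics(S, U, S_next)
```

Once fitted, the same model can be used to plan for any reward function, for example by simulating candidate action sequences with predict() and picking the best one.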
The first part of this week was spent on homework 3 for CS294, "Using Q-Learning with convolutional neural networks" to play Atari games, an approach known as Deep Q-Networks (DQN). The source is on GitHub.
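The core of DQN is regressing Q(s, a) onto a Bellman target computed with a separate target network. Below is a minimal NumPy sketch of that target and of a Huber TD loss; the Huber loss is a common choice for DQN, and the function names and gamma value are assumptions rather than the homework code.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    rewards:       (batch,) rewards r
    next_q_target: (batch, num_actions) target-network Q-values for next states
    dones:         (batch,) 1.0 where the episode ended, else 0.0
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def huber_td_loss(q_values, actions, targets, delta=1.0):
    """Mean Huber loss between Q(s, a) for the taken actions and fixed targets."""
    q_taken = q_values[np.arange(len(actions)), actions]
    err = targets - q_taken
    quadratic = np.minimum(np.abs(err), delta)  # part of |err| inside delta
    linear = np.abs(err) - quadratic            # part of |err| beyond delta
    return float(np.mean(0.5 * quadratic ** 2 + delta * linear))
```

Terminal transitions get no bootstrapped term, and the Huber loss grows only linearly for large TD errors, which keeps single bad targets from producing huge gradients.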
As last week, training was done on Atari Pong. Switching from Policy Gradients to DQN improved my score from +6 to a +20 reward after 5 million games.
The first part of my week was spent working on the 2nd homework for CS294, Policy Gradients. Source code: https://github.com/hollygrimm/cs294-homework/tree/master/hw2
The Policy Gradients algorithm searches for the optimal policy directly, by optimizing a policy parameterized by a neural network (NN), instead of deriving it from a value function or action-value function.
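To show what optimizing a parameterized policy directly looks like, here is a NumPy sketch of the score-function (REINFORCE) gradient estimate. For brevity it swaps the neural network for a linear softmax policy pi(a|s) = softmax(s @ theta), whose log-probability gradient has a closed form, and uses discounted reward-to-go as the advantage; all names and data are hypothetical.

```python
import numpy as np


def reward_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go: out[t] = sum over t' >= t of gamma**(t' - t) * rewards[t']."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def policy_gradient(theta, states, actions, advantages):
    """Score-function estimate of grad J for pi(a|s) = softmax(states @ theta).

    theta: (state_dim, num_actions); states: (T, state_dim);
    actions: (T,) ints; advantages: (T,).
    Uses grad_theta log pi(a|s) = outer(s, onehot(a) - pi(.|s)), averaged over t.
    """
    probs = softmax(states @ theta)
    onehot = np.eye(theta.shape[1])[actions]
    return states.T @ ((onehot - probs) * advantages[:, None]) / len(actions)


# One gradient-ascent step on a toy trajectory (hypothetical data).
theta = np.zeros((2, 2))
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
actions = np.array([0, 1, 0])
advantages = reward_to_go(np.array([1.0, 0.0, 1.0]), gamma=0.9)
theta += 0.1 * policy_gradient(theta, states, actions, advantages)
```

With positive advantages, the ascent step raises the probability of each action that was taken; with a neural network policy, an autodiff library computes the same log-probability gradient instead of the closed form used here.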
Policy Gradient training was done on two OpenAI Gym environments: CartPole-v0 and InvertedPendulum-v0.