Where I clarify my own thinking about GPT-3 outputs and why I think it’s revolutionary.
I’m training on eight art attributes. Six of them have numerical values between 1 and 10; the other two are categorical: primary color, with 13 classes, and color harmony, with 6 classes. The loss is mean squared error on the numerical attributes and categorical cross-entropy on the categorical ones.
As detailed last week, I’m fine-tuning ResNet50 for art attributes. Here is the Keras code: https://github.com/hollygrimm/art-composition-cnn.
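To make the mixed loss concrete, here is a minimal pure-Python sketch for a single sample. The function names are hypothetical, and summing the per-attribute terms with equal weight is an assumption; the actual model may weight the outputs differently.

```python
import math

def categorical_cross_entropy(true_class, probs, eps=1e-12):
    """Cross-entropy for one sample: -log of the probability given to the true class."""
    return -math.log(max(probs[true_class], eps))

def combined_loss(numeric_true, numeric_pred, class_true, class_probs):
    """Sum one squared-error term per numeric attribute and one
    cross-entropy term per categorical attribute (assumed equal weights)."""
    numeric_terms = [(t - p) ** 2 for t, p in zip(numeric_true, numeric_pred)]
    categorical_terms = [
        categorical_cross_entropy(c, p) for c, p in zip(class_true, class_probs)
    ]
    return sum(numeric_terms) + sum(categorical_terms)

# Six numeric attributes (1 to 10) plus two categorical ones (13 and 6 classes).
numeric_true = [5, 5, 5, 5, 5, 5]
numeric_pred = [5, 5, 5, 5, 5, 7]          # one attribute is off by 2
class_true = [3, 1]                         # true class indices
class_probs = [[1 / 13] * 13, [1 / 6] * 6]  # uniform (uninformed) predictions
loss = combined_loss(numeric_true, numeric_pred, class_true, class_probs)
```

With uniform class predictions, the categorical terms equal log(13) and log(6), which is a handy baseline when checking whether the classification heads are learning anything.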
Data Generator with Multiple Outputs
The most difficult part was creating a Keras data generator that could handle multiple outputs of target (label) data for the model.fit_generator() call. The Keras documentation describes the target data y as a “list of Numpy arrays (if the model has multiple outputs)”.
My deep CNN for learning painting composition attributes is based on the paper Learning Photography Aesthetics with Deep CNNs by Malu et al. For photography, they train on the aesthetics and attribute database (AADB), which has the following attributes: Balancing Element, Content, Color Harmony, Depth of Field, Light, Object Emphasis, Rule of Thirds, and Vivid Color.
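The shape of such a generator can be sketched in plain Python. This is not the code from the repository: the function name and arguments are hypothetical, and plain lists stand in for the NumPy arrays that Keras would expect. The point is only that each batch is a tuple of inputs and a list with one targets entry per model output.

```python
def multi_output_batches(images, targets_by_output, batch_size):
    """Yield (inputs, [targets_for_output_1, ..., targets_for_output_k]).

    `targets_by_output` holds one sequence of labels per model output,
    each aligned with `images`. The generator loops forever, which is
    what a Keras-style training loop expects from a generator.
    """
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            end = start + batch_size
            x_batch = images[start:end]
            # One slice per output, in the same order as the model's outputs.
            y_batch = [targets[start:end] for targets in targets_by_output]
            yield x_batch, y_batch

# Toy data: 5 "images", one numeric output and one categorical output.
gen = multi_output_batches(
    images=[0, 1, 2, 3, 4],
    targets_by_output=[[7, 3, 9, 1, 5], ["red", "blue", "red", "green", "blue"]],
    batch_size=2,
)
first_x, first_y = next(gen)
```

In a real pipeline each slice would be converted to a NumPy array before being yielded, and the data would usually be shuffled between epochs.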
As a painter and software developer interested in Machine Learning, I have trained several models on my paintings, creating complex and interesting artistic outputs from CycleGAN and Mixture Density Networks (modeling distributions of data). For my final project, I would like to expand on my generative art projects by incorporating domain knowledge based on the rules of art evaluation.
- Behavioral Cloning
  - a supervised learning problem that learns a policy from expert state/action pairs
  - requires a large number of expert trajectories
  - copies actions exactly, even if they are of no importance to the task
- Inverse RL
  - learns the reward function from expert trajectories, then derives the optimal policy
  - expensive to run
  - indirectly learns optimal policy from the reward function
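The behavioral cloning item above can be illustrated with the simplest possible supervised learner: a lookup table that copies the expert's most frequent action in each state. This is only a toy stand-in (real behavioral cloning typically trains a neural network classifier or regressor), and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def clone_policy(expert_pairs):
    """Fit a tabular policy from (state, action) pairs by majority vote."""
    counts = defaultdict(Counter)
    for state, action in expert_pairs:
        counts[state][action] += 1
    # For each seen state, copy the action the expert chose most often.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}

expert_pairs = [
    ("s0", "left"), ("s0", "left"), ("s0", "right"),
    ("s1", "right"),
]
policy = clone_policy(expert_pairs)
```

Even this toy shows the weaknesses listed above: the policy has no answer for a state the expert never visited, and it copies whatever the expert did whether or not the action mattered.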
Most of the recent successes in reinforcement learning come from applying more sophisticated optimization to policy gradients. This week I learned about advanced policy gradient techniques, including Natural Policy Gradients, TRPO, and A2C.
I implemented Lab 4 provided by the Deep RL Bootcamp. My code is here: [GitHub Source]
Model Predictive Control and HalfCheetah
This week I learned about model-based RL, where a model of the environment’s dynamics is used to make predictions. The algorithms I studied previously were model-free: they optimize a policy or value function directly. Model-based RL instead predicts how the environment behaves, and the resulting model can be independent of the task you are trying to achieve. The dynamics model can be implemented using a Gaussian process, a neural network, or other methods.
The first part of this week was spent working on homework 3 for CS294, "Using Q-Learning with convolutional neural networks", for playing Atari games, an approach also known as Deep Q Networks (DQN). (Source on GitHub)
Like last week, training was done on Atari Pong. I improved on the +6 score I had reached with Policy Gradients, receiving a +20 reward after 5 million games with DQN:
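One common way to use a dynamics model for control is random-shooting model predictive control: sample many candidate action sequences, roll each one out through the model, and execute only the first action of the cheapest plan. The sketch below assumes a trivial hand-written 1-D dynamics function in place of a learned model; the function names, the cost, and the parameter values are all illustrative.

```python
import random

def dynamics(state, action):
    """Stand-in dynamics model: a 1-D point moved by `action`.
    A learned model (neural network, Gaussian process) would go here."""
    return state + action

def cost(state, goal):
    """Squared distance to the goal."""
    return (state - goal) ** 2

def mpc_action(state, goal, horizon=5, num_candidates=200, rng=random):
    """Random-shooting MPC: return the first action of the cheapest sampled plan."""
    best_cost, best_first = float("inf"), 0.0
    for _ in range(num_candidates):
        plan = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in plan:
            s = dynamics(s, a)       # predict the next state with the model
            total += cost(s, goal)   # accumulate predicted cost along the plan
        if total < best_cost:
            best_cost, best_first = total, plan[0]
    return best_first

# Replan at every step, as MPC does, and apply only the first action.
rng = random.Random(0)
state, goal = 0.0, 3.0
for _ in range(30):
    state = dynamics(state, mpc_action(state, goal, rng=rng))
```

Because the dynamics model knows nothing about the goal, the same model can be reused with a different cost function, which is the task independence mentioned above.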
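At the core of DQN is the Q-learning regression target, y = r + γ · max over a' of Q(s', a'), with the bootstrap term dropped when the episode ends. Here is a minimal pure-Python sketch of that computation. The function name is hypothetical, the next-state Q-values are assumed to come from a separate target network, and the discount value is illustrative rather than taken from the homework.

```python
def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute y = r + gamma * max_a' Q(s', a') for each transition.

    `next_q_values[i]` lists the Q-value of every action in the next state
    of transition i. Terminal transitions (done=True) get no bootstrap term.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

# Two transitions: one mid-episode, one that ends the episode.
targets = q_learning_targets(
    rewards=[1.0, -1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
```

The network is then trained to move its predicted Q-value for the action actually taken toward these targets.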