Week 12 - Training Art Composition Attributes

Submitted by hollygrimm on Fri, 08/24/2018 - 15:03

I’m training on eight art attributes. Six of the attributes have numerical values between 1 and 10; the other two are categorical: primary color, with 13 classes, and color harmony, with 6 classes. Mean squared error is used to calculate loss on the numerical values, and categorical cross-entropy on the categorical attributes.
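Here is a minimal sketch of how the two loss types can be combined in a single Keras multi-output model; the attribute names and the toy feature extractor below are placeholders, not the actual ones in my code:

```python
from keras.layers import Input, Dense, GlobalAveragePooling2D
from keras.models import Model

# Toy stand-in for the feature extractor; the real model uses ResNet50 features.
inp = Input(shape=(224, 224, 3))
features = GlobalAveragePooling2D()(inp)

# Placeholder names: six numerical attributes scored 1-10, plus the two
# categorical heads (13-way primary color, 6-way color harmony).
numerical_names = ['attr%d' % i for i in range(1, 7)]
numerical_heads = [Dense(1, name=n)(features) for n in numerical_names]
color_head = Dense(13, activation='softmax', name='primary_color')(features)
harmony_head = Dense(6, activation='softmax', name='color_harmony')(features)

model = Model(inputs=inp, outputs=numerical_heads + [color_head, harmony_head])

# Per-output losses: MSE for the numerical scores, categorical cross-entropy
# for the two categorical attributes.
losses = {n: 'mean_squared_error' for n in numerical_names}
losses['primary_color'] = 'categorical_crossentropy'
losses['color_harmony'] = 'categorical_crossentropy'
model.compile(optimizer='adam', loss=losses)
```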

Week 11 - Tuning ResNet50 Code and Multilabel Attributes

Submitted by hollygrimm on Fri, 08/17/2018 - 14:43

As detailed last week, I’m fine-tuning ResNet50 for art attributes. Here is the Keras code: https://github.com/hollygrimm/art-composition-cnn.

Data Generator with Multiple Outputs

The most difficult part was creating a Keras data generator that could handle multiple outputs of target (label) data for the model.fit_generator() call. The Keras documentation describes the y argument to fit_generator as a “list of Numpy arrays (if the model has multiple outputs)”.
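As a sketch of one way to structure this (the array names and shapes here are assumptions, not my actual data pipeline), a keras.utils.Sequence can return a dict keyed by the model’s output-layer names, which fit_generator also accepts:

```python
import numpy as np
from keras.utils import Sequence

class MultiOutputGenerator(Sequence):
    """Yields (images, {output_name: labels}) batches for a multi-output model."""

    def __init__(self, images, labels_by_output, batch_size=32):
        self.images = images                # (N, H, W, 3) array
        self.labels = labels_by_output      # e.g. {'primary_color': (N, 13) array, ...}
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.images) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        x = self.images[batch]
        # A list of arrays in output order also works, per the Keras docs.
        y = {name: arr[batch] for name, arr in self.labels.items()}
        return x, y

# model.fit_generator(MultiOutputGenerator(train_images, train_labels), epochs=10)
```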

Week 10 - OpenAI Finetuning ResNet50 for Art Composition Attributes

Submitted by hollygrimm on Sun, 08/12/2018 - 19:43

My Deep CNN for learning painting composition attributes is based on the paper, Learning Photography Aesthetics with Deep CNNs by Malu et al. For photography, they train on the Aesthetics and Attributes Database (AADB), which has the following attributes: Balancing Element, Content, Color Harmony, Depth of Field, Light, Object Emphasis, Rule of Thirds, and Vivid Color.
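A rough sketch of the fine-tuning setup, assuming ImageNet weights and a single added attribute head (the head name here is only an example):

```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load ResNet50 pretrained on ImageNet, without its 1000-class top layer.
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pretrained convolutional layers so only the new head trains at first.
for layer in base.layers:
    layer.trainable = False

features = GlobalAveragePooling2D()(base.output)
# Example attribute head; in practice there is one head per attribute.
color_harmony = Dense(1, name='color_harmony_score')(features)

model = Model(inputs=base.input, outputs=color_harmony)
model.compile(optimizer='adam', loss='mean_squared_error')
```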

Week 9 - OpenAI Scholars Final Project Proposal and Dataset Attributes - Generative Art with Domain Knowledge

Submitted by hollygrimm on Mon, 08/06/2018 - 09:08

Introduction

As a painter and software developer interested in Machine Learning, I have trained several models on my paintings, creating complex and interesting artistic outputs from CycleGAN and Mixture Density Networks (modeling distributions of data). For my final project, I would like to expand on my generative art projects by incorporating domain knowledge based on the rules of art evaluation.

Week 8 - Generative Adversarial Imitation Learning (GAIL)

Submitted by hollygrimm on Mon, 07/30/2018 - 07:57

Imitation Learning, or learning from expert trajectories, can be implemented in two different ways:
  • Behavioral Cloning
    • a supervised learning problem that fits a policy directly to expert state/action pairs (see the sketch after this list)
    • requires a large number of expert trajectories
    • copies actions exactly, even if they are of no importance to the task
  • Inverse RL
    • learns the reward function from expert trajectories, then derives the optimal policy
    • expensive to run
    • indirectly learns optimal policy from the reward function
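
As a concrete example of the first approach, here is a minimal behavioral-cloning sketch; the expert data, network sizes, and dimensions are assumptions for illustration:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Placeholder expert data: in practice these come from recorded expert trajectories.
state_dim, action_dim, n_samples = 11, 3, 10000
expert_states = np.random.randn(n_samples, state_dim)
expert_actions = np.random.randn(n_samples, action_dim)

# Behavioral cloning: fit a policy network to expert (state, action) pairs
# with ordinary supervised learning.
policy = Sequential([
    Dense(64, activation='tanh', input_shape=(state_dim,)),
    Dense(64, activation='tanh'),
    Dense(action_dim),  # continuous actions, treated as a regression target
])
policy.compile(optimizer='adam', loss='mse')
policy.fit(expert_states, expert_actions, epochs=10, batch_size=64)
```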

Week 7 - Natural Policy Gradients, TRPO, A2C

Submitted by hollygrimm on Fri, 07/20/2018 - 16:28

Most of the recent successes in reinforcement learning come from applying more sophisticated optimization to policy gradients. This week I learned about advanced policy gradient methods such as Natural Policy Gradients, TRPO, and A2C.

I implemented Lab 4 provided by the Deep RL Bootcamp [1] [3]. My code is here [GitHub Source]
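
For reference, here is a small sketch of how the A2C loss terms fit together (written with NumPy just to show the arithmetic; in a real implementation these would be graph ops so gradients flow through the policy and value networks):

```python
import numpy as np

def a2c_loss(returns, values, logprobs, entropies, vf_coef=0.5, ent_coef=0.01):
    """Combine the three A2C loss terms for one batch of transitions.

    returns:   discounted returns R_t
    values:    critic estimates V(s_t)
    logprobs:  log pi(a_t | s_t) for the actions actually taken
    entropies: policy entropy at each state
    """
    advantages = returns - values                   # A_t = R_t - V(s_t)
    policy_loss = -np.mean(logprobs * advantages)   # push up log-prob of good actions
    value_loss = np.mean((returns - values) ** 2)   # regress critic toward returns
    entropy_bonus = np.mean(entropies)              # encourages exploration
    return policy_loss + vf_coef * value_loss - ent_coef * entropy_bonus
```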

Week 6 - Model-based RL

Submitted by hollygrimm on Fri, 07/13/2018 - 14:06

Model Predictive Control and HalfCheetah

This week I learned about model-based RL, where a model of the environment’s dynamics is used to make predictions. The algorithms I’ve studied previously have been model-free, where a policy or value function is optimized directly. Model-based RL instead predicts how the environment will respond to actions, so the learned model is independent of the task you are trying to achieve. The dynamics model can be implemented using a Gaussian Process, a Neural Network, or other methods.
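Below is a small sketch of model predictive control with random shooting, roughly the kind of controller used with a learned dynamics model on HalfCheetah; the dynamics model and cost function are placeholder interfaces:

```python
import numpy as np

def mpc_action(state, dynamics_fn, cost_fn, horizon=15, n_candidates=1000, action_dim=6):
    """Pick an action by random shooting through a learned dynamics model.

    dynamics_fn(states, actions) -> next_states and cost_fn(states, actions) -> costs
    are placeholders for the fitted dynamics model and the task cost.
    """
    # Sample candidate action sequences uniformly within the action bounds.
    actions = np.random.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    total_cost = np.zeros(n_candidates)
    for t in range(horizon):
        total_cost += cost_fn(states, actions[:, t])
        states = dynamics_fn(states, actions[:, t])  # predict next states
    # Execute only the first action of the cheapest sequence, then replan.
    return actions[np.argmin(total_cost), 0]
```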

Week 5 - Deep Q Networks and Rainbow Algorithm

Submitted by hollygrimm on Mon, 07/09/2018 - 09:13

The first part of this week was spent working on homework 3 for CS294, "Using Q-Learning with convolutional neural networks" [4], to play Atari games, an approach also known as Deep Q-Networks (DQN). (Source on GitHub)
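The heart of DQN is the Q-learning target computed with a separate target network; here is a small sketch of that step (the network objects are placeholders):

```python
import numpy as np

def dqn_targets(rewards, next_obs, dones, target_net, gamma=0.99):
    """Bellman targets for a batch of replayed transitions.

    target_net is a placeholder for the (periodically copied) target Q-network;
    its predict() is assumed to return Q-values for every action.
    """
    # Bootstrap from the best target-network Q-value at the next state,
    # zeroing out the bootstrap term at terminal states.
    next_q = target_net.predict(next_obs).max(axis=1)
    return rewards + gamma * (1.0 - dones) * next_q

# The online network is then regressed toward these targets (Huber loss) on
# Q(s, a) for the actions actually taken, sampled from the replay buffer.
```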

Like last week, training was done on Atari Pong. With DQN, I was able to improve on the +6 score I achieved with Policy Gradients, reaching a reward of +20 after 5 million games.

Week 4 - Policy Gradients on Atari Pong and Mujoco

Submitted by hollygrimm on Sat, 06/30/2018 - 09:50

The first part of my week was spent working on the 2nd homework for CS294, Policy Gradients [1]. Source code: https://github.com/hollygrimm/cs294-homework/tree/master/hw2

The Policy Gradients algorithm optimizes the policy directly, representing it with a parameterized neural network (NN) rather than deriving it from a value function or action-value function.
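A compact sketch of the vanilla policy gradient estimator (in NumPy form just to show the arithmetic; the real version is a computation-graph loss so gradients reach the policy network):

```python
import numpy as np

def policy_gradient_loss(logprobs, returns, baseline=None):
    """Surrogate loss whose gradient is the policy gradient estimate.

    logprobs: log pi(a_t | s_t) for the sampled actions
    returns:  reward-to-go from each timestep
    baseline: optional value-function baseline to reduce variance
    """
    advantages = returns - (baseline if baseline is not None else returns.mean())
    # Minimizing this surrogate is an ascent step on expected return.
    return -np.mean(logprobs * advantages)
```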

Policy Gradient training was done on OpenAI’s Gym Environments: CartPole-v0 and InvertedPendulum-v0.