The first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning was presented by Mnih et al.
The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.
Classic DQN (Deep Q-Network) is a fairly simple approach, but this recipe is an excellent starting point for diving into deep reinforcement learning.
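To make the "value function estimating future rewards" concrete, here is a minimal sketch of the standard Q-learning target that such a network is trained to match. It is not taken from the recipe's code: the function names, the discount factor `gamma=0.99`, and the sample Q-values are illustrative assumptions, and the transition only mimics a MountainCar-style setup (reward of -1 per step, three discrete actions).

```python
# Sketch of the standard Q-learning target; not the recipe's actual implementation.
# All names and numbers below are illustrative assumptions.

def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """Return r if the episode ended, otherwise r + gamma * max_a' Q(s', a')."""
    if done:
        # No future reward can be collected after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value, target):
    """Squared difference between the predicted Q-value and its target."""
    return (q_value - target) ** 2


if __name__ == "__main__":
    # Hypothetical network outputs for the next state, one value per action.
    next_q = [-12.0, -10.0, -11.5]
    target = q_learning_target(reward=-1.0, next_q_values=next_q, done=False)
    # Loss for a hypothetical current prediction of -11.0 for the chosen action.
    loss = squared_td_error(q_value=-11.0, target=target)
    print(f"target={target:.2f} loss={loss:.4f}")
```

In a DQN, the values in `next_q` would come from the convolutional network evaluated on the next observation, and the squared error would be minimized by gradient descent over batches of transitions.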
pip install -U neuromation
neuro login
git clone email@example.com:neuromation/ml-recipe-mountain-car.git
cd ml-recipe-mountain-car
make setup
make jupyter