Deep Reinforcement Learning with PyChrono and TensorFlow



Deep Reinforcement Learning (DRL) consists of using Reinforcement Learning to train Deep Neural Networks. In the last few years it has been applied successfully to various robotic control tasks. Its main advantage is the ability to deal with unstructured and changing environments, where classical robotic control often fails. Training a NN with DRL requires many interactions with the environment. For this reason physics engines are a valuable help: they allow the agent to be trained in a virtual environment instead of directly in the real world, reducing the time and risks of the operation.


We will train a Neural Network to solve robotic tasks using virtual training environments created with PyChrono. The demo contains the virtual environments and a learning model created with TensorFlow.


We provide two sample environments for robotic control.


Inverted pendulum: the goal is to balance a pole on a cart. 1 action (force along the z axis) and 4 observations (position and speed of cart and pole).


A 4-legged walker: the goal is to learn to walk straight as fast as possible. 8 actions (motor torques) and 30 observations.
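Both environments share the same basic interaction loop: the agent receives an observation vector, returns an action vector, and the environment advances until the episode ends. The sketch below shows that loop with the observation and action sizes listed above. `StubEnv`, `run_episode` and `make_random_policy` are hypothetical names, and the stub returns random observations and a placeholder reward; it is not the PyChrono environment API used by the demo.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for a PyChrono environment (not the demo's class).

    Only the interface sizes mirror the text: obs_dim observations,
    act_dim actions, and an episode that ends after max_steps steps.
    """

    def __init__(self, obs_dim, act_dim, max_steps=200, seed=0):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.max_steps = max_steps
        self.rng = np.random.default_rng(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.normal(size=self.obs_dim)

    def step(self, action):
        assert action.shape == (self.act_dim,)
        self.t += 1
        obs = self.rng.normal(size=self.obs_dim)  # random placeholder state
        reward = 1.0  # placeholder reward: +1 per step survived
        done = self.t >= self.max_steps
        return obs, reward, done


def make_random_policy(env):
    """Return a policy that ignores the observation and samples uniform actions."""
    def policy(obs):
        return env.rng.uniform(-1.0, 1.0, size=env.act_dim)
    return policy


def run_episode(env, policy):
    """Roll out one episode; return stacked observations, actions and rewards."""
    obs = env.reset()
    observes, actions, rewards = [], [], []
    done = False
    while not done:
        action = policy(obs)
        observes.append(obs)
        actions.append(action)
        obs, reward, done = env.step(action)
        rewards.append(reward)
    return np.array(observes), np.array(actions), np.array(rewards)


# Sizes taken from the two environment descriptions.
pendulum = StubEnv(obs_dim=4, act_dim=1, max_steps=50)
ant = StubEnv(obs_dim=30, act_dim=8, max_steps=50)

obs, act, rew = run_episode(pendulum, make_random_policy(pendulum))
print(obs.shape, act.shape, rew.sum())  # (50, 4) (50, 1) 50.0
```

One episode produces one row per step, so the number of episodes passed with `-n` directly controls how much experience the agent collects.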

DRL algorithm

To train the Neural Networks to solve the tasks we used a reinforcement learning algorithm known as Proximal Policy Optimization (PPO). PPO is an on-policy actor-critic algorithm, so you will find 2 NNs: the first one, given the state, prescribes an action (Policy); the second, given the state, estimates the value function (VF).
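A minimal NumPy sketch of the two networks and of one PPO objective, assuming a Gaussian policy with a state-independent log standard deviation, and single linear layers in place of the demo's TensorFlow networks. The `-k/--kl_targ` argument suggests the adaptive KL-penalty variant of PPO, so that variant is shown here; this is an assumption, and the demo may compute its loss differently. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 1  # inverted pendulum sizes

# Actor (Policy): state -> mean of a diagonal Gaussian over actions.
# A single linear layer stands in for the policy network.
W_pi = rng.normal(scale=0.1, size=(obs_dim, act_dim))
log_std = np.full(act_dim, -0.5)  # state-independent log standard deviation

# Critic (VF): state -> scalar estimate of the value function V(s).
w_vf = rng.normal(scale=0.1, size=obs_dim)


def policy_mean(obs):
    return obs @ W_pi


def value(obs):
    return obs @ w_vf


def log_prob(actions, mean, log_std):
    """Log-density of actions under a diagonal Gaussian, summed over action dims."""
    var = np.exp(2 * log_std)
    return -0.5 * np.sum(
        (actions - mean) ** 2 / var + 2 * log_std + np.log(2 * np.pi), axis=-1
    )


def kl_diag_gauss(mean_old, log_std_old, mean_new, log_std_new):
    """Batch-averaged KL(old || new) between diagonal Gaussians."""
    var_old, var_new = np.exp(2 * log_std_old), np.exp(2 * log_std_new)
    kl = np.sum(
        log_std_new - log_std_old
        + (var_old + (mean_old - mean_new) ** 2) / (2 * var_new)
        - 0.5,
        axis=-1,
    )
    return kl.mean()


def ppo_kl_penalty_loss(obs, actions, advantages, old_mean, old_log_std, beta):
    """PPO surrogate with a KL penalty, returned as a loss to minimize."""
    new_mean = policy_mean(obs)  # current policy (uses the global log_std)
    ratio = np.exp(log_prob(actions, new_mean, log_std)
                   - log_prob(actions, old_mean, old_log_std))
    kl = kl_diag_gauss(old_mean, old_log_std, new_mean, log_std)
    return -np.mean(ratio * advantages) + beta * kl, kl


def update_beta(beta, kl, kl_targ):
    """Adaptive penalty rule from the PPO paper: steer the KL toward kl_targ."""
    if kl > 1.5 * kl_targ:
        return beta * 2.0
    if kl < kl_targ / 1.5:
        return beta / 2.0
    return beta


obs = rng.normal(size=(20, obs_dim))
old_mean, old_log_std = policy_mean(obs), log_std.copy()
actions = old_mean + np.exp(old_log_std) * rng.normal(size=old_mean.shape)
values = value(obs)  # one VF estimate per state, shape (20,)
advantages = rng.normal(size=20)  # placeholder; normally estimated with the VF
loss, kl = ppo_kl_penalty_loss(obs, actions, advantages, old_mean, old_log_std, 1.0)
print(np.isclose(kl, 0.0), np.isclose(loss, -advantages.mean()))  # True True
```

Before any gradient step the old and new policies coincide, so the probability ratio is 1 and the KL term vanishes; the penalty only starts to matter once the Policy moves away from the one that collected the data.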

The policy and value function codes are in the and respectively.

How to run examples

Make sure that you are using the right Python interpreter (the one with PyChrono and TensorFlow installed). Then run the script with the required command-line arguments.


Train to solve the inverted pendulum over 1000 episodes:

python ./ ChronoPendulum -n 1000

Learn to make the 4-legged ant walk over 20000 episodes:

python ./ ChronoAnt -n 20000

Alternatively, launch the demo from your favourite IDE, but remember to add the required arguments.
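When launching from an IDE, the same arguments still have to reach the script. A hedged sketch, assuming the script parses its arguments with Python's `argparse` (not confirmed by this page): names and defaults follow the list of command line arguments, while the way `--renderON`/`--renderOFF` share one destination and the render-off default are assumptions. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Train a PPO agent on a PyChrono environment (sketch)")
    parser.add_argument("env_name", type=str, help="e.g. ChronoPendulum or ChronoAnt")
    parser.add_argument("-n", "--num_episodes", type=int, default=1000)
    # Assumed wiring of the render toggle: both flags write the same destination.
    parser.add_argument("--renderON", dest="renderON", action="store_true")
    parser.add_argument("--renderOFF", dest="renderON", action="store_false")
    parser.set_defaults(renderON=False)  # assumed default: render off
    parser.add_argument("-g", "--gamma", type=float, default=0.995)
    parser.add_argument("-l", "--lam", type=float, default=0.98)
    parser.add_argument("-k", "--kl_targ", type=float, default=0.003)
    parser.add_argument("-b", "--batch_size", type=int, default=20)
    return parser


# Same tokens you would type after the script name on the command line.
args = build_parser().parse_args(["ChronoPendulum", "-n", "1000"])
print(args.env_name, args.num_episodes, args.gamma)  # ChronoPendulum 1000 0.995
```

Passing an explicit token list to `parse_args` is equivalent to typing those tokens after the script name, which is what an IDE run configuration does when you fill in its arguments field.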

List of command line arguments

Besides the environment name and the number of episodes, there are some other arguments, mainly to hand-tune learning parameters. --renderON/--renderOFF toggles the simulation render on and off. Note that visualizing the render slows down the simulation and consequently the learning process.

  • Environment name: env_name
  • Number of episodes: -n, --num_episodes, default=1000
  • Render toggle: --renderON / --renderOFF
  • Discount factor: -g, --gamma, default=0.995
  • Lambda for GAE: -l, --lam, default=0.98
  • Kullback-Leibler divergence target value: -k, --kl_targ, default=0.003
  • Batch size: -b, --batch_size, default=20
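The discount factor `gamma` and `lam` enter through Generalized Advantage Estimation (GAE), which turns the value function's estimates into advantages for the policy update. A minimal NumPy sketch, assuming an episode that ends in a true terminal state (value 0 after the last step) and discounted rewards as the VF targets; `discount` and `gae` are hypothetical helper names, not the demo's code. The defaults match the documented ones.

```python
import numpy as np


def discount(x, gamma):
    """Discounted cumulative sum: out[t] = x[t] + gamma * out[t + 1]."""
    out = np.zeros(len(x))
    running = 0.0
    for t in reversed(range(len(x))):
        running = x[t] + gamma * running
        out[t] = running
    return out


def gae(rewards, values, gamma=0.995, lam=0.98):
    """GAE for one episode that ended in a terminal state.

    values[t] is the VF estimate V(s_t); the value after the last step is 0.
    Returns (advantages for the Policy update, discounted returns for the VF).
    """
    next_values = np.append(values[1:], 0.0)
    deltas = rewards + gamma * next_values - values  # one-step TD residuals
    advantages = discount(deltas, gamma * lam)
    returns = discount(rewards, gamma)
    return advantages, returns


adv, ret = gae(np.ones(3), np.zeros(3), gamma=0.5, lam=1.0)
print(adv.tolist(), ret.tolist())  # [1.75, 1.5, 1.0] [1.75, 1.5, 1.0]
```

With `lam=1.0` and a zero value function the advantages reduce to the plain discounted returns, as the output shows; smaller values of `lam` rely more on the VF, trading variance for bias.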

Saving and restoring

NN parameters and the other TF variables are stored inside the Policy and VF directories, while the scaler means and variances are stored in the scaler.dat saved NumPy array. These files and folders are used to restore a previous checkpoint.
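The scaler normalizes observations with running means and variances that must survive a restart, which is why they are saved next to the network checkpoints. The sketch below shows one way to maintain and persist such statistics, assuming a parallel mean/variance update and a flat `[count, means, vars]` layout; `RunningScaler` is a hypothetical name, and the real layout of `scaler.dat` is not documented here.

```python
import os
import tempfile

import numpy as np


class RunningScaler:
    """Running per-dimension mean and variance of observations."""

    def __init__(self, obs_dim):
        self.means = np.zeros(obs_dim)
        self.vars = np.ones(obs_dim)
        self.count = 0

    def update(self, batch):
        """Merge an (N, obs_dim) batch into the running statistics."""
        n = batch.shape[0]
        b_mean, b_var = batch.mean(axis=0), batch.var(axis=0)
        if self.count == 0:
            self.means, self.vars, self.count = b_mean, b_var, n
            return
        total = self.count + n
        delta = b_mean - self.means
        # Parallel combination of two (count, mean, variance) summaries.
        m2 = self.vars * self.count + b_var * n + delta**2 * self.count * n / total
        self.means = self.means + delta * n / total
        self.vars = m2 / total
        self.count = total

    def scale(self, obs):
        return (obs - self.means) / (np.sqrt(self.vars) + 1e-8)

    def save(self, path):
        # Assumed flat layout: [count, means..., vars...].
        # Writing through a file object keeps np.save from appending ".npy".
        with open(path, "wb") as f:
            np.save(f, np.concatenate([[self.count], self.means, self.vars]))

    def load(self, path):
        data = np.load(path)
        d = (data.size - 1) // 2
        self.count = int(data[0])
        self.means, self.vars = data[1:1 + d], data[1 + d:]


rng = np.random.default_rng(0)
a, b = rng.normal(size=(100, 4)), rng.normal(size=(50, 4))
scaler = RunningScaler(obs_dim=4)
scaler.update(a)
scaler.update(b)
both = np.vstack([a, b])
print(np.allclose(scaler.means, both.mean(axis=0)),
      np.allclose(scaler.vars, both.var(axis=0)))  # True True

path = os.path.join(tempfile.mkdtemp(), "scaler.dat")
scaler.save(path)
restored = RunningScaler(obs_dim=4)
restored.load(path)
print(restored.count, np.allclose(restored.vars, scaler.vars))  # 150 True
```

Storing the sample count alongside the statistics lets training resume from a checkpoint and keep merging new batches without resetting the normalization.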


To test the Policy without further improving it, execute:

Set --VideoSave to save screenshots from the render.