Python – Balancing CartPole with Machine Learning

This article will show you how to solve the CartPole balancing problem. The CartPole is an inverted pendulum, where the pole is balanced against gravity. Traditionally, this problem is solved by control theory, using analytical equations. However, in this article, you’ll learn to solve the problem with machine learning.

OpenAI Gym

OpenAI is a non-profit organization dedicated to researching artificial intelligence, and the technologies developed by OpenAI are free for anyone to use.


Gym provides a toolkit to benchmark AI-based tasks. The interface is easy to use, and the goal is to enable reproducible research. An agent can be taught inside the gym, and it can learn activities such as playing games or walking. Each environment is a single problem, and the gym is a library of such environments.

The standard set of problems presented in the gym is as follows:

  • CartPole
  • Pendulum
  • Space Invaders
  • Lunar Lander
  • Ant
  • Mountain Car
  • Acrobot
  • Car Racing
  • Bipedal Walker

Any algorithm can work out in the gym by training on these activities. All of the problems share the same interface, so any general reinforcement learning algorithm can be applied to any of them.

Installing Gym

The primary interface of the gym is used through Python. Once you have Python 3 in an environment with the pip installer, the gym can be installed as follows:

sudo pip install gym

Advanced users who want to modify the source can compile from the source using the following commands:

A new environment can be added to the gym with the source code. Several environments need additional dependencies. For macOS, install the dependencies using the following command:

For Ubuntu, use the following commands:

Once the dependencies are present, install the complete gym as follows:

pip install 'gym[all]'

This will install most of the environments that are required.

Running an environment

Any gym environment can be initialized and run using a simple interface:

  1. First, import the gym library: import gym
  2. Next, create an environment by passing an argument to gym.make. In the following code, CartPole is used as an example: environment = gym.make('CartPole-v0')
  3. Next, reset the environment: environment.reset()
  4. Then, start an iteration and render the environment:
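
The loop for this last step can be sketched as follows. This is a minimal sketch rather than the article's exact listing: the helper name run_random_steps and its max_steps and render parameters are assumptions made for illustration. It relies only on the reset, step, render, and action_space.sample calls of the gym interface. Depending on the installed gym version, step returns either four or five values, so the sketch handles both cases.

```python
def run_random_steps(environment, max_steps=100, render=False):
    """Reset the environment, then take max_steps random actions.

    `environment` is expected to follow the gym interface
    (reset, step, render, action_space.sample).
    Returns the number of steps taken.
    """
    environment.reset()
    steps_taken = 0
    for _ in range(max_steps):
        if render:
            # Newer gym releases may need a render mode to be chosen
            # when the environment is created for this call to draw.
            environment.render()
        # Sample a random action from the action space and apply it.
        action = environment.action_space.sample()
        result = environment.step(action)
        steps_taken += 1
        # Older releases return (observation, reward, done, info);
        # newer ones return (observation, reward, terminated, truncated, info).
        if len(result) == 4:
            done = result[2]
        else:
            done = result[2] or result[3]
        if done:
            # The episode ended (for example, the pole fell), so start again.
            environment.reset()
    return steps_taken
```

With gym installed, calling the helper on the environment created in step 2 would look like run_random_steps(environment, max_steps=100, render=True).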

Also, take a random action from the action space at every step, to see the CartPole moving. Running the preceding program should produce a visualization. The scene should start as follows:

The preceding image shows a CartPole. The CartPole is made up of a cart that can move horizontally and a pole that can rotate with respect to the center of the cart. The pole is pivoted to the cart. After some time, you will notice that the pole is falling to one side, as shown in the following image:

After a few more iterations, the pole will swing back, as shown in the following image. All movements are constrained by the laws of physics. The steps are taken randomly:

Other environments can be seen in a similar way, by replacing the argument of the gym environment with another name, such as MsPacman-v0 or MountainCar-v0.

Markov models

The problem is set up as a reinforcement learning problem, with a trial and error method. The environment is described using state_values, and the state_values are changed by actions. The actions are determined by an algorithm, based on the current state_value, in order to reach a particular state_value. This setup is termed a Markov model.

In an ideal case, the past state_values do have an influence on future state_values, but here, you assume that the current state_values have all of the previous state_values encoded. There are two types of state_values; one is observable and the other is non-observable. The model has to take non-observable state_values into account, as well. That is called a Hidden Markov model.
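
To make the Markov assumption concrete, here is a toy sketch that is independent of gym. The five-position track and the next_state function are hypothetical and exist only for illustration. The point is that the next state is computed from the current state and the chosen action alone, never from the earlier history.

```python
# Hypothetical toy problem: a cart on a track with positions 0 to 4.
def next_state(state, action):
    """Return the next position, given only the current position and action."""
    step = 1 if action == "right" else -1
    # Clamp the position so the cart stays on the track.
    return max(0, min(4, state + step))

state = 2
history = [state]
for action in ["right", "right", "left"]:
    # The transition never looks at `history`, only at `state` and `action`.
    state = next_state(state, action)
    history.append(state)

print(history)  # [2, 3, 4, 3]
```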


At each step of the cart and pole, several variables can be observed, such as the position, velocity, angle, and angular velocity. The possible actions for the cart are to move right or left:

  1. state_values: Four dimensions of continuous values.
  2. Actions: Two discrete values.

The dimensions, or space, can be referred to as the state_values space and the action space. Start by importing the required libraries, as follows:

  1. Next, make the environment for playing CartPole, as follows:

environment = gym.make('CartPole-v0')

  1. Define the number of buckets and the number of actions, as follows:

  1. Define the state_value_bounds, as follows:
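
The bucket counts and bounds from the two steps above might look like the following sketch. All of the numbers are assumptions chosen for illustration, not values confirmed by the article. The position and angle limits are roughly what CartPole reports, and should be checked against environment.observation_space. The velocity limits are narrowed by hand, because unbounded ranges cannot be split into buckets.

```python
import math

# Assumed bucket counts for (position, velocity, angle, angular velocity).
# A single bucket means that dimension is effectively ignored.
no_buckets = (1, 1, 6, 3)

# CartPole has two discrete actions: push left or push right.
no_actions = 2

# Assumed (low, high) bounds per state dimension.
state_value_bounds = [
    (-4.8, 4.8),                             # cart position
    (-0.5, 0.5),                             # cart velocity (hand-picked)
    (-math.radians(24), math.radians(24)),   # pole angle
    (-math.radians(50), math.radians(50)),   # pole angular velocity (hand-picked)
]

for name, bounds, buckets in zip(
    ["position", "velocity", "angle", "angular velocity"],
    state_value_bounds,
    no_buckets,
):
    print(name, bounds, "->", buckets, "bucket(s)")
```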

  1. Next, define the action_index, as follows:

action_index = len(no_buckets)

  1. Now, define the q_value_table, as follows:

q_value_table = np.zeros(no_buckets + (no_actions,))

  1. Define the minimum exploration rate and the minimum learning rate:

  1. Define the maximum episodes, the maximum time steps, the streak to the end, the solving time, the discount, and the number of streaks, as constants:

  1. Define the select_action function, which decides the action, as follows:
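
One common way to decide the action is an epsilon-greedy rule: explore with a random action some of the time, and otherwise pick the action with the highest stored value. The sketch below assumes that rule. To stay free of dependencies, it uses a plain dictionary named q_values as a stand-in for the q_value_table array, and the rng parameter is a hypothetical addition that makes the function easy to test.

```python
import random

def select_action(state_value, explore_rate, q_values, no_actions=2, rng=random):
    """Pick an action for a bucketized state using an epsilon-greedy rule."""
    if rng.random() < explore_rate:
        # Explore: choose any action uniformly at random.
        return rng.randrange(no_actions)
    # Exploit: choose the action with the highest stored value.
    action_values = q_values[state_value]
    return max(range(no_actions), key=lambda action: action_values[action])

# Stand-in table: one bucketized state with a value per action.
q_values = {(0, 0, 3, 1): [0.2, 0.9]}

# With no exploration, the higher-valued action (index 1) is chosen.
print(select_action((0, 0, 3, 1), 0.0, q_values))
```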

  1. Now, select the explore rate, as follows:

  1. Select the learning rate, as follows:
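
Both rates are usually made to shrink as training progresses, so that the agent explores and adapts a lot early on and settles down later. The sketch below assumes a logarithmic decay with a floor given by the minimum rates. The decay formula, the function names, and the numeric values are illustrative assumptions rather than the article's confirmed settings.

```python
import math

min_explore_rate = 0.01   # assumed floor for the exploration rate
min_learning_rate = 0.1   # assumed floor for the learning rate

def select_explore_rate(episode):
    """Exploration rate: starts at 1.0 and decays toward min_explore_rate."""
    return max(min_explore_rate, min(1.0, 1.0 - math.log10((episode + 1) / 25)))

def select_learning_rate(episode):
    """Learning rate: starts at 0.5 and decays toward min_learning_rate."""
    return max(min_learning_rate, min(0.5, 1.0 - math.log10((episode + 1) / 25)))

for episode in (0, 24, 99, 249):
    print(episode, select_explore_rate(episode), select_learning_rate(episode))
```

With these assumed settings, both rates stay at their maximum for the first 25 episodes and reach their floors by episode 249.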

  1. Next, bucketize the state_value, as follows:
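
Bucketizing maps each continuous state value to a small integer index, so that the state can be used as a key into a table. The sketch below is one way to do this, assuming linear scaling between the bounds and clamping at the edges. The function takes the bounds and bucket counts as parameters so that it is self-contained, and the example bounds are made up for illustration.

```python
def bucketize_state_value(state_value, state_value_bounds, no_buckets):
    """Convert continuous state values into a tuple of bucket indexes."""
    bucket_indexes = []
    for value, (low, high), buckets in zip(state_value, state_value_bounds, no_buckets):
        if value <= low:
            # At or below the lower bound: first bucket.
            index = 0
        elif value >= high:
            # At or above the upper bound: last bucket.
            index = buckets - 1
        else:
            # Scale the value linearly onto the range 0 .. buckets - 1.
            fraction = (value - low) / (high - low)
            index = int(round((buckets - 1) * fraction))
        bucket_indexes.append(index)
    return tuple(bucket_indexes)

# Made-up bounds and bucket counts, for illustration only.
example_bounds = [(-4.8, 4.8), (-0.5, 0.5), (-1.0, 1.0), (-1.0, 1.0)]
example_buckets = (1, 1, 6, 3)

print(bucketize_state_value((0.0, 0.0, 0.3, 0.0), example_bounds, example_buckets))
# (0, 0, 3, 1)
```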

  1. Train the episodes, as follows:
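
At the heart of the training loop is the update applied to the table after every step. The sketch below assumes the standard tabular Q-learning rule, which fits the learning rate, discount, and value table named in the earlier steps. The function name update_q_value and the dictionary used in place of q_value_table are assumptions made to keep the example dependency-free.

```python
def update_q_value(q_values, state, action, reward, next_state,
                   learning_rate, discount):
    """Apply one tabular Q-learning update and return the new value."""
    # Best value obtainable from the state reached after the action.
    best_next_value = max(q_values[next_state])
    # Target: immediate reward plus the discounted future value.
    target = reward + discount * best_next_value
    # Move the stored value a fraction of the way toward the target.
    q_values[state][action] += learning_rate * (target - q_values[state][action])
    return q_values[state][action]

# Two made-up bucketized states with one value per action.
q_values = {
    (0, 0, 2, 1): [0.0, 0.0],
    (0, 0, 3, 1): [1.0, 2.0],
}
new_value = update_q_value(q_values, (0, 0, 2, 1), 1, 1.0, (0, 0, 3, 1),
                           learning_rate=0.5, discount=0.9)
print(new_value)
```

In a full training loop, this update would run once per time step, after selecting an action, stepping the environment, and bucketizing the new observation.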

  1. Print all relevant metrics for the training process, as follows:
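
One metric worth printing is the current streak of successful episodes, since the constants above mention a solving time and a streak to the end. The sketch below assumes that an episode counts as solved when the pole stays up for at least solved_time steps, and that training can stop once the streak exceeds streak_to_end. The helper name, the threshold values, and the episode lengths are all made up for illustration.

```python
def update_streak(no_streaks, time_steps_survived, solved_time):
    """Extend the streak after a solved episode, otherwise reset it."""
    if time_steps_survived >= solved_time:
        return no_streaks + 1
    return 0

solved_time = 199     # assumed number of steps that counts as solved
streak_to_end = 120   # assumed streak length at which training stops
no_streaks = 0

# Made-up episode durations, standing in for real training results.
episode_lengths = [50, 199, 200, 30, 199]

for episode_no, time_steps in enumerate(episode_lengths):
    no_streaks = update_streak(no_streaks, time_steps, solved_time)
    print("Episode %d finished after %d time steps, streak = %d"
          % (episode_no, time_steps, no_streaks))
    if no_streaks > streak_to_end:
        print("Solved after %d episodes" % (episode_no + 1))
        break
```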

  1. After training for a period of time, the CartPole will be able to balance itself, as shown in the following image:

You have now seen how to train a program that stabilizes the CartPole using a trial and error approach.

If you found this article interesting, you can explore Python Reinforcement Learning Projects to implement state-of-the-art deep reinforcement learning algorithms using Python and its powerful libraries. Python Reinforcement Learning Projects will give you hands-on experience with eight reinforcement learning projects, each addressing different topics and/or algorithms.

Sean Saito is the youngest ever Machine Learning Developer at SAP and the first bachelor hire for the position. He currently researches and develops machine learning algorithms that automate financial processes. He graduated from Yale-NUS College in 2017 with a Bachelors of Science (with Honours), where he explored unsupervised feature extraction for his thesis. Having a profound interest in hackathons, Sean represented Singapore during Data Science Game 2016, the largest student data science competition. Before attending university in Singapore, Sean grew up in Tokyo, Los Angeles, and Boston.
