Learn how to use NumPy for simulations in this tutorial by Alvaro Fuentes, a data scientist with an M.S. in quantitative economics and applied mathematics and more than 10 years of experience in analytical roles.
NumPy, also known as Python’s vectorization solution, is the fundamental package for performing scientific computations with Python. It gives you the ability to create multidimensional array objects and perform faster mathematical operations than you can with base Python. It is the basis of most of Python’s Data Science ecosystem. Most of the other libraries that you use in data analytics with Python, such as scikit-learn and pandas, rely on NumPy.
Here’s how to use NumPy in a real-world scenario. The two simulation examples below use NumPy, and in the process, you’ll also learn about other operations that you can do with arrays.
Coin flips
You can simulate a coin flip, or coin toss, using NumPy. For this purpose, use the randint function from the random submodule in NumPy. This function takes the low, high, and size arguments. The low and high arguments set the range of random integers that you want for the output, where low is included and high is excluded. In this case, you want the output to be either 0 or 1, so the value for low will be 0 and high will be 2 (2 itself is never returned). The size argument defines the number of random integers you want for the output, that is, the number of coins you’ll flip. Assign 0 as tails and 1 as heads, and set the size argument to 1, since you’ll be flipping one coin. You’ll get a different result every time you run this simulation.
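The single flip described above can be sketched as follows. This is a minimal illustration; the variable name flip is chosen here and does not come from the original article.

```python
import numpy as np

# One coin flip: 0 = tails, 1 = heads.
# high=2 is exclusive, so only 0 or 1 can be returned.
flip = np.random.randint(low=0, high=2, size=1)
print(flip)  # a one-element array; the value changes from run to run
```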
Now, take another simulation where you want to throw 10 coins at a time. Here, all you have to do is change the value of the last argument to size=10. To get the total number of heads, sum all the elements in the experiment output array.
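A sketch of the 10-coin experiment, assuming the output array is stored in a variable called experiment (a name picked for this example):

```python
import numpy as np

# Flip 10 coins at once: 0 = tails, 1 = heads.
experiment = np.random.randint(0, 2, size=10)
print(experiment)

# Because heads are coded as 1, the sum is the number of heads.
print(experiment.sum())
```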
Like the previous simulation, you’ll get random output every time you run the simulation. If you want to perform this experiment many times, say 10,000 times, you can do it quite easily using NumPy. Create a coin_matrix simulation to find out the distribution of the number of heads when throwing 10 coins at a time. You can use the same function, randint, with the same arguments, 0 and 2, but this time you want the output to be a two-dimensional array, so pass the size=(10000,10) argument. Since you can’t view a matrix with 10,000 rows on the screen, display only its first five rows with coin_matrix[:5,:].
When you run the cell, you’ll get the first five rows of the matrix, and the result will be different every time you run the simulation. Each row is one experiment of 10 coin flips, so these five rows are the first five of the 10,000 experiments in the full matrix.
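The matrix of experiments might be built like this. It is a minimal sketch that follows the arguments named in the text:

```python
import numpy as np

# 10,000 experiments (rows), each with 10 coin flips (columns).
coin_matrix = np.random.randint(0, 2, size=(10000, 10))

# Show only the first five experiments instead of all 10,000 rows.
print(coin_matrix[:5, :])
```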
To calculate how many heads you got in every experiment, you can use the sum method, but in this case, you want to sum along each row rather than over the whole matrix. To do this in NumPy, pass the additional axis argument and set axis=1; this will give you an array with the number of heads you get in every experiment.
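A sketch of the row-wise sum, assuming the result is stored in an array named counts (the text later refers to "the array counts", but the exact name used in the original cell is not shown):

```python
import numpy as np

coin_matrix = np.random.randint(0, 2, size=(10000, 10))

# axis=1 sums across the columns of each row,
# giving one head count per experiment.
counts = coin_matrix.sum(axis=1)

print(counts[:25])   # head counts for the first 25 experiments
print(counts.shape)  # one entry per experiment
```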
Displaying the first 25 elements of this array shows the number of heads in each of the first 25 experiments. NumPy also provides some useful tools for performing statistics on arrays, such as mean, median, minimum, maximum, and standard deviation. Using the mean() method, you will get the mean, or average, number of heads across all the experiments. The median is available through the np.median() function rather than as an array method; it gives you the median number of heads across the experiments. You can use the min() and max() methods to get the minimum and maximum number of heads observed in your experiments. The std() method will calculate the standard deviation of the counts array.
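These summary statistics could be computed as in the sketch below. The counts array is rebuilt here so the example runs on its own; note that the median is computed with np.median, since NumPy arrays do not offer a median() method.

```python
import numpy as np

coin_matrix = np.random.randint(0, 2, size=(10000, 10))
counts = coin_matrix.sum(axis=1)

mean_heads = counts.mean()        # average number of heads
median_heads = np.median(counts)  # median is a function, not an array method
min_heads = counts.min()          # fewest heads in any experiment
max_heads = counts.max()          # most heads in any experiment
std_heads = counts.std()          # standard deviation of the counts

print(mean_heads, median_heads, min_heads, max_heads, std_heads)
```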
Note that the output details for this section will be different every time you run the experiment. So do not be dismayed if your output doesn’t match those mentioned earlier.
Now, if you want to know the distribution of the number of heads you get in the experiment, you can use the bincount function. If you run the cell, you’ll get an array in which the element at position k is the number of experiments that produced exactly k heads, for k from 0 to 10.
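A sketch of the bincount step follows. The minlength=11 argument is an addition made here, not something the article mentions: it guarantees one slot for every outcome from 0 to 10 even if some outcome never occurs in a given run. The name heads_distribution is also chosen for this example.

```python
import numpy as np

coin_matrix = np.random.randint(0, 2, size=(10000, 10))
counts = coin_matrix.sum(axis=1)

# heads_distribution[k] = number of experiments with exactly k heads.
heads_distribution = np.bincount(counts, minlength=11)
print(heads_distribution)
```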
A short loop in regular Python can then print a detailed overview of the distribution of the number of heads that you get in the experiment.
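The original cell is not reproduced here, so the loop below is only one plausible way to print such an overview. The variable names and the output format are assumptions.

```python
import numpy as np

coin_matrix = np.random.randint(0, 2, size=(10000, 10))
counts = coin_matrix.sum(axis=1)
heads_distribution = np.bincount(counts, minlength=11)

n_experiments = counts.size
for n_heads, n_times in enumerate(heads_distribution):
    # Share of experiments that produced this number of heads.
    pct = 100 * n_times / n_experiments
    print(f"{n_heads} heads: {n_times} times ({pct:.2f}%)")
```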
In the experiment run earlier, this overview showed 0 heads 5 times, 1 head 94 times, and so on, together with the corresponding percentages.
Simulating stock returns
Here’s another simulation example, this time from the field of finance, using NumPy together with the matplotlib library for plotting. Let’s say you want to model the returns for a stock with the normal distribution. You can use the normal function from NumPy’s random submodule to produce random numbers that are normally distributed. The normal function takes the loc parameter, which is the mean, the scale parameter, which is the standard deviation, and the size parameter, which sets how many random numbers you want. Here, size is the number of days in a trading year.
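The article does not state the values it used, so the numbers in this sketch are illustrative assumptions: 252 trading days in a year, a mean daily return of 0.001, and a daily standard deviation of 0.02.

```python
import numpy as np

# Assumed values for illustration only (not from the article).
trading_days = 252  # assumed number of trading days in a year
returns = np.random.normal(loc=0.001, scale=0.02, size=trading_days)

print(returns[:20])  # simulated returns for the first 20 days
```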
When you run the simulation and display the first 20 elements of the result, you get the returns for the first 20 days. Some of the returns are negative and some are positive, just as with real stocks. Now, suppose you set initial_price to 100; to calculate the prices for all the following days, multiply initial_price by the exponential of the cumulative sum of the returns. This treats the simulated values as log returns that compound from day to day.
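The price calculation can be sketched as follows. As before, the return parameters and the 252-day year are assumptions made for this example; initial_price and its value of 100 come from the text.

```python
import numpy as np

# Assumed return parameters, for illustration only.
returns = np.random.normal(loc=0.001, scale=0.02, size=252)

initial_price = 100
# Cumulative sum of the returns, exponentiated and scaled by the
# starting price, gives the simulated price on each day.
prices = initial_price * np.exp(returns.cumsum())

print(prices[:5])
```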
You need a little background in finance to fully understand this step. The goal, however, is not to teach you how to perform simulations in finance but to show you how easy it is to perform simulations using NumPy. You can then plot the simulated evolution of the stock price.
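A minimal plotting sketch, with everything in one block so that each run produces a new simulation. The return parameters, the 252-day year, and the axis labels are assumptions made for this example. In a notebook the figure is rendered when the cell finishes; in a plain script you would call plt.show() at the end.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed parameters, for illustration only.
returns = np.random.normal(loc=0.001, scale=0.02, size=252)
initial_price = 100
prices = initial_price * np.exp(returns.cumsum())

# Plot the simulated price path over the trading year.
fig, ax = plt.subplots()
ax.plot(prices)
ax.set_xlabel("Trading day")
ax.set_ylabel("Simulated price")
ax.set_title("Simulated stock price")
```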
In the resulting plot, the stock starts at a price of 100, and its simulated evolution over the year is shown. If you put all of this code in one cell, you’ll get a different simulation every time you run the cell.
If you found this article interesting, you can explore Alvaro Fuentes’ Become a Python Data Analyst to enhance your data analysis and predictive modeling skills using popular Python tools. This book introduces Python’s most essential tools and libraries necessary to work with the data analysis process, right from preparing data to performing simple statistical analyses and creating meaningful data visualizations.