Learn some of the fundamental concepts of concurrent programming in this article by Quan Nguyen, a Python enthusiast and data scientist.
It is estimated that the amount of data that needs to be processed by computer programs doubles every two years. The International Data Corporation (IDC), for example, estimates that by 2020, there will be 5,200 GB of data for every person on earth. With this staggering volume of data come insatiable demands for computing power, and while numerous computing techniques are being developed and utilized every day, concurrent programming remains one of the most prominent ways to effectively and accurately process data.
This article will serve as a comprehensive introduction to various advanced concepts in concurrent engineering and programming in Python. You can download the code for this article from the GitHub repository at https://github.com/PacktPublishing/Mastering-Concurrency-in-Python. Time to get started!
Multithreading
Multithreading allows more than one thread to exist and execute within a single process, seemingly simultaneously. By letting multiple threads access shared resources/contexts and execute independently, this programming technique can help applications gain speed in the execution of independent tasks.
An example in Python
To illustrate the concept of running multiple threads in the same process, here’s a quick example in Python. If you have already downloaded the code for this article from the GitHub repository, go ahead and navigate to the Chapter03 folder. Take a look at the Chapter03/my_thread.py file:
# Chapter03/my_thread.py
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, name, delay):
        threading.Thread.__init__(self)
        self.name = name
        self.delay = delay

    def run(self):
        print('Starting thread %s.' % self.name)
        thread_count_down(self.name, self.delay)
        print('Finished thread %s.' % self.name)

def thread_count_down(name, delay):
    counter = 5

    while counter:
        time.sleep(delay)
        print('Thread %s counting down: %i...' % (name, counter))
        counter -= 1
In this file, the threading module from Python's standard library serves as the foundation of the MyThread class. Each object of this class has a name and a delay parameter. The run() method, which is executed as soon as a new thread is started, prints out a starting message and, in turn, calls the thread_count_down() function. This function counts down from 5 to 1, sleeping between iterations for the number of seconds specified by the delay parameter.
The point of this example is to show the concurrent nature of running more than one thread in the same program (or process) by starting more than one MyThread object at the same time. As soon as each thread is started, a time-based countdown for that thread also starts. In a traditional sequential program, the separate countdowns would be executed one after the other, in order (that is, a new countdown would not start until the current one finished), as the short sketch after this paragraph illustrates. As you will see, the separate countdowns for separate threads are executed concurrently.
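To make the contrast concrete, here is a minimal sequential sketch. It is not part of the book's repository, and the count_down() helper is a hypothetical stand-in for the threaded countdown above:

# sequential_countdown.py - illustrative sketch, not from the book's repository
import time

def count_down(name, delay):
    # Same countdown logic as thread_count_down(), but called directly.
    counter = 5
    while counter:
        time.sleep(delay)
        print('Thread %s counting down: %i...' % (name, counter))
        counter -= 1

# The second countdown cannot start until the first one has completely finished.
count_down('A', 0.5)
count_down('B', 0.5)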
Here’s a look at the Chapter03/example1.py file:
# Chapter03/example1.py
from my_thread import MyThread

thread1 = MyThread('A', 0.5)
thread2 = MyThread('B', 0.5)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print('Finished.')
Here, you’ll initialize and start two threads together, each of which has 0.5 seconds as its delay parameter. Run the script using your Python interpreter, and you should get the following output:
> python example1.py
Starting thread A.
Starting thread B.
Thread A counting down: 5...
Thread B counting down: 5...
Thread B counting down: 4...
Thread A counting down: 4...
Thread B counting down: 3...
Thread A counting down: 3...
Thread B counting down: 2...
Thread A counting down: 2...
Thread B counting down: 1...
Thread A counting down: 1...
Finished thread B.
Finished thread A.
Finished.
The output tells you that the two countdowns for the threads were executed concurrently; instead of finishing the first thread’s countdown and then starting the second thread’s countdown, the program ran the two countdowns at almost the same time. Setting aside some overhead and miscellaneous declarations, this threading technique delivers an almost twofold improvement in speed for the preceding program.
There is one additional thing to take note of in the preceding output. After the first countdown (for the number 5), you can see that thread B's countdown actually got ahead of thread A's in execution, even though thread A was initialized and started before thread B; this allowed thread B to finish before thread A. This phenomenon is a direct result of concurrency via multithreading; since the two threads were initialized and started almost simultaneously, it was quite likely for one thread to get ahead of the other in execution.
If you were to execute this script many times, you would quite likely get varying output in terms of the order of execution and the completion of the countdowns. The following are two pieces of output obtained by running the script repeatedly. The first shows a uniform, unchanging order of execution and completion, in which the two countdowns proceeded hand in hand. The second shows a case in which thread A was executed noticeably faster than thread B; it even finished before thread B counted down to 1. This variation in output further illustrates the fact that the threads were treated and executed by Python equally.
The following code shows one possible output of the program:
> python example1.py
Starting thread A.
Starting thread B.
Thread A counting down: 5...
Thread B counting down: 5...
Thread A counting down: 4...
Thread B counting down: 4...
Thread A counting down: 3...
Thread B counting down: 3...
Thread A counting down: 2...
Thread B counting down: 2...
Thread A counting down: 1...
Thread B counting down: 1...
Finished thread A.
Finished thread B.
Finished.
The following is another possible output:
> python example1.py
Starting thread A.
Starting thread B.
Thread A counting down: 5...
Thread B counting down: 5...
Thread A counting down: 4...
Thread B counting down: 4...
Thread A counting down: 3...
Thread B counting down: 3...
Thread A counting down: 2...
Thread B counting down: 2...
Thread A counting down: 1...
Finished thread A.
Thread B counting down: 1...
Finished thread B.
Finished.
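To put a rough number on the speedup mentioned earlier, you could time the threaded version yourself with time.perf_counter(). The following is a minimal sketch, not part of the book's repository, that assumes the MyThread class from my_thread.py is importable:

# timing_sketch.py - illustrative sketch, not from the book's repository
import time
from my_thread import MyThread

start = time.perf_counter()

threads = [MyThread('A', 0.5), MyThread('B', 0.5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Each countdown sleeps 5 * 0.5 = 2.5 seconds in total; because the two threads
# sleep concurrently, the elapsed time stays close to 2.5 seconds rather than 5.
print('Elapsed: %.2f seconds.' % (time.perf_counter() - start))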
Multiprocessing
While the term multiprocessing has a number of different uses, in the context of concurrency and parallelism it refers to the execution of multiple concurrent processes in an operating system, in which each process is executed on a separate CPU (or core), as opposed to a single process running at any given time. By the nature of processes, an operating system needs to have two or more CPUs in order to implement multiprocessing tasks, as it has to support several processors at the same time and allocate work between them appropriately.
Multithreading shares a somewhat similar definition. The difference is that in multithreading only one processor is utilized, and the system switches between tasks on that processor (a technique known as time slicing), whereas multiprocessing generally denotes the actual concurrent/parallel execution of multiple processes using multiple processors.
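You can check how many logical processors your system exposes, and therefore how much genuine parallelism multiprocessing can exploit, straight from the standard library. A quick sketch:

# cpu_count_sketch.py - illustrative sketch
import multiprocessing
import os

print(multiprocessing.cpu_count())  # number of logical CPUs visible to Python
print(os.cpu_count())               # equivalent standard-library alternative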
An example in Python
To illustrate the concept of running multiple processes on one operating system, here’s a quick example in Python. Take a look at the Chapter06/example1.py file:
# Chapter06/example1.py
from multiprocessing import Process
import time

def count_down(name, delay):
    print('Process %s starting...' % name)

    counter = 5

    while counter:
        time.sleep(delay)
        print('Process %s counting down: %i...' % (name, counter))
        counter -= 1

    print('Process %s exiting...' % name)

if __name__ == '__main__':
    process1 = Process(target=count_down, args=('A', 0.5))
    process2 = Process(target=count_down, args=('B', 0.5))

    process1.start()
    process2.start()

    process1.join()
    process2.join()

    print('Done.')
In this file, the count_down() function takes in a string as a process identifier and a delay time. It then counts down from 5 to 1, sleeping between iterations for the number of seconds specified by the delay parameter, and prints out a message with the process identifier at each iteration.
The point of this counting-down example is to show the concurrent nature of running separate tasks at the same time, this time through different processes created with the Process class from the multiprocessing module. In the main program, two processes are initialized at the same time to carry out two separate time-based countdowns simultaneously. Just as two separate threads would, the two processes run their countdowns concurrently.
After running the Python script, your output should be similar to the following:
> python example1.py
Process A starting...
Process B starting...
Process B counting down: 5...
Process A counting down: 5...
Process B counting down: 4...
Process A counting down: 4...
Process B counting down: 3...
Process A counting down: 3...
Process B counting down: 2...
Process A counting down: 2...
Process A counting down: 1...
Process B counting down: 1...
Process A exiting...
Process B exiting...
Done.
The output tells you that the two countdowns from the separate processes were executed concurrently; instead of finishing the first process’ countdown and then starting the second’s, the program ran the two countdowns at almost the same time. Even though processes are more expensive and carry more overhead than threads, multiprocessing also provides an almost twofold improvement in speed for programs such as the preceding one.
As with multithreading, the order of the printed output changes between different runs of the program. Specifically, process B sometimes gets ahead of process A during the countdown and finishes before it, even though it was initialized and started later. This is, again, a direct result of implementing and starting two processes that execute the same function at almost the same time. By executing the script many times, you will see that the order of the counting and the completion of the countdowns is quite likely to vary.
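The same countdowns can also be dispatched through a process pool, which becomes convenient as the number of tasks grows. Below is a minimal sketch, not part of the book's example, that assumes the count_down() function from Chapter06/example1.py is importable:

# pool_sketch.py - illustrative sketch, not from the book's repository
from multiprocessing import Pool
from example1 import count_down  # assumes Chapter06/example1.py is on the path

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        # starmap() unpacks each argument tuple into count_down(name, delay)
        # and runs the calls across the worker processes.
        pool.starmap(count_down, [('A', 0.5), ('B', 0.5)])

    print('Done.')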
Asynchronous programming
Asynchronous programming is one of the major concepts in concurrency. However, it is quite a complex concept that can, at times, be challenging to differentiate from other programming models.
While providing benefits somewhat similar to those of threading and multiprocessing, asynchronous programming is fundamentally different from these two programming models, especially in the Python programming language.
In multiprocessing, multiple copies of your main program, together with its instructions and variables, are created and executed independently across different cores. Threads, which are also known as lightweight processes, operate on the same basis: although the code is not executed on separate cores, the independent portions of code that run in separate threads do not interact with one another either.
Asynchronous programming, on the other hand, keeps all of the instructions of a program in the same thread and process. The main idea behind asynchronous programming is to have a single executor switch from one task to another whenever it is more efficient (in terms of execution time) to let one task wait while processing another. This means that asynchronous programming will not take advantage of the multiple cores that a system might have.
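You can verify this single-threaded behavior by printing the current thread identifier from inside several coroutines; every task reports the same thread. Here is a minimal sketch (the asyncio.run() entry point requires Python 3.7 or newer):

# single_thread_sketch.py - illustrative sketch
import asyncio
import threading

async def report(name):
    # Every coroutine runs on the event loop's single thread.
    print('Task %s runs on thread %i.' % (name, threading.get_ident()))
    await asyncio.sleep(0)

async def main():
    await asyncio.gather(report('A'), report('B'), report('C'))

asyncio.run(main())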
An example in Python
Consider how asynchronous programming can improve the execution time of your Python programs. Take a look at the Chapter09/example1.py file:
# Chapter09/example1.py
from math import sqrt

def is_prime(x):
    print('Processing %i...' % x)

    if x < 2:
        print('%i is not a prime number.' % x)

    elif x == 2:
        print('%i is a prime number.' % x)

    elif x % 2 == 0:
        print('%i is not a prime number.' % x)

    else:
        limit = int(sqrt(x)) + 1
        for i in range(3, limit, 2):
            if x % i == 0:
                print('%i is not a prime number.' % x)
                return

        print('%i is a prime number.' % x)

if __name__ == '__main__':
    is_prime(9637529763296797)
    is_prime(427920331)
    is_prime(157)
Here, you have the familiar prime-checking is_prime() function, which takes in an integer and prints out a message indicating whether or not that input is a prime number. In the main program, is_prime() is called on three different numbers. Notice how long it takes the program to process all three of them; a simple way to measure this is sketched below.
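The listing above does not actually time anything; one straightforward way to measure the total processing time (an addition, not part of the book's example) is to wrap the three calls with time.perf_counter(), assuming Chapter09/example1.py is importable:

# timing_is_prime.py - illustrative sketch, not from the book's repository
from time import perf_counter
from example1 import is_prime  # assumes Chapter09/example1.py is on the path

start = perf_counter()

is_prime(9637529763296797)
is_prime(427920331)
is_prime(157)

print('Took %.2f seconds in total.' % (perf_counter() - start))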
Once you execute the script, your output should be similar to the following:
> python example1.py
Processing 9637529763296797...
9637529763296797 is a prime number.
Processing 427920331...
427920331 is a prime number.
Processing 157...
157 is a prime number.
You have probably noticed that the program took quite some time to process the first input. Because of the way is_prime() is implemented, large prime numbers take longer to process. So, since the first input is a large prime, the Python program hangs for a significant amount of time before printing the output. This typically creates a non-responsive feel, which is undesirable in both software engineering and web development.
To improve the responsiveness of the program, you can take advantage of the asyncio module, as done in the Chapter09/example2.py file:
# Chapter09/example2.py
from math import sqrt
import asyncio

async def is_prime(x):
    print('Processing %i...' % x)

    if x < 2:
        print('%i is not a prime number.' % x)

    elif x == 2:
        print('%i is a prime number.' % x)

    elif x % 2 == 0:
        print('%i is not a prime number.' % x)

    else:
        limit = int(sqrt(x)) + 1
        for i in range(3, limit, 2):
            if x % i == 0:
                print('%i is not a prime number.' % x)
                return
            elif i % 100000 == 1:
                #print('Here!')
                await asyncio.sleep(0)

        print('%i is a prime number.' % x)

async def main():

    task1 = loop.create_task(is_prime(9637529763296797))
    task2 = loop.create_task(is_prime(427920331))
    task3 = loop.create_task(is_prime(157))

    await asyncio.wait([task1, task2, task3])

if __name__ == '__main__':
    try:
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
    except Exception as e:
        print('There was a problem:')
        print(str(e))
    finally:
        loop.close()
Run the script and you will see an improvement in responsiveness in the printed output:
> python example2.py
Processing 9637529763296797...
Processing 427920331...
427920331 is a prime number.
Processing 157...
157 is a prime number.
9637529763296797 is a prime number.
Specifically, while 9637529763296797 was being processed, the program decided to switch to the next inputs. Therefore, the results for 427920331 and 157 were returned before it, hence improving the responsiveness of the program.
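On Python 3.7 and newer, the same program can be driven with asyncio.run() and asyncio.gather() instead of managing the event loop by hand. Here is a minimal sketch, assuming the async is_prime() coroutine from Chapter09/example2.py is importable:

# asyncio_run_sketch.py - illustrative sketch, not from the book's repository
import asyncio
from example2 import is_prime  # assumes Chapter09/example2.py is on the path

async def main():
    # gather() schedules the three coroutines concurrently on one event loop.
    await asyncio.gather(
        is_prime(9637529763296797),
        is_prime(427920331),
        is_prime(157),
    )

asyncio.run(main())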
If you found this article interesting, you can explore Mastering Concurrency in Python to immerse yourself in the world of Python concurrency and tackle the most complex concurrent programming problems. Mastering Concurrency in Python serves as a comprehensive introduction to various advanced concepts in concurrent engineering and programming.