Multithreading – convenient but full of surprises

We have already worked with multithreading (see Lesson 5, Retrieving and Handling Market Data with Python, the Universal data connector section), and we found that using multiple threads makes life much easier when developing modular, scalable applications. However, we never explored how multithreading is implemented in Python.

Two concepts are frequently confused: multiprocessing and multithreading. Multiprocessing runs code in isolated processes, each with its own Python interpreter and its own global interpreter lock (GIL), so the processes can execute in parallel on separate physical or logical processors or processor cores (so-called true parallelism). Multithreading, by contrast, runs all threads inside a single process. In CPython, the GIL lets only one thread execute Python bytecode at any moment, so regardless of the number of processors or cores, the interpreter executes threads in small portions: it lets one thread run for a few milliseconds (or until it blocks, for example in time.sleep()) and then switches to another one. From a human perspective, it does look like the threads are running in parallel, and in most cases we don't even think about which thread is executing at which moment. But when implementing event-driven processes, it becomes critical to know what happens first: for example, if we try to generate an order before market data is received, the best we can hope for is an error.
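The single-process nature of multithreading can be checked directly. The following minimal sketch starts three threads that each record os.getpid(): all of them report the same process ID as the main program (with multiprocessing.Process, each worker would report a different one). It also prints sys.getswitchinterval(), the time slice CPython aims to give a thread before asking it to release the GIL. The names pids and report_pid are our own, chosen only for illustration.

```python
import os
import sys
import threading

# Ideal length of the time slice CPython gives a running thread before asking it
# to release the GIL: 0.005 s unless changed with sys.setswitchinterval().
print('Switch interval, s:', sys.getswitchinterval())

pids = []  # process ID observed by each thread


def report_pid():
    # list.append is safe to call from several threads in CPython
    pids.append(os.getpid())


threads = [threading.Thread(target=report_pid) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has recorded its PID

# Every thread ran inside the same process as the main program
print('Main process:', os.getpid(), 'threads:', pids)
```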

To see how multithreading actually works, let's write some simple code with three threads that emulate the respective components of our trading application:

from threading import Thread
import time
def t1(): # A thread that emulates data receiving
    while True:
        print('Receive data')
        time.sleep(1)
def t2(): # A thread that emulates trading logic
    while True:
        print('Trading logic')
        time.sleep(1)
def t3(): # A thread that emulates order execution
    while True:
        print('Processing orders')
        time.sleep(1)
thread1 = Thread(target=t1)
thread2 = Thread(target=t2)
thread3 = Thread(target=t3)
thread1.start()
thread2.start()
thread3.start()

Since we start the threads one by one (1, 2, and then 3), we might expect the messages Receive data, Trading logic, and Processing orders to repeat in exactly this order. However, when we run the code, we see something different:

Receive data
Trading logic
Processing orders
Receive dataProcessing orders
Trading logic
Processing ordersReceive data
Trading logic
Processing orders
Receive data
Trading logic
Trading logic
Processing orders
Receive data
Trading logicProcessing orders
Receive data
Processing ordersReceive data
Trading logic

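We can see that while each message appears roughly the same number of times, the order in which they appear is almost random, and some messages even end up on the same line, making the output chaotic. This happens because all three threads wake up from time.sleep(1) at nearly the same moment, and it is the operating system scheduler (together with CPython's GIL), not the order of the start() calls, that decides which thread runs next. Python's threading module offers no way to assign priorities to threads, so nothing favors one thread over another. Messages share a line because print() writes the text and the trailing newline separately, so another thread can write its own text in between.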
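If we only want each message on its own line, a minimal fix is to guard every print() with a shared threading.Lock. The sketch below is a bounded variant of the example above: each thread prints three times instead of looping forever, and join() lets the script terminate. The names print_lock, log, and worker are our own. Note what the lock does not do: it keeps each message whole, but the order in which the threads acquire the lock is still up to the scheduler, so the sequence of messages remains unpredictable.

```python
import threading
import time

print_lock = threading.Lock()  # serializes access to stdout
log = []                       # records the order in which messages were printed


def worker(message, repeats=3, delay=0.1):
    """Print `message` `repeats` times, pausing `delay` seconds between prints."""
    for _ in range(repeats):
        with print_lock:       # only one thread may print at a time
            print(message)
            log.append(message)
        time.sleep(delay)


threads = [
    threading.Thread(target=worker, args=('Receive data',)),
    threading.Thread(target=worker, args=('Trading logic',)),
    threading.Thread(target=worker, args=('Processing orders',)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()                   # wait for all workers so the script terminates
```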

Of course, such behavior is not suitable for a trading app: we want to make sure that we first receive a tick, then process it, then generate an order, and finally send it for execution – in this very order and not any other!
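One way to enforce this order is to connect the components with queue.Queue objects, so that each stage blocks until the previous stage hands it something. The following is a minimal sketch under our own assumptions, not necessarily the implementation developed in this lesson: the ticks are a hypothetical list of prices instead of a live feed, orders are simple ('BUY', price) tuples, and None serves as a sentinel telling the next stage to stop.

```python
import queue
import threading

ticks = queue.Queue()   # stream of market data: receiver -> trading logic
orders = queue.Queue()  # stream of orders: trading logic -> order execution
STOP = None             # sentinel that tells the next stage to shut down
executed = []           # orders "sent for execution", in processing order


def receive_data(prices):
    for price in prices:          # hypothetical tick prices instead of a live feed
        ticks.put(price)
    ticks.put(STOP)


def trading_logic():
    while True:
        tick = ticks.get()        # blocks until a tick arrives
        if tick is STOP:
            break
        orders.put(('BUY', tick))  # hypothetical order format
    orders.put(STOP)


def process_orders():
    while True:
        order = orders.get()      # blocks until an order arrives
        if order is STOP:
            break
        executed.append(order)


threads = [
    threading.Thread(target=receive_data, args=([1.1012, 1.1015, 1.1011],)),
    threading.Thread(target=trading_logic),
    threading.Thread(target=process_orders),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(executed)
```

Each tick still passes through the stages strictly in order (received, processed, executed), yet the receiver is free to pick up the next tick while the previous one is being processed. Because each queue is first-in, first-out and has a single consumer, orders are executed in the same order the ticks arrived.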

There are several solutions to this problem. We will use two of them: using data streams as events for syncing, and using threading.Event() objects to switch between threads. We will consider each approach in detail in the upcoming sections.
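The threading.Event() approach can be previewed with a minimal sketch in which events pass control from one thread to the next, like a baton. The event names (tick_done, data_ready, signal_ready), the helper function stage, the fixed number of ticks, and the log list are hypothetical and chosen only for illustration; the design discussed in the upcoming sections may differ.

```python
import threading

N_TICKS = 3  # hypothetical number of ticks to simulate
log = []     # records the order in which the stages actually ran

tick_done = threading.Event()     # set: the receiver may fetch the next tick
data_ready = threading.Event()    # set: the trading logic may run
signal_ready = threading.Event()  # set: order execution may run
tick_done.set()                   # the receiver moves first


def stage(name, my_turn, next_turn):
    """Run `name` N_TICKS times, each time waiting for `my_turn`, then signalling `next_turn`."""
    for i in range(N_TICKS):
        my_turn.wait()    # block until the previous stage hands over control
        my_turn.clear()   # reset our event so we wait again in the next cycle
        message = f'{name} (tick {i})'
        print(message)
        log.append(message)
        next_turn.set()   # pass control to the next stage


threads = [
    threading.Thread(target=stage, args=('Receive data', tick_done, data_ready)),
    threading.Thread(target=stage, args=('Trading logic', data_ready, signal_ready)),
    threading.Thread(target=stage, args=('Processing orders', signal_ready, tick_done)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread waits for its own event, clears it, does its work, and only then sets the event of the next thread, so the output always follows the Receive data, Trading logic, Processing orders cycle. The price of this strict turn-taking is that only one thread does useful work at any moment.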

Let’s start by implementing a version of the trading app that works with live tick data, and then see how we can easily transform it into a powerful backtesting tool (if you don’t clearly remember the meaning of backtesting, just jump back to Lesson 2, Using Python for Trading Strategies, the What is paper trading and backtesting? section).
