The volume of transactions executed by automated systems on electronic markets has increased significantly over the past few years, to the point where it dwarfs everything else on all major exchanges. As an example, the volume of trades made from co-location (systems located in the proximity of the exchange) went up from 10% in 2010 to over 50% in 2016. Naturally, a question arises: why is there so much automated trading? One answer might be that investors are optimistic. The exchange is a zero-sum game, and speculative funds end up exchanging profits and losses among themselves most of the time. Every year there is a "most successful fund", but there are also many that lose and that nobody hears about. Even the old flagships of automated trading, Getco (now KCG Holdings), D.E. Shaw or Citadel, go through cycles of gains and losses, the notable exception being Renaissance Technologies, which continues to be a miracle (who knows of what kind). A second factor is the presence of retail investors, the amateur traders, a resource renewed every time a new generation matures and hopes to multiply its money; unfortunately, most of the time they are a source of profit for algorithmic trading. Another factor that contributes to the success of automated trading is the presence of certain patterns in price movements around special events: although the events themselves are unpredictable, the price activity that follows them can be exploited. Moreover, the presence of other trading algos in this self-feedback dynamic system called "the exchange" means that algorithms inevitably leave a trace of their actions, becoming information producers themselves. This gives rise to new opportunities for strategies that exploit activity patterns or bait other algos into producing predictable reactions, in a meta-algorithmization that can continue indefinitely. Finally, we mention the long-term investors, who are actually the most important actors in the market: they trade huge volumes and, given their directional trading, they are the ones who can trigger trends that are exploitable by algorithms.
Even though speculative funds hunt each other and generate a huge volume of transactions in the process, their influence on price and market capitalization is minor. Of course, there are exceptions, such as the 2012 Knight Capital demise, when a trading algo (actually a faulty deploy) had major consequences for a short period of time; in the long run it did not matter, except for that specific company. In the end, algorithmic trading is not the engine of the market, it is just a facilitator of the exchange of resources (playing a positive role by providing liquidity and doing arbitrage); the producers of "real flow" (mostly mutual funds, pension funds and banks), who pursue long-term goals, are the ones who dictate the moves. These actors co-exist with the speculative funds and produce a predator-prey dynamic, as in every other environment where there is variation, selection and a limited quantity of resources, the proportion of real flow and speculative flow on the exchange being self-adjusting. Any trading algorithm, no matter how performant and sophisticated, is constrained by the fact that the available information is a limited resource which disappears quickly under competitive pressure. The dispersion of gains among players can be so high that it drops below the level needed to cover infrastructure and transaction costs. In addition, if speculative trading were to become predominant, the real flow would move elsewhere or the exchange would be regulated to limit the effects of the speculative flow.
In the following lines we will try to see how trading algorithms make decisions, to briefly mention which subset of the Machine Learning universe is used in automated trading (with a few practical tips & tricks) and to see how such an algo can be validated. It is a vast subject and the article will get a bit dense in terminology, but we hope it will help automated trading practitioners and those interested in ML, even if it is used only as a list of terms to be explored.
When people say algorithmic trading, they usually think of speculative High Frequency Trading, but it is worth mentioning that this is just a small part of the system, even for hedge funds. An incomplete list of automated strategies could be:
execution algos (VWAP, TWAP, POV etc.), which implement a certain trade-off between the probability of obtaining the optimal price and the probability of obtaining the target quantity
arbitrage algorithms, a very broad term covering everything from simple single-instrument arbitrage to pairs trading, arbitrage between baskets of stocks and derivatives, index arbitrage, volatility arbitrage etc.
algos based on technical indicators, some luckier than others, every trader's secret being which one to choose and when (various Moving Averages, MACD, RSI and Bollinger Bands being among the most used; a minimal sketch of such a signal appears right after this list)
strategies based on special events, news or, more recently, "sentiment analysis" of social media feeds, done using machine learning
market making
market microstructure strategies, which try to exploit the way trading is implemented on a specific exchange, either at the logical level or technically
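To make the technical-indicator category above more concrete, here is a minimal sketch of a moving-average crossover signal, one of the simplest members of that family. The function names, window lengths and synthetic data are illustrative choices only, not a recipe used by any particular fund:

```python
import numpy as np

def sma(prices, window):
    """Simple moving average over a 1-D array of prices."""
    kernel = np.ones(window) / window
    # 'valid' keeps only windows fully covered by data
    return np.convolve(prices, kernel, mode="valid")

def crossover_signal(prices, fast=10, slow=50):
    """Return +1 (long), -1 (short) or 0 for each bar, depending on whether
    the fast moving average is above or below the slow one."""
    fast_ma = sma(prices, fast)
    slow_ma = sma(prices, slow)
    n = len(slow_ma)                      # align both series on their last n points
    return np.sign(fast_ma[-n:] - slow_ma)

# toy usage with a synthetic price series
prices = 100.0 + np.cumsum(np.random.normal(0, 0.5, size=500))
signal = crossover_signal(prices)
print(signal[-5:])
```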
The exchanges have several working modes: the so-called "normality" periods, without sudden jumps in prices; special events (such as the 2011 Japan earthquake); and expected events, when the time at which something will happen is known, but not the outcome (such as interest rate announcements made by central banks). The distinction between special events, which are not predictable, and events which are pending and known to happen is important. Most trading algos are built to optimize their behavior when the market behaves normally or during known events, and they try to exit any position while minimizing losses when special events occur. Then there is a subcategory of algos trying to exploit the behavior during or right after announced events. The challenge is not only to react fast, but also to take precautions against being baited by other algos, which is why it is important to randomize your behavior on the market if you code an algo. It is not just that other players can reverse-engineer what the algo is doing; some might also learn how to fool it and take advantage of its implementation. Finally, there are algos which can make a lot of profit in a very short time, but which also involve a higher risk: algos that stand by most of the time, listening to the market, and take action only after unexpected events happen, trying to exploit the behavior that unfolds after significant jumps in price.
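As a rough illustration of what "randomizing your behavior" can mean in practice, the hypothetical helper below jitters the size and the timing of child orders so that the algo's footprint is harder for other algos to fingerprint; the jitter ranges are arbitrary and purely illustrative:

```python
import random

def randomized_child_order(target_qty, base_delay_ms, rng=None):
    """Hypothetical helper: jitter the size and the delay of the next child order
    so that the algo's footprint is harder to fingerprint."""
    rng = rng or random.Random()
    qty = max(1, int(target_qty * rng.uniform(0.8, 1.2)))   # +/-20% size jitter
    delay_ms = base_delay_ms * rng.uniform(0.5, 1.5)        # +/-50% timing jitter
    return qty, delay_ms

print(randomized_child_order(1_000, 200))
```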
Figure 1. a) Response times after price jumps; b) Knight Capital event
In Figure 1.a) we show the activity of automated algos during one day, following price move events. We can see several types of actors: the activity at 1 millisecond after the event is most probably part of the events that triggered the price changes; at 4-6 milliseconds we have the trading algos in co-location, which act significantly faster than the rest (being in the immediate proximity of the exchange); this is followed by a spike starting at 10-15 ms, which decreases slowly towards hundreds of milliseconds. We can also see some time-based algos which react every 50 ms. To put this on a scale, human reaction time has a distribution peaking somewhere in the 250-300 ms range.
In Figure 1.b) we can see a special event (a meltdown caused by a misbehaving trading algo) which exemplifies how an algo can have a huge effect in the short term, while, by the end of the day, its impact on the market is already fading.
When building trading strategies, the purpose is to find information and translate it into money. Since the amount of available information is limited, the profit depends on how fast the algo acts and on how accurate its predictions are, so that it acts only when there is a chance for profit. This is pretty much a tautology, but it describes well that there is always going to be a trade-off between speed and complexity. All strategies rely on data: either information from outside (news, analyses, knowledge about the economy, society etc.), or information taken from the exchange itself (by analyzing the historical behavior of prices, volumes or the trading intentions visible in the so-called order book). The majority of speculative strategies are based mainly on the information from inside the system. Given the technological advances, response times are getting smaller and smaller, and the faster the algos react, the more quickly the information vanishes. This gets us closer to the Efficient Market Hypothesis (EMH), which states that any existing information is instantly reflected in the price, making its movement a random walk and thus impossible to predict. This race is never-ending and can also lead to unintended consequences: any decision taken by an algo introduces noise into a system with self-feedback. Any pseudo-random decision can be amplified by other algos deciding to act in the same direction, producing a self-fulfilling prophecy and making prices jump significantly in the absence of information (while creating some in the process). Thus, there will always be room for algos to evolve.
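A toy simulation can make the feedback argument concrete. In the sketch below (entirely illustrative, with an arbitrary feedback strength alpha), algos tend to pile into the direction of the last move, so purely random shocks are amplified into persistent, exploitable moves even though no outside information was injected:

```python
import numpy as np

rng = np.random.default_rng(42)
n_steps = 10_000
alpha = 0.3                                # strength of the herding feedback (toy value)
returns = np.zeros(n_steps)

for t in range(1, n_steps):
    noise = rng.normal(0.0, 1.0)           # fresh pseudo-random decisions
    herd = alpha * np.sign(returns[t - 1]) # algos piling into the last move
    returns[t] = noise + herd

# the feedback alone creates positive autocorrelation, i.e. short-lived trends,
# even though the shocks themselves carry no information
print(np.corrcoef(returns[:-1], returns[1:])[0, 1])
```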
What input data do trading algos use? Choosing the source depends mainly on physical constraints, on what data feeds are available. Far from being a domain where "cool things" are the norm, algorithmic trading actually requires a huge effort to find data, gather it, clean it and synchronize it. Very often, small glitches or processing issues can produce major discrepancies between the theoretical analysis and what happens for real in the market. Besides needing to weed out any errors in the data ("garbage in, garbage out" applies very well in this domain), a lot of attention must be paid to data timestamps, time synchronization between machines, missing data, and outliers caused by technical issues or generated by errors in the code. A simple off-by-one error in a vector can lead to "look into the future" types of errors, with disastrous effects on the correctness of the conclusions produced by machine learning algos.
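As a small illustration of how easy it is to introduce such a look-ahead bias, the snippet below shows a wrong and a correct alignment between features and labels when trying to predict the next return; the arrays are toy data:

```python
import numpy as np

prices = np.array([100.0, 101.0, 100.5, 102.0, 101.5])
returns = np.diff(prices) / prices[:-1]   # return realised between t and t+1

# WRONG: the feature at step t already contains the very return we want to predict,
# i.e. the model "looks into the future" and backtests look miraculously profitable
X_bad = returns
y_bad = returns

# Correct: to predict the return from t to t+1, use only information known at t,
# so the feature is shifted back by one step relative to the label
X_ok = returns[:-1]                        # return observed up to t
y_ok = returns[1:]                         # return realised after t
```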
It is not the purpose of this article to describe metrics in detail (and very often the industry is secretive about these aspects, although almost everybody uses the same things), but we will give a few examples to show what kind of data trading algos use. Feeding unprocessed data (such as the raw price) into machine learning algos is quite often counter-productive: besides the classification or regression task, we also hand over the task of extracting structure and features from the input data, which is very hard given the limitations of the algorithms and of the data. Any insight or information provided from the outside, any derived or composite metric that projects some human knowledge into the features, is precious. We are far from an AI capable of important intuitions, especially given that market data has a small signal-to-noise ratio, which is why human reasoning is not substituted by processing power or algorithmic complexity.
As already mentioned, the movement of the price is, in theory, random. It is important to note that the movement is a random walk, not the price itself. This movement is easily modeled as a return, the percentage difference between two moments in time, and the return has several properties which make it well behaved as an input feature for machine learning algos (either as a discrete return or, preferably, a compounded one): its mean is in general close to 0 and its distribution is close to normal. Why do returns follow a [close to] normal distribution? At any time T there are many independent factors that come into play, each one having some effect on the price. The change at moment T+1, which is the sum of the changes produced by each individual factor, behaves like a normal deviate, a random variable with a Gaussian distribution. To draw a parallel with rolling dice, assume we have three factors and each one can contribute a change from 1 to 6, like three dice rolled at the same time: there is only a small probability that their sum will be 1+1+1=3 or 6+6+6=18, since there is a single arrangement in which each of these can happen. For 11, however, there is a multitude of combinations: 1+4+6, 2+3+6, 3+3+5 and so on. Of course, prices are different in many ways: the factors are not always independent, their changes have no lower or upper bound, and the change triggered by a factor is not always instantaneous and can be spread out over time, all of which make the distribution "imperfect".
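A short sketch, on a synthetic price path, of how discrete and compounded (log) returns are computed, together with the dice analogy from above; the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic price path driven by small i.i.d. log-returns
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.005, size=10_000)))

simple_returns = np.diff(prices) / prices[:-1]   # discrete returns
log_returns = np.diff(np.log(prices))            # compounded (log) returns

print(simple_returns.mean(), log_returns.mean()) # both close to 0

# dice analogy: the sum of a few independent, bounded factors is already
# concentrated around the middle values (3 and 18 are rare, 10-11 are common)
dice_sums = rng.integers(1, 7, size=(100_000, 3)).sum(axis=1)
values, counts = np.unique(dice_sums, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```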
Figure 2. a) Sample of diverging trends of price and returns; b) Distribution of returns
Another very important metric is the so-called volume imbalance, representing the instantaneous difference between the selling and buying intentions, centered around the best offer to sell (best ask) and the best offer to buy (best bid). This imbalance can be computed in many ways, depending on the number of order book levels included and on the way they are weighted or decayed. Of course, as with any other metric, analyzing it over short, medium and long windows of time, as well as checking for temporal trends or inflection points, can give the algos extra information, and transforming those values into unit-of-measurement-agnostic features can ease cross-market analysis.
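One possible way (out of the many mentioned above) to compute such an imbalance is sketched below, using exponentially decaying weights over the first few order book levels; the decay factor and the book depth are arbitrary choices:

```python
import numpy as np

def volume_imbalance(bid_sizes, ask_sizes, decay=0.5):
    """Order-book imbalance in [-1, 1], computed over the first N levels
    around the best bid/ask, with exponentially decaying weights per level.
    +1 means all resting volume is on the bid side, -1 on the ask side."""
    bid_sizes = np.asarray(bid_sizes, dtype=float)   # level 0 = best bid
    ask_sizes = np.asarray(ask_sizes, dtype=float)   # level 0 = best ask
    weights = decay ** np.arange(len(bid_sizes))     # deeper levels matter less
    bid = np.dot(weights, bid_sizes)
    ask = np.dot(weights, ask_sizes)
    return (bid - ask) / (bid + ask)

# toy book: resting sizes on the first 5 levels of each side
print(volume_imbalance([500, 300, 200, 100, 100], [200, 250, 150, 100, 50]))
```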
Finally, among the many possible factors, we briefly mention, and give some references for, the speed and acceleration metrics and the flow metrics, which are computed by looking at the dynamics of the whole order book. Flows correlate strongly with the price and impact it. For example, adding or subtracting 1 depending on the side on which each transaction takes place can reconstruct the price movement. Although that is just a correlation, it is a marvelous fact, unifying volumes and prices through a simple counter. The idea is that there is some subtle dynamic between volume imbalance, market impact, trade sizes and volatility which can be captured by creating metrics that are then used as input data for algorithms, and the point to carry forward is that there is activity at the market microstructure level which contains information, exploited by trading algos and machine learning algos respectively.
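A minimal sketch of that counter, assuming we already know the aggressor side of each trade (+1 for buyer-initiated, -1 for seller-initiated):

```python
import numpy as np

def signed_flow(trade_sides):
    """Cumulative counter that adds +1 for every buyer-initiated trade and
    subtracts 1 for every seller-initiated trade (+1 = buy, -1 = sell).
    Its path tends to track the direction of the price move."""
    return np.cumsum(np.asarray(trade_sides))

# toy tape: sides of consecutive trades
print(signed_flow([+1, +1, -1, +1, +1, -1, -1, +1]))
```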
The volumes of data available from exchanges are growing continuously, and an important feature of the new machine learning and big data frameworks is the ease with which such big datasets can be used. There is a wide range of libraries and packages available (and new ones keep appearing), providing many linear algebra, clustering, classification and regression algorithms. They come with clear methodologies for validating results, as well as powerful visualization features. Every trading strategy requires specific processing and specific knowledge, but all of them can use the help of recent machine learning advances, be it during the analysis phase or, sometimes, during live trading.
An important feature of AI is adaptability to changing conditions, and trading algos badly need both adaptability and speed. That is why online learning is very commonly used in trading: adaptive algorithms that make a prediction at every step, but also learn from every new sample they see (a library worth looking at is Vowpal Wabbit). Even before the terms machine learning and AI became very popular, trading algos were adaptive, changing fixed activation thresholds depending on market regimes (usually detected based on trends, spikes and volatility), or computing thresholds dynamically from recent market activity. With online learning this goes to another level, the prediction model itself being updated at every step.
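As a toy substitute for a full online learning library such as Vowpal Wabbit, the sketch below implements the predict-then-update loop with a tiny linear model trained by stochastic gradient descent; the features and the target are synthetic, and the whole setup is illustrative only:

```python
import numpy as np

class OnlineLinearModel:
    """Tiny online learner: predicts, then updates its weights with one
    stochastic-gradient step on every new sample (squared-error loss)."""

    def __init__(self, n_features, learning_rate=0.01):
        self.w = np.zeros(n_features)
        self.lr = learning_rate

    def predict(self, x):
        return float(np.dot(self.w, x))

    def update(self, x, y):
        error = self.predict(x) - y
        self.w -= self.lr * error * np.asarray(x)

# usage: predict the next return from a couple of features, then learn from it
rng = np.random.default_rng(0)
model = OnlineLinearModel(n_features=2)
for _ in range(1_000):
    x = rng.normal(size=2)                               # e.g. recent return and volume imbalance
    y = 0.3 * x[0] - 0.1 * x[1] + rng.normal(0, 0.01)    # toy target
    pred = model.predict(x)                              # act on the prediction...
    model.update(x, y)                                   # ...then learn from the realised outcome
print(model.w)                                           # weights drift towards the true ones
```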