We live in exponential times, as one of those catchy videos circulating on YouTube and Facebook proudly boasted. And no area of our lives abides by this motto as much and as clearly as the evolution of the processing power available at our fingertips. Moore's law - and those who toil incessantly to actually keep it relevant - is partially to blame for this, but the real cause is our endless pursuit of progress, our fascination with more, bigger, faster things, all the adjectives we usually associate with better. In this monotonically increasing evolution, though, a problem arises when we hit an unsurpassable wall and need to decide, for instance, whether better means faster or more.
And this is precisely where the world of data processing performance is at the moment. When talking about performance, one usually means either latency or throughput, or both. These metrics are sometimes in competition, and various techniques optimize one at the expense of the other. Usually latency is improved by focusing on linear or quasi-linear performance (i.e. single-threaded or single-process performance) - as opposed to horizontal scaling, which is associated with high throughput.
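To make that tension concrete, here is a tiny back-of-envelope sketch in C - every number in it (the 1 ms dispatch overhead, the 0.1 ms of work per request, the batch sizes) is invented purely for illustration - showing how batching requests boosts throughput while hurting the latency of individual requests:

```c
#include <stdio.h>

int main(void) {
    /* Invented numbers: 1 ms of fixed dispatch overhead, 0.1 ms of actual
     * work per request. Batching amortizes the overhead (throughput goes
     * up) but the last request in a batch waits for the whole batch
     * (worst-case latency goes up). */
    const double overhead_ms = 1.0;
    const double work_ms     = 0.1;
    const int batch_sizes[]  = { 1, 10, 100 };

    for (size_t i = 0; i < sizeof batch_sizes / sizeof batch_sizes[0]; i++) {
        int b = batch_sizes[i];
        double batch_time_ms = overhead_ms + b * work_ms;
        printf("batch %3d: %6.0f req/s throughput, %5.1f ms worst-case latency\n",
               b, b * 1000.0 / batch_time_ms, batch_time_ms);
    }
    return 0;
}
```

With these made-up numbers, a batch of 100 gets roughly ten times the throughput of unbatched processing - but the unlucky request at the end of the batch also waits roughly ten times longer.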
So the big question is whether faster can be achieved by more - more specifically, whether overall performance can be improved by deploying more resources to handle a problem, instead of making individual resources faster as was done traditionally. This article will argue that, despite the current trend towards scaling, there is no definitive answer - it depends on the particular problem to be solved - and that there are surprisingly many areas where scaling does not help, leaving us in a place where linear performance and horizontal scaling have to coexist and work together.
There was a time in the not-so-long history of computers when, if you wanted your program to run faster, you had to either improve the program itself or run it on a faster CPU (which usually meant a higher operating frequency). That hasn't been the case for a while now, and a new paradigm based on parallelization and scaling has emerged. Read on for a very brief overview of the major turning points.
Physics was friendly to computer designers for a long time but, as everybody knows, that free lunch has been over for quite a while now. The barrier erected by Mother Nature somewhere around 4 GHz proved way too costly to surpass in the mass market, leading to a frantic search in the industry for other ways of pushing the bounds of processing power. While a few alternatives were explored (among them raising the frequency regardless of cost, or switching to quantum computers), the solution which prevailed was the massive refocusing on parallelization and on parallel (aka horizontal) scaling. The result: we now have mainstream CPUs with up to 20 cores, and specialized processors with thousands of cores. This parallelism was supposed to be exploited by massively multi-threaded applications - but experience shows this didn't really happen. For various reasons, mostly related to application architecture, most applications still have one or a handful of main threads, so the gains of running on CPUs with lots of cores are usually limited (but real nonetheless).
In parallel, a similar shift - of arguably higher amplitude - happened in the software industry, leading to the rise of distributed computing. While distributed processing has been known and used for a long time, with packages such as PVM boasting a proven track record in academia and in the supercomputer world, it was the Internet companies (mainly Google) that really pioneered large-scale horizontal scaling on commodity hardware. It is truly difficult to overstate the magnitude of the revolution this triggered. Prior to Google's efforts in this area, pretty much every organization's solution for handling large amounts of data was to deploy insanely expensive mainframe computers. The key advantage such computers presented was that they allowed regular software to handle large (by the standards of the day) amounts of data in reasonable time.
In contrast, the new paradigm allowed for a far lower hardware entry point - but at the price of changing the way software was designed. The popularization of distributed agreement protocols (Paxos, two-phase commit, etc.) allowed for distributed transactional systems, thus providing a middle road (and an easier pill to swallow) between the old-school transactional ACID world and the revolutionary eventually-consistent world first popularized by Amazon.
Quite predictably for a paradigm change of this magnitude, it took a while to gain wide acceptance - but it is certainly accepted now. So much so that it can be argued we're seeing over-acceptance - or, in simpler words, we're seeing this paradigm replacing the old one even where it's not actually the best fit.
And the last factor contributing to the current situation is the ever-accelerating business landscape - especially in the online world. Time-to-market is king and good developers are becoming a scarce resource worldwide, so development costs and opportunity costs are constantly growing.
Given the above, it's easy to get the idea that parallel, horizontally-scaled computing is the next generation, and that the old way of doing things based on higher linear performance is a thing of the past. In other words, that performance should be achieved not by trying to squeeze the last bit of speed out of the resources but rather by deploying more resources - since that's easy and cheap. And, to some extent, that idea is certainly true. What is not true though is that anything and everything can be solved by blindly throwing more resources at it.
Turns out, there are still surprisingly many areas where performance can only be improved by making the programs themselves more efficient.
The first and most obvious reason: not everything is parallelizable. Like the old adage maintaining that nine mothers cannot deliver a baby in one month even if working together, there are still problems which have to be solved in a sequential manner. There are theoretical limits to the degree to which a program can be parallelized, and there's a large body of research in this area stemming from Amdahl's law, so there's no need to go into much detail beyond the quick illustration below. Second, even for problems which can be solved in a parallel manner, there is the not-so-small issue of efficiency - inefficient computation can lead to unwanted effects such as high operating costs or high impact on battery life. And third, there are many environments where computing power is simply not plentiful (think mobile, embedded, etc.).
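To put a number on that first limit, here is a minimal sketch in C of what Amdahl's law predicts, assuming a purely illustrative workload where 10% of the work is inherently serial:

```c
#include <stdio.h>

int main(void) {
    const double serial_fraction = 0.10;  /* assumed, purely illustrative */
    const int cores[] = { 1, 2, 4, 8, 16, 64, 1024 };

    /* Amdahl's law: speedup(N) = 1 / (s + (1 - s) / N),
     * where s is the fraction of work that must run serially. */
    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++) {
        double speedup = 1.0 / (serial_fraction +
                                (1.0 - serial_fraction) / cores[i]);
        printf("%5d cores -> %5.2fx speedup\n", cores[i], speedup);
    }
    return 0;
}
```

With a 10% serial fraction the speedup can never exceed 1 / 0.1 = 10x, no matter how much hardware is thrown at the problem - which is exactly why the serial part of a program still deserves old-fashioned optimization effort.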
Perhaps even more surprising than the sheer number of areas where scaling doesn't work is the fact that this number is actually increasing. Over the next paragraphs I'll go briefly over a few of these areas.
I will start with a sector very familiar to me, namely the financial electronic markets. Parallelism works to a certain degree, and it's definitely put to good use wherever possible, but it can only get you so far. The key in an electronic market is achieving the lowest response time possible, and scaling horizontally will only benefit some of the steps of the read-analyze-decide-act-write pipeline. Players in these markets go to unbelievable lengths in their quest for lower latency (implementations of algorithms in hardware, privately laid undersea cables, microwave- and laser-based wireless communications, etc.), so any kind of inefficiency in the software would be extremely costly.
Besides the financial world, electronic exchanges do exist - and are actually on the rise - in other sectors as well. Maybe some readers have been unlucky enough to lose a bidding war on eBay to some pro with an automatic bidder, or have placed a bet on one of the online betting sites - quite a few of which offer bet exchanges complete with APIs allowing the development of automated betting algos. And, maybe less visible to the general public, electronic exchanges are becoming the backbone of industries such as advertising (the ads you see while browsing are auctioned in real time while the page is loading) or logistics (there are electronic exchanges where transporters bid for cargo and vice versa). All of these have the same requirements as the financial markets, with low latency (not high throughput) being the name of the game - so scaling cannot make up for lousy software.
Then there are the huge elephants in the room named big data and machine learning. On one hand, these are the poster children of horizontal scaling. On the other hand, the costs associated with processing large amounts of data can grow quite quickly if performance is left to hardware and scaling alone. There are various definitions of what constitutes big data - but a very good one is that if you don't care about the cost and/or time needed to process your data, you don't have big data.
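As a hedged back-of-envelope illustration of how those costs add up - every figure below (data volume, per-node throughput, hourly node price, the 3x slowdown) is an invented assumption, not a benchmark - consider what an inefficient processing job does to the daily bill:

```c
#include <stdio.h>

int main(void) {
    /* All figures are invented assumptions for the illustration,
     * not real benchmarks or cloud prices. */
    const double data_tb            = 500.0;  /* data processed per day, TB       */
    const double node_tb_per_hour   = 0.25;   /* per-node throughput, TB/hour     */
    const double node_cost_per_hour = 0.50;   /* per-node price, USD/hour         */
    const double slowdown           = 3.0;    /* inefficiency of unoptimized code */

    double optimized_hours   = data_tb / node_tb_per_hour;
    double unoptimized_hours = optimized_hours * slowdown;

    printf("optimized:   %6.0f node-hours/day, ~$%.0f/day\n",
           optimized_hours, optimized_hours * node_cost_per_hour);
    printf("unoptimized: %6.0f node-hours/day, ~$%.0f/day\n",
           unoptimized_hours, unoptimized_hours * node_cost_per_hour);
    return 0;
}
```

Scaling happily absorbs the 3x inefficiency - the cluster just grows - but the bill grows with it, day after day.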
Mobile development continues to be huge - and given the relatively slow CPUs (compared to mainstream desktop and server CPUs) and battery life restrictions, this is a domain where efficiency and performance are critical. The last thing a developer wants is to find out that the battery watchdog inside the mobile OS has shut down their app due to high battery consumption - and alerted the user about it too.
Other extremely important up-and-coming areas where scaling won't help are the new embedded platforms - IoT and wearables. Linear performance here is essential, as the chipsets are extremely simple (thus slow) and battery life is critical. Some calculations can be moved to the cloud, true, but this cannot be done in all cases. If your smart electrical socket cannot perform its duty because the Internet is down, it will look anything but smart. And if said socket draws more energy than the lightbulb connected to it, for the sole benefit of measuring the bulb's energy consumption, that wouldn't be seen as a huge success either.
The same thinking of course still applies to classical embedded software - huge numbers of devices, ranging all the way from autonomous cars to industrial machines.
Older domains where linear performance is still as relevant as ever are game development and anti-malware development - for obvious reasons. And they are definitely not going away - they are actually expanding from their traditional stronghold on the desktop to the new lower-power platforms (mobile, IoT, wearables), where the need for performance will be even more stringent.
In addition, there is still a huge amount of old enterprise software which would take enormous effort to rewrite or refactor into a scalable or eventually-consistent architecture - so that will never happen - yet customers keep demanding more and more performance.
And to close, a personal nuisance: web apps. Modern websites and web-based apps (think Evernote, PopcornTime (ahem), or the long list of Electron-based apps) are quite heavy on the client side, either in a browser or in a browser-based sandbox, so they cannot benefit from any kind of scaling. The reason this is a personal nuisance is that most of these apps are incredibly inefficient, and quite frequently the developers' strategy is to simply ignore performance - we've done such a good job as an industry of lowering users' expectations that they simply accept lousy performance as normal.
All that said, while performance is always beneficial, focusing on performance is not always needed.
The first reason is that compilers, interpreters, virtual machines, CPUs and other hardware all work as hard as possible to make things fast, even if the programmer didn't go out of their way to optimize for runtime performance. This makes performance reasonable in a significant number of cases, even without specific optimizations.
Second, as mentioned previously, there are cases where other factors (mainly time to market or development cost) outweigh the need for performance. A perfect system reaching the market after the window of opportunity has closed is completely useless, and far worse than an imperfect system deployed while still relevant. And third, some programs simply have lax performance requirements - there's no reason, and it would be a waste of time and money, to optimize a website to respond in under 1 ms when human reaction time is in the hundreds of milliseconds. Even in cases where performance is needed, the optimizations can usually be delayed. There's the saying that premature optimization is the root of all evil - and it does have substance to it.
Learning when to focus and when not to focus on linear performance is important. Somebody who writes a generic website in C with lock-free programming, object reuse and other high-performance tricks comes across just as amateurish as somebody trying to implement a low-latency trading system in Prolog - both require advanced knowledge, but misapplying that knowledge is a waste.
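For readers who haven't run into those tricks, here is a minimal, hedged sketch of what lock-free programming looks like, using C11 atomics and POSIX threads around a purely hypothetical hit counter - the point being that it pays off in genuinely hot, contended code paths, not in a run-of-the-mill website handler:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* A shared counter updated with an atomic compare-and-swap loop instead
 * of a mutex. The hit counter is purely hypothetical - the point is the
 * pattern. */
static _Atomic long hits = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        long expected = atomic_load(&hits);
        /* Retry until our increment wins the race; no locks, no blocking.
         * (atomic_fetch_add would do the same job here - the CAS loop is
         * shown because it generalizes to more complex updates.) */
        while (!atomic_compare_exchange_weak(&hits, &expected, expected + 1))
            ; /* on failure, expected now holds the current value; try again */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("hits = %ld (expected 400000)\n", atomic_load(&hits));
    return 0;
}
```

Compile with -pthread. In the right place this removes lock contention from a hot path; in a generic web backend it merely adds complexity that nobody will ever measure.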
It is very hard to measure how much of the software being created needs linear performance optimizations, and how much can rely on horizontal scaling - and, for that matter, how much doesn't need either of these. All these categories coexist and will continue to coexist for the foreseeable future - and as shown, the sectors where linear performance is important are not only not going away, but they are actually thriving.
Given that, it is more relevant than ever for a competent programmer to be able to write fast and efficient code when needed. It is a first-level skill for anybody who strives to be seen as a senior developer who can move between different types of projects. Lacking this skill is akin to lacking competencies in proper software design and architecture. Like design patterns for instance, these skills need to be in a programmer's toolbox so they can be used on the appropriate occasions.
True, a lot of developers will not need these skills, just like a lot of developers won't need software architecture skills - since the architecture will be laid out for them by senior developers. For developers who will, for instance, spend their whole careers writing simple websites (a perfectly valid career option these days), learning about performance would be a waste of time. Incidentally, the same is true for learning how to build software that scales horizontally.
But the versatile senior programmers of today still need these skills as much as ever, and the software world will continue to need them until we find ourselves put out of jobs by the rise of AIs. But that's a completely different matter.