Patkós Csaba - Lead Software Developer @ Syneto

The Future of Continuous Integration

The first thing that came to my mind when I started writing down this speech was: “Will I look back at it in five years with a smile or a frown?” It is very difficult, almost impossible, to foresee the future in IT. There are a few geniuses, like Gordon Moore, who could foresee the evolution of CPU technology for 50 years; there is even a law named after him. But even his law will fail when physics catches up with us. It will not be long, just a couple more generations of CPUs, and there we are: no more improvement allowed by physics on the same architecture.

On the other hand, other industries change at a rate of about one significant change every 50 years. When was the last revolution in excavator technology? When was the last revolution in steel processing? When was the last revolution in road building? We are more or less using the same materials and techniques as 50 years ago. Yes, we can do all the things mentioned above faster, at a somewhat higher quality, and at a lower cost, but we have only refined some really solid, well-tested processes.

Computers did not even exist 50 years ago. Well... there were some around. But let us say they were a toy for scientists rather than machines of mass production. However, they existed. The first concepts of software development were put in place. The first paradigms of software development were defined.

In the late 1950s, Lisp was developed at MIT as the first functional programming language. It was the only programming paradigm that could be used at the time. All computers, few as they were, were programmed using functional programming.

Twenty years later, structured programming started to gain traction with support from IBM. Languages like B, C, and Pascal started to emerge. Let us consider this to be the first real revolution in software development. We started with functional programming, and then we got structured programming, something totally different. It was groundbreaking, and it took about 20 years to emerge. While this seems like a very long time now, it was, what, less than half the 50-year cycle at which industrial revolutions tend to happen?

The fast pace of evolution in software only accelerated. It was about ten years later, in August 1981, that Smalltalk was made available to the wider public. Developed at Xerox PARC, it brought the next big thing in computer science: Object Oriented Programming.

While some other paradigms came along in the following years, these three were the only ones to be widely adopted.

But what about hardware? How far did we come in hardware evolution?

How many of you can remember the very moment when you interacted with a computer for the first time? Let your memory bring back that moment. Remember what you did, who you were with ... A friend? Maybe your parents? Maybe a salesman trying to convince your parents to buy a computer for you? Remember that very moment. Remember that computer. Remember the screen. How many colors did it have? Was it a green-on-black text console, a high-resolution CRT, or a FullHD widescreen? What about the keyboard? What about the mouse ... if it had even been invented at that time? What about the smell of the place? What about the sound of the machine?

Was it a magical moment? Was it stressful? Was it joyful?

I remember... It was about 30 years ago. My father took me to the local computer center, his workplace. Yes, he is a software developer, one of the first generations in my country (Romania). We played. It was a kind of Pong game, if I remember correctly. On a black background, two green lines lit up, one at each side of the screen.

It looked similar to this image, though this one has highly detailed graphics compared to the picture in my memory, and it was running on something like this.

Well, it was neither this particular computer, nor even an IBM for that matter. It was a copy of capitalist technology developed as the proud product of the communist regime. It was a Romanian computer, a Felix.
The Felix was a very small computer compared to its predecessors, a real innovation. It could easily fit into a single large room, maybe 30-40 square meters. And it even had a terminal where you could see your code. Why was this considered a revolution in the field? It is just a screen and a keyboard, after all. Yes, but your code went directly onto magnetic tape, and then, in just a couple of hours, you could run your program... that is, if you had made no typos.

Before the magnetic tape and console revolution, there were punch cards and printers. Programmers wrote their code on millimetric paper, usually in Fortran or other early high-level languages.

Then someone else, at a punch card station, typed in all the code. Please note that the person transcribing your handwriting into computer language had little computer or software knowledge. It was a totally different job. Software developers used paper and pencil, not a keyboard and mouse. They were not even allowed to approach the computer.
The result of this transcription was a big stack of punch cards like this one.

Then these cards were loaded into the mainframe by a computer technician, the only person allowed to work directly with the computer itself.

Overnight, the mainframe, which was the size of a whole floor and required several dedicated power connections directly from the high-voltage grid, processed all the information and printed the result on paper.
The next day, the programmer read the output and interpreted the result. If there was an error, a bug, or a typo, the whole stack had to be retyped, because punch cards were sequential. If you were lucky, you could find a fix that affected only a small number of cards, a fix that used the same number of characters in the same memory region.

In other words, it took a day or even more to integrate the written software with the rest of the pieces and compile something useful. Magnetic tape reduced that to a few hours. Hard disks and more powerful processors in the '90s reduced that further to tens of minutes.

I remember when I installed my first Linux operating system. I had an Intel Celeron 2 processor. It was Slackware Linux. I had to compile its kernel at install time. It took the computer a few hours to finish. It was an entire operating system kernel. That was amazing. I could let it work in the evening and I had it compiled in the morning. Of course, I broke the whole process a few times, and it took me about 2 weeks to set it up. It seemed so fast back then...

I work at Syneto. Our software product is an operating system for enterprise storage devices. That means a kernel, a set of user space tools, several programming languages, and our management software running on top of all these. We not only have to make the pieces of the kernel work together, but we also have to integrate the C compiler, PHP, Python, a package manager, an installer, about two dozen CLI tools, about 100 system services, and all the management software into a single entity that works as a whole and is more than the sum of its parts.

We can go from zero to hero in about an hour. That means compiling everything from source code up, from the kernel to Midnight Commander, from Python to PHP. We even compile the C compiler we use to compile the rest of the stuff.

However, we do not have to do this most of the time. It would be absolute overkill and a waste of computing resources. We usually have most of the system already precompiled, and we recompile only the bits and pieces we recently changed.

When a software developer changes the code, it is saved on a server. Another server periodically checks the source code. When it detects that something has changed, it recompiles that little piece of application or module. It then saves its result to another computer, which publishes the update. Finally, yet another computer installs that update so that the developer can see the result. A rough sketch of this loop follows below.
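To make that loop concrete, here is a minimal sketch in Python of a polling integration server. Everything in it is a hypothetical placeholder: the repository URL, the working directory, the "update-image" make target, and the scp destination are illustrations, not our actual setup; real CI servers such as Jenkins do the same job with far more robustness.

```python
import subprocess
import time

REPO_URL = "https://example.com/product.git"  # hypothetical repository URL
WORKDIR = "/var/ci/product"                   # hypothetical local build directory
POLL_INTERVAL = 60                            # seconds between checks


def latest_revision() -> str:
    """Ask the remote repository for the revision currently at HEAD."""
    result = subprocess.run(
        ["git", "ls-remote", REPO_URL, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()[0]


def build_and_publish() -> None:
    """Rebuild the changed pieces and hand the artifact to the publishing machine."""
    subprocess.run(["git", "-C", WORKDIR, "pull"], check=True)
    # 'update-image' is a placeholder build target, not a real command of ours.
    subprocess.run(["make", "-C", WORKDIR, "update-image"], check=True)
    subprocess.run(
        ["scp", f"{WORKDIR}/build/update.pkg", "publisher:/srv/updates/"],
        check=True,
    )


if __name__ == "__main__":
    last_seen = None
    while True:
        revision = latest_revision()
        if revision != last_seen:  # something changed: integrate it
            build_and_publish()
            last_seen = revision
        time.sleep(POLL_INTERVAL)
```

In practice a push-triggered hook usually replaces the polling, but the shape of the loop stays the same: detect a change, rebuild only what changed, publish, update.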
What is amazing in this scheme is how little software development changed, and how much everything else around software developers changed. We eliminated the technicians typing in the handwritten code ... and we are now allowed to use a keyboard. Good for us! We eliminated the technician loading the punch cards into the server ... we just send the code over the network. We eliminated the delivery guy going to the customer with the install kit ... we use the Internet. We eliminated the support guy installing the software ... we do automatic updates.

All these tools, networks, servers, and computers eliminated a lot of jobs except one: the software developer. Will we become obsolete in the future? Maybe, but I would not start looking for another career just yet. In fact, we will need to write even more software. Nowadays, everything uses software. Your car may very well have over 10 million lines of code in it. Software controls the world, and the number of programmers doubles every 5 years. We are so many developers, producing so much code, that our reliance on automated and ever more complex systems will only keep growing.
Five years ago Continuous Delivery, or Continuous Deployment for that matter, was a myth, a dream. Fifteen years ago Continuous Integration was a joke! We were doing Waterfall. Management was controlling the process. Why would you integrate continuously? You do that only once, at the end of the development cycle ... of course.

Agile Software Development changed our industry considerably. It communicated in a way that business could understand, and most businesses embraced it, at least partially. What remained lagging behind were the tools and technical practices, and, in many ways, they are still light years away in maturity compared to organizational practices like Scrum, Lean, Sprints, etc.

TDD, refactoring, and others are barely getting noticed, far from mainstream, and they are even older than Agile! Continuous Integration and Continuous Delivery systems are, however, getting noticed. Their big advantage over other technical practices is that business can relate to them. We, the programmers, can say: "Hey, you wanted us to do Scrum. You want us to deliver every sprint? You will need an automated system to do that. We need the tools to deliver the business value you require from us at the end of each iteration."

The term Continuous Integration was first used and exemplified by Kent Beck and Ron Jeffries. While defining Extreme Programming, a new development practice, they had to define processes that would help sustain it.

If you are unfamiliar with Extreme Programming, it is an iterative and incremental development model. You have longer iterations for features and shorter iterations for sets of stories. If you know Scrum, it is probably the practice closest to Extreme Programming.

In order to sustain short iterations and fast development, new techniques were required. Those were the days when Unit Testing took off, when Kent Beck invented TDD, and when Continuous Integration (CI) was defined.

Probably the most confusing part of CI is the word “continuous”. How do you define continuous? Does it mean instantaneous? If not, how long should the process of “integration” run to make it continuous?

Initially, it was described as a process in which each developer pushes the code he or she worked on at least once a day. Then an automated system compiles and builds a new version of the software project, containing the newly added code from each developer. Preferably, there are also tests and other code checks, in addition to what the compiler provides.

So, for them, continuous meant once a day. They did not mention how long the build and the check would take. It was irrelevant at that time. After all, all this happened in the late 90s, slightly more than 15 years ago.
Inventing Continuous Integration was an intriguing step, because technical practices are hard to quantify economically, at least not immediately or tangibly. Yeah, yeah ... We can argue about code quality, legacy code, and technical debt. But they are just too abstract for most businesses to relate to in any sensible manner.

Continuous Integration (CI) is about taking the pieces of a larger software system, putting them together, testing them automatically, and making sure nothing breaks. In a sense, CI masks your technical practices under a business value. You need the CI server to run tests, and you could very well write them first. You can do TDD and the business will understand it. The same goes for other techniques.

Continuous Deployment means that, after your software is compiled, an update is made available on your servers. Then the client's operating system (e.g. Windows) will show a small pop-up saying that there are updates. Of course, this applies to any application, not just operating systems.

Continuous Delivery means that, after the previous two processes are done, the solution is delivered directly to the client. One example is the Gmail web page. Do you remember that the page sometimes says that Gmail was updated and you should refresh the page? Other examples are the applications on your mobile phone. They update automatically by default. One day you may have one version, the next day you will have a new one, and so on, without any user intervention at all.

Business plus CI and CD? Oh man! They are gold! How many companies deliver software over the web as webpages? How many deliver software to mobile phones? The smartphone boom virtually opened the road ... the highway ... for continuous delivery!


Trends for "Smartphone"

Trends for "Continuous Delivery"

Trends for "Continuous Deployment"

It is fascinating to observe how the smartphone and CD trends tipped in 2011. The smartphone business embraced these technologies almost instantaneously. However, CI technology was unaffected by the rise of smartphones.

Trends for "Continuous Integration" 

So what tipped CI? There is no Google Trends data from before 2004, so it is hard to pinpoint, but in my opinion the gradual adoption of Agile practices tipped CI into this upward trend.

Trends for "Agile software development" 

The trends have the same rate of growth. They go hand-in-hand.

Continuous deployment and delivery will soon overtake CI. They are getting mature and they will continue to grow. Will CI have to catch up with them at some point? Probably yes.

Agile is rising. It is starting to become mainstream, getting out of the early adopters category.

Follow the blue line in the Law of Diffusion graph above. Agile is in the early adopters stage, but it will soon rise into the majority section. When that happens, we will write even more software, faster, hopefully better. In terms of performance, we will need better CI servers, tools, and architectures. There are hard times ahead of us.

So, where do we go with CI from here?

Integration times went down drastically in the past 30 years: from 3 days, to 3 hours, to 30 minutes, to 3 minutes. Five years ago I worked on a project that produced a 100 MB ISO image. From source to update, it took about 30 minutes. Today we have a 700 MB ISO, and it takes 3 minutes; the build throughput improved roughly 70-fold in only the past 5 years. I expect this trend to continue exponentially.
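For the curious, that roughly 70-fold figure is simply the ratio of the two build throughputs (rounding slightly):

$$\frac{700\ \text{MB} \,/\, 3\ \text{min}}{100\ \text{MB} \,/\, 30\ \text{min}} \approx \frac{233\ \text{MB/min}}{3.3\ \text{MB/min}} \approx 70$$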

In the next five years, build times will shrink. Smaller projects will achieve true continuity in integration. You will be able to see the changes you make to a project almost instantaneously. All your tests will run in the blink of an eye. The whole cycle described above will last 3 to 15 seconds.

At the same time, the complexity of our projects will rise. We will write more and more complex software. We will compile more and more source code. We will need to find ways to integrate these complex systems. I expect hard times for the CI tools: they will need to find a balance between high configurability and ease of use. They must be seamless, simple enough to be used by everyone, and they must prompt for interaction only when something goes wrong.

What about hardware? Processing power is starting to hit its limits. Parallel processing is on the rise and seems to be the only way to go. We cannot make individual processors much faster any more; Moore’s law is hitting a brick wall. However, we can throw a bunch of CPUs into a single server, so processing may not be such a big issue. Yes, we will need to write new software. It will have to be optimized for massive multithreading. It will have to work with tens, or even hundreds, of CPU cores, as in the sketch below. Our build machine at Syneto has 24 cores, and it is commodity hardware.
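As a purely illustrative sketch (the module names, paths, and make invocation below are hypothetical, not Syneto's actual build system), fanning independent build jobs out across all available cores can be as simple as this:

```python
import multiprocessing
import subprocess

# Hypothetical list of independently buildable modules.
MODULES = ["kernel", "php", "python", "package-manager", "installer"]


def build(module: str) -> str:
    """Build a single module; the path and make invocation are placeholders."""
    subprocess.run(["make", "-C", f"/build/{module}"], check=True)
    return module


if __name__ == "__main__":
    # One worker per CPU core; on a 24-core build machine this runs
    # up to 24 module builds at the same time.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        for finished in pool.imap_unordered(build, MODULES):
            print(f"{finished}: done")
```

The point is not the code itself but the prerequisite it exposes: the build has to be split into independent pieces before the extra cores can help at all.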

Another issue with hardware is how fast you can write all that data to the disks. OK, you processed it, now what? Fortunately for us, SSDs are starting to take over from HDDs for everyday data storage. Archiving seems set to remain on rotating disks for the next 5 years, but we are hitting the limits of the physical material there as well. And yes... digital data grows at an alarming rate. In 2013, the digital universe was 4.4 zettabytes. That is 4.4 billion terabytes! By 2020, it is estimated to be 10 times more, 44 zettabytes. Each person on the planet will generate on average 1.5 MB of data every second. Let us say we are 7 billion; that is 10.5 billion MB of new data every second, 630 billion MB every minute, and 37,800 billion MB, or 37.8 billion GB, every hour. That is about 0.0378 zettabytes every hour, or roughly 0.9 zettabytes each day.

It is estimated that in 2020 alone we will produce another ~14 zettabytes of data. The trick with the growth of the digital universe is that it grows exponentially, not linearly. It is like an epidemic: it keeps doubling, at an ever faster absolute pace.

All that data will have to be managed by software you and I write, software that will have to be so good, so reliable, and so well-performing that all that data stays perfectly safe. To produce software like that, we will need tools like CI and CD architectures that are capable of managing enormous quantities of source code.

What about AI? There have been some great strides in artificial intelligence lately. We went from basically nothing to a great Go player. Yet, that is still far from real intelligence. However, the first signs of AI applications in CI were prototyped recently. MIT released a prototype software analysis and repair AI in mid-2015. It actually found and fixed bugs in some pretty complex open source projects. Therefore, there is a chance that, by 2020, we will get at least some smart AI code analysis that will be able to find bugs in our software, bugs that we humans were unable to detect.

So, get ready for more complex systems, very fast hardware, maybe true continuous integration and delivery on small projects, some AI help around the corner… but do not start looking for another job just yet.