TSM - Interview with Peter Lawrey: high performance systems in Java

Attila-Mihaly Balazs - Software Panther @ Synapp.io

Hello, everybody! This is Attila for Today Soft Mag and today I have with me Peter Lawrey. Peter, thank you for agreeing to this interview, in which we will talk about Java performance. Could you please introduce yourself?

[Peter] Yes, my name is Peter Lawrey, I am a Java consultant in the low latency space. I have a popular blog called "Vanilla Java" which gets about 120,000 hits a month; I also have a library called Chronicle, which is for low latency persistence, IPC and data storage, and I'm also third on StackOverflow for Java.

When people think about low latency / high performance, they usually say things like "C++ is better"; they phrase things in terms of the programming language. How important do you think the programming language is, and what other factors influence the latency or the throughput of the system besides the programming language?

[Peter] What I found is that different programming languages can attract different development styles and developers and, in particular, if you've got a low level language like C++ or C and you don't have a pretty good understanding of exactly what's going on, you'll shoot yourself in the foot pretty quickly and you'll learn fast. You have to. Whereas in Java, a lot of developers are deliberately protected from needing to know all these details, and so often they don't. That is generally a good thing, except that when you want to program for low latency, you need a much better understanding of what the code is really doing. So it's not that you can't do it in Java; it's more that there aren't as many Java developers who have the skill set, and in C++ you're kind of forced to have that skill set - you won't survive if you don't. The libraries can make a difference as well, but the biggest advantage for Java is the fact that so many of the common libraries you are likely to need are built in, whereas in C++ you have to go to third party libraries to do a lot of the same things.

So, do you find that the general opinion that C++ is faster or compiled languages are faster than Java holds up or is this just a myth?

[Peter] In theory, C++ is always faster, in theory. The thing is that in practice you have limited resources, you have limited time, you have limited expertise and you have changing requirements, and in that sort of environment what can happen is that you don't have enough time to micro-optimize every little bit, (…) you can't fine tune everything because that becomes unmaintainable. So, given let's say a week or a month, a developer who's reasonably equally skilled in both languages will produce much the same performance. It's just that if you give that same developer a year to do the same task, it will be faster in C++. The libraries that are particularly faster in C++ and C address well understood problems that don't change very much and have been really tuned to death: things like video processing, matrix operations, the sort of operations that have been around for a very long time and involve huge pulse, simple code repeated many times, and in those situations you find that C is faster. But in most business related applications, the amount of fine tuning you can do on each line of code isn't as important as the overall performance of your application. Business logic tends to change over time, so what you tend to think about is maintainability, and once that starts coming in, along with robustness, and you face lots of changes, then you have to back off in C++ anyway. You end up building a lot of the protections that Java gives you already, like some sort of message brokering system to isolate the components of your system and protect against crashes, for example; all those sorts of things Java will do for you efficiently. So there's no guarantee that the C++ system would be faster.
In particular, I worked in one place where all of the client facing application was written in C++ and that was the less latency sensitive part, whereas the hedging of all of the funds, the latency sensitive code, was written in Java because we had much faster time to market. In fact, all of the fast delivery, a lot of the new changes, were going into the Java system, simply because the C++ code had reached the point where it was very difficult to maintain. They would print, for example, daily crash reports. Instead of throwing exceptions, the whole program would crash and restart; all those sorts of things that would be considered unacceptable in Java.

Interesting. So what advice would you give somebody who is a Java programmer and is just starting to get interested in the performance of their system? Maybe they are interested themselves or maybe there is an external reason, like the client is saying that the system is too slow. What would be the first steps they should take?

[Peter] The most useful thing is visibility: what is going on in your system, where is the time being spent. Use things like profilers as a first step to look across your whole application. Then, once you feel you have an understanding of what your application is doing, put time stamps in and record those, either logging them or using something like my Chronicle to record them in a low latency way. That gives you visibility into what the application is doing and allows you to see where your time is being lost. Then you take the simple problems you can solve, break those down and make them faster and faster. The way I got into the low latency space is that I initially started with a company which wasn't particularly low latency, but I had this aim of making the system ten times faster than it needed to be, and then for the next client, the next people I worked for, I did the same thing, and again and again. Eventually I went from hundreds of milliseconds down into the hundreds of microseconds and eventually the sub-hundred microsecond range. So, you can do it in a stepwise fashion, but I generally find that if you make a system that can handle ten times the volume that is actually asked for, it's usually very stable as well. So, there are benefits to doing this; you're not just doing the absolute minimum all the time.
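
As a rough illustration of the time stamp approach described in this answer, the following Java sketch takes a System.nanoTime() reading before and after a piece of work, stores the difference in a preallocated array, and reports percentiles afterwards. The class LatencyRecorder and its methods are hypothetical names chosen for this example; they are not part of Chronicle or any other library mentioned in the interview, and the "work" being timed is only a stand-in.

```java
import java.util.Arrays;

/** Hypothetical helper: keeps elapsed times in a preallocated array. */
class LatencyRecorder {
    private final long[] samples;
    private int count;

    LatencyRecorder(int capacity) {
        samples = new long[capacity];
    }

    /** Stores the elapsed nanoseconds between two System.nanoTime() readings. */
    public void record(long startNanos, long endNanos) {
        if (count < samples.length) {   // silently drop samples once full
            samples[count++] = endNanos - startNanos;
        }
    }

    /** Latency at the given percentile (0 < pct <= 100), nearest-rank method. */
    public long percentile(double pct) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * count);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        LatencyRecorder recorder = new LatencyRecorder(100_000);
        long checksum = 0;
        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            checksum += Integer.toString(i).hashCode(); // stand-in for real work
            long end = System.nanoTime();
            recorder.record(start, end);
        }
        // Print the checksum so the timed work cannot be optimized away entirely.
        System.out.println("checksum = " + checksum);
        System.out.println("50%:   " + recorder.percentile(50) + " ns");
        System.out.println("99%:   " + recorder.percentile(99) + " ns");
        System.out.println("99.9%: " + recorder.percentile(99.9) + " ns");
    }
}
```

Looking at the high percentiles rather than the average is one common way to see where time is occasionally being lost; the numbers printed depend entirely on the machine and the JVM warm-up state.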

Good. So Peter, what is your favorite thing about Java? Either the language or the platform?

[Peter] My favorite thing about Java is just that it has so much of what you need built in, and the tool set is very good and mature. There are a lot of cooler new languages out there, but they don't have good profilers and debuggers and a lot of the surrounding tool set to help you write the code and do code analysis. It's really the tool set that brings Java to life, so to speak, rather than just the language itself. Part of the reason why there is such a good tool set is that Java is such a feature-poor language, in the sense that it's very economical in terms of what features it adds, but this means that every feature is well understood, the interactions between features are well understood, and they're relatively easy for an application to reason about in terms of checking and profiling and analyzing and looking at side effects and so on. So, because it is a relatively simple language at its core, the tool set is generally very good and it can save you a lot of work. A lot of the complaints people have about Java, things like "it's very verbose", can be taken care of using good tool sets, and certainly anyone who's not using an IDE with a profiler really should be. And learn to use the debugger when you're trying to debug your program; it'll save you a lot of trouble.

Great. Java 8 is coming up and it should be launched this autumn, I think. Are there any particular features of Java 8 which you are excited about?

[Peter] I think the most exciting thing about it is that there are a lot of features being added, not huge ones in themselves, but a lot of improvements. Closures is the one that gets the most press, although technically it's really just catching up with C#. They couldn't agree on what specification they should have for closures, so they eventually settled on what C# does. They added virtual extensions, which internally are called something else, I can't remember, but why did they call them "virtual extensions"? - Well, that's what C# calls them. So, in a way, it's really just catching up with what other languages are doing. But as for the other features - there are 66 improvements, of which 3 relate to closures - a lot of them are smaller improvements, but you see quite a lot of development in the evolution of the JVM. There are things such as JodaTime's DateTime being brought into the language - probably one of the most popular add-ons until now, so you will be able to use a proper DateTime - and, for me, at the lower level, things like having proper discrete memory barriers. So Unsafe has a load fence / store fence memory barrier that's explicit. In the past, you sort of had to do it indirectly by using other operations that also had these effects, whereas now you can deal with them explicitly. But that's a very low level feature.
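
For readers who have not yet tried these features, here is a small Java 8 sketch of three of the items mentioned in this answer: a closure (lambda) capturing a local value, a "virtual extension" method (called a default method in the released language), and the new date API in java.time that grew out of Joda-Time. The class Java8Sketch, the Greeter interface and the helper methods are hypothetical names made up for this example. The explicit memory fences in Unsafe are not shown, since that class is outside the specification.

```java
import java.time.LocalDate;
import java.util.function.IntBinaryOperator;

class Java8Sketch {

    /** Hypothetical interface with one abstract method and one default method. */
    interface Greeter {
        String name();

        // Default ("virtual extension") method: implementations inherit it for free.
        default String greet() {
            return "Hello, " + name();
        }
    }

    /** Returns a lambda that closes over the offset parameter. */
    static IntBinaryOperator addWithOffset(int offset) {
        return (a, b) -> a + b + offset;
    }

    /** Uses java.time to compute the day after the given date. */
    static LocalDate dayAfter(int year, int month, int day) {
        return LocalDate.of(year, month, day).plusDays(1);
    }

    public static void main(String[] args) {
        System.out.println(addWithOffset(10).applyAsInt(1, 2)); // 1 + 2 + 10

        Greeter peter = () -> "Peter";      // lambda supplies name()
        System.out.println(peter.greet());  // default method builds the greeting

        System.out.println(dayAfter(2013, 9, 30)); // rolls over into October
    }
}
```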

Are there any features which you would like to see in Java and are not included in Java 8?

[Peter] Probably the biggest thing that I would like to see improved, because I use Unsafe quite a lot, is not to add a feature but actually to make it part of the specification. There's a lot of functionality in Unsafe that's used in Chronicle and Disruptor and other libraries which is outside the specification, so it's there in OpenJDK and HotSpot and other compatible JVMs such as JRockit or Azul Zing, but it's not standard. So, they wouldn't need to do anything in terms of adding functionality; they would just need to make this a standard feature of all Java platforms. That way you've got a standard way of dealing with low level memory access, and in a thread-safe manner.

Are there any resources like books, blogs, videos, training courses that you would recommend to Java programmers who are interested in the domain of performance? There is of course your blog, which you mentioned in the beginning, and we will link to that, but what other resources would you recommend?

[Peter] I would actually recommend looking at the Performance Java User's Group, because that's where I put all of what I consider the most interesting video posts on the subject, and I hope to encourage other people to post there as well. So, I think that's probably the best place to start. There are two other blogs. Sorry, they aren't really blogs, they're forums, e-mail forums, that are worth looking at. One is "Mechanical Sympathy", which is led by Martin Thompson, who was the CTO at LMAX when Disruptor was developed; that's very low level, though. Even most people who are interested in Java performance wouldn't have a use for about 99% of it, but it's a very interesting discussion all the same. There's also a forum called "Friends" at jClarity, which is led by a couple of guys who wrote "The Well-Grounded Java Developer" for Java 7, Benjamin Evans and Martijn Verburg. They very much focus on GC and related issues, but they're very practical and offer good advice, and they also provide consulting if you're interested. They have a couple of products in that space as well, but that forum is very interesting if you're looking at tuning your GC.

Great. We will include links to all these in the comment section and annotation for this video. Is there anything else you would like to talk about, related to this?

[Peter] Yes, I've got a new version of Chronicle coming up, which is being added to a new organization on GitHub called OpenHFT, a set of open libraries for high-frequency trading. Chronicle itself is being split into sections: memory, memory manipulation and deserialization, separate from its logging, so you don't have to log to disk to use its features. It's also been made more performant; for example, on this laptop I can get 80 million messages a second passed from one thread to another, and that's with every message being persisted. It will also have support for rolling logs, which you can do yourself in Chronicle 1, but a lot of people would have liked the library to do it for them, so that's going to be a feature added in Chronicle 2. There's also another library being started up which will look at storing huge amounts of data off-heap, in particular in memory mapped files. It will provide similar features to what Terracotta's BigMemory does, but it's limited only by the size of your disk space rather than your main memory, so you can have much bigger capacities, and the data will also be stored much more efficiently, so you can get data in and out faster and use less space as well. And it will be open source. The other thing that's coming up is a FIX engine that will be based around Chronicle as well, so you'll be able to have low latency parsing and writing of FIX messages. It will be loosely based on QuickFIX, except that it's designed to be much more efficient.
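
To give a feel for what storing data off-heap in a memory mapped file can look like, here is a minimal sketch that uses only the standard java.nio API. It is not Chronicle's or OpenHFT's code; the class MappedLongStore and the method writeThenSum are hypothetical names for this example, and a real library would add its own layout, indexing and concurrency control on top.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedLongStore {

    /** Writes the values into a memory mapped file, then maps it again and sums them. */
    public static long writeThenSum(Path file, long[] values) throws IOException {
        long bytes = (long) values.length * Long.BYTES;

        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current size grows the file to fit.
            MappedByteBuffer out = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            for (long value : values) {
                out.putLong(value);   // goes to the mapped region, not the Java heap
            }
            out.force();              // ask the OS to flush the mapped pages to disk
        }

        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer in = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (in.remaining() >= Long.BYTES) {
                sum += in.getLong();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-longs", ".dat");
        file.toFile().deleteOnExit();
        System.out.println(writeThenSum(file, new long[] {1, 2, 3, 4}));
    }
}
```

Because the data lives in the mapped file rather than on the Java heap, it is not scanned by the garbage collector, and the operating system decides which pages stay in main memory; that is the general property that lets capacity be bounded by disk rather than RAM.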