Java Performance: Tools

Lucian Torje
Senior Java Developer @ Siemens


The goal of this article is to provide insight into the most widely used Java performance tools. Like any tool, a profiler's usefulness depends on how it is used and on the skills of the people using it. We used the Spring PetClinic application during the tests, since Spring MVC and Spring Boot are the most used web frameworks according to ZeroTurnaround's Java Tools and Technologies Landscape 2016.

HPROF

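HPROF is a free profiling agent, packaged as a native library (hprof.dll on Windows, libhprof.so on Linux), that was distributed with every JDK up to JDK 8 (it was removed in JDK 9). It is built on top of JVM TI and provides heap and CPU profiling. According to Brendan Gregg, it has several known issues: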

It cannot be turned on/off.
CPU profiling overhead is too high (400x-1000x).
CPU sampling is inaccurate.

In order to profile the Spring PetClinic sample, the app needs to be enhanced with an exit endpoint that closes the application context, because HPROF writes its CPU report when the JVM exits. Profiling with HPROF is impractical if only specific areas are targeted, because it lacks start/pause/stop capabilities.
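The article adds this endpoint to the PetClinic's VetController (see the /close mapping in the startup log below) but does not show its code. As a hedged, JDK-only stand-in, the sketch below uses com.sun.net.httpserver instead of Spring MVC: a hypothetical CloseEndpoint whose /close handler replies and then runs an exit action on a separate thread. In a real run the action would stop the JVM (here System.exit(0)), and that shutdown is when HPROF writes java.hprof.txt.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only stand-in for an application exit endpoint (not the article's Spring code). */
public class CloseEndpoint {

    /**
     * Starts an HTTP server on the loopback interface whose /close handler replies
     * and then runs exitAction on a separate thread, so the reply is sent first.
     */
    public static HttpServer start(int port, Runnable exitAction) throws IOException {
        // Loopback only: a shutdown endpoint should not be reachable from other hosts.
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext("/close", exchange -> {
            byte[] body = "closing\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
            new Thread(exitAction, "close-endpoint").start();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // On JDK 8: java -Xrunhprof:cpu=samples CloseEndpoint, then request http://localhost:8080/close.
        // System.exit shuts the JVM down, and HPROF writes java.hprof.txt at that point.
        start(8080, () -> System.exit(0));
    }
}
```

Making the exit action a parameter keeps the handler testable without actually terminating the JVM.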

$ mvn package
$ java -Xrunhprof:cpu=samples -jar target/spring-petclinic-1.5.1.jar
...
2017-03-20 23:00:49.565  INFO 11640 --- [  restartedMain] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/close]}" onto public void org.springframework.samples.petclinic.vet.VetController.close()
...
Dumping CPU usage by sampling running threads ... done.
$ cat java.hprof.txt
JAVA PROFILE 1.0.1, created Mon Mar 20 23:00:22 2017
...
CPU SAMPLES BEGIN (total = 22941) Mon Mar 20 23:04:20 2017
rank   self  accum   count trace method
   1 56.37% 56.37%    5784 301590 sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0
   2 16.73% 73.10%    1716 301591 sun.nio.ch.ServerSocketChannelImpl.accept0
   3  6.67% 79.77%     684 300352 org.springframework.boot.loader.jar.JarURLConnection.getInputStream
   4  5.47% 85.23%     561 300356 sun.misc.URLClassPath$Loader.getResource
   5  2.16% 87.40%     222 300231 java.io.RandomAccessFile.open0
   6  1.39% 88.79%     143 300131 java.lang.ClassLoader.defineClass1
   7  1.02% 89.81%     105 300502 sun.misc.URLClassPath$Loader.findResource
...
CPU SAMPLES END
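Each row of the CPU SAMPLES table reads as rank, self percentage, accumulated percentage, sample count, trace ID (which points to a "TRACE" stack listing elsewhere in java.hprof.txt) and method. To post-process a report, for example to compare two runs, a row can be parsed as in the sketch below; the HprofSample class is hypothetical and assumes the whitespace-separated layout shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One row of the "CPU SAMPLES" table in java.hprof.txt (hypothetical helper). */
public final class HprofSample {
    // rank, self %, accum %, count, trace id, method
    private static final Pattern ROW = Pattern.compile(
            "\\s*(\\d+)\\s+([\\d.]+)%\\s+([\\d.]+)%\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)\\s*");

    public final int rank;
    public final double selfPercent;
    public final double accumPercent;
    public final int count;
    public final int traceId;   // refers to a "TRACE <id>:" stack listing in the same file
    public final String method;

    private HprofSample(int rank, double selfPercent, double accumPercent,
                        int count, int traceId, String method) {
        this.rank = rank;
        this.selfPercent = selfPercent;
        this.accumPercent = accumPercent;
        this.count = count;
        this.traceId = traceId;
        this.method = method;
    }

    /** Returns the parsed row, or null if the line is not a table row (e.g. the header). */
    public static HprofSample parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new HprofSample(Integer.parseInt(m.group(1)), Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3)), Integer.parseInt(m.group(4)),
                Integer.parseInt(m.group(5)), m.group(6));
    }
}
```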

NetBeans profiler

NetBeans profiler is a free Java profiling tool integrated into the NetBeans IDE. It enables the following profiling tasks:

CPU profiling (higher overhead) and sampling (lower overhead)
Heap profiling
Thread and lock analysis
SQL query profiling and basic JVM monitoring
Profiling points (places in the code where a snapshot is taken, results are reset or a timestamp is recorded)
Event filtering
Attaching to a local or remote process

Overall, the NetBeans profiler is a useful tool that has all the features needed for the job. It is a fine choice if you are looking for a good, free profiling tool.

VisualVM

VisualVM is a lightweight Java profiler. It is shipped with the JDK (up to JDK 8) and can also be installed separately from the VisualVM download page. It is a better alternative to JConsole, offering the same capabilities and more. It is extensible through plugins.

VisualVM includes the following features:

Displays local and remote processes (making it easier to choose which process to profile)
Monitors and profiles process performance and memory
Analyzes threads
Takes thread dumps, heap dumps and profiler snapshots, and analyzes them offline
Extensible through plugins, including MBeans (which some of you may know from JConsole or Java Mission Control)
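MBeans are also a cheap way to expose application-specific metrics that VisualVM (with the MBeans plugin) or JConsole can then display next to the JVM's own. A minimal sketch of a standard MBean registered with the platform MBean server; the RequestStats names and the com.example:type=RequestStats object name are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RequestStatsDemo {

    /** Standard MBean interface: its name must be the implementation's name plus "MBean". */
    public interface RequestStatsMBean {
        long getRequestCount();   // exposed as the read-only attribute "RequestCount"
        void reset();             // exposed as an operation that can be invoked from the tool
    }

    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong requests = new AtomicLong();

        public void recordRequest() {
            requests.incrementAndGet();
        }

        @Override
        public long getRequestCount() {
            return requests.get();
        }

        @Override
        public void reset() {
            requests.set(0);
        }
    }

    /** Registers a new RequestStats instance under the given JMX object name. */
    public static RequestStats register(String objectName) throws Exception {
        RequestStats stats = new RequestStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(stats, new ObjectName(objectName));
        return stats;
    }

    public static void main(String[] args) throws Exception {
        RequestStats stats = register("com.example:type=RequestStats");
        // Simulated traffic for one minute; attach VisualVM or JConsole meanwhile to watch the attribute.
        for (int i = 0; i < 600; i++) {
            stats.recordRequest();
            Thread.sleep(100);
        }
    }
}
```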

VisualVM is started from the terminal by running the following command (if the JDK's bin directory is in $PATH):

$ jvisualvm

VisualVM definitely belongs in every Java developer's toolbox. This belief is reinforced by ZeroTurnaround's report from November 2015, which states that 46% of the respondents use VisualVM.

Mission Control

Java Mission Control (JMC) and Java Flight Recorder (JFR), formerly known as JRockit Mission Control and JRockit Flight Recorder, are advertised as offering near-zero-overhead profiling and diagnostics in production environments.

Java Mission Control is free to use; Java Flight Recorder is not. In order to use JFR, you must agree to the Oracle commercial terms.

JMC and JFR offer all the basic features, such as CPU, heap and thread monitoring (CPU usage, deadlock detection, thread count). Among the advanced features, the ones worth mentioning are triggers, I/O monitoring, MBeans, hot method statistics, exception and event monitoring, and time-period filtering.

Mission Control is accessible with the following command (its UI offers a nice Java Flight Recording wizard and also JMX access to JFR methods):

$ jmc

To prepare for recordings, add the following arguments to your application's java command line:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

To schedule a flight recording at startup, add the following options (the StartFlightRecording value is a single comma-separated list, without spaces):

-XX:StartFlightRecording=delay=20s,duration=60s,name=MyRecordings,filename=c:/TEMP/myrecording.jfr,settings=profile
-XX:FlightRecorderOptions=loglevel=info

Recordings can also be started at runtime from the terminal with jcmd, where 9200 is the process ID of the target JVM:

$ jcmd 9200 JFR.start delay=20s duration=60s name=MyRecordings filename=c:/TEMP/myrecording.jfr,settings=profile
9200:
Recording 4 scheduled to start in 20 s. The result will be written to:
C:\temp\myrecording.jfr,settings=profile

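Note that jcmd separates its arguments with spaces, not commas: in the command above, ",settings=profile" was taken as part of the file name, as the output shows. Writing filename=c:/TEMP/myrecording.jfr settings=profile avoids this. Checking the status from the terminal can be accomplished by running the following command: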

$ jcmd 9200 JFR.check
9200:
Recording: recording=4 name="MyRecordings" duration=1m filename="c:/TEMP/myrecording.jfr,settings=profile" compress=false (running)

Making a JFR dump is as easy as calling:

$ jcmd 9200 JFR.dump name=MyRecordings filename=c:/TEMP/dump.jfr
9200:
Dumped recording "MyRecordings", 265.2 kB written to: C:\temp\dump.jfr
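On JDK 11 and later, where Flight Recorder is part of OpenJDK (JEP 328), a dump such as dump.jfr can also be inspected programmatically through the jdk.jfr.consumer API. A minimal sketch that counts events by type; the JfrEventCounter name is hypothetical, and recordings produced by the JDK 8 commercial JFR use an older file format that this API may not be able to read.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: counts the events in a .jfr file, grouped by event type name. */
public final class JfrEventCounter {

    private JfrEventCounter() {
    }

    public static Map<String, Integer> countByType(Path jfrFile) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();   // sorted by event type name
        for (RecordedEvent event : RecordingFile.readAllEvents(jfrFile)) {
            counts.merge(event.getEventType().getName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Path of a recording, e.g. the dump written by JFR.dump (defaults to dump.jfr).
        Path file = Paths.get(args.length > 0 ? args[0] : "dump.jfr");
        countByType(file).forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

readAllEvents loads the whole file into memory, which is fine for short recordings; for large ones, iterating with a RecordingFile instance (hasMoreEvents/readEvent) avoids that.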

Java Mission Control and Flight Recorder are great tools, and if you already own a license for "Oracle Java SE Advanced", "Oracle Java SE Advanced Desktop" or "Oracle Java SE Suite", there is no reason to switch to other profiling tools: JMC and JFR will do the job well.

JMH

JMH (Java Microbenchmark Harness) is a microbenchmarking framework developed as part of the OpenJDK project since 2013. It is mentioned among profiling tools because of what it offers: gathering performance statistics for a test subject, from a piece of code to a full app. JMH should be configured with warmup iterations, which yield better results because the JIT compiler has already optimized the code by the time measurement starts; this is comparable to the calibration step some profilers perform before starting the actual measurement.

Depending on the configured benchmark mode, JMH reports throughput, average time per operation, or sampled run times (including percentiles).
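As a rough illustration of what the average and percentile figures mean, the sketch below computes them over a set of per-operation timings. RunTimeStats is a hypothetical helper, not JMH code; it uses the nearest-rank percentile, and JMH's own estimator may interpolate differently.

```java
import java.util.Arrays;

/** Hypothetical helper: summary statistics over measured per-operation times. */
public final class RunTimeStats {

    private RunTimeStats() {
    }

    /** Arithmetic mean of the samples. */
    public static double average(double[] samples) {
        return Arrays.stream(samples).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of the samples at or below it. */
    public static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For example, with per-request times of 5, 1, 3, 2 and 4 ms, the average and the 50th percentile are both 3 ms, while the 90th percentile is 5 ms: a single slow request dominates the tail even when the average looks fine.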

The easiest and recommended way to use JMH is to create a separate project using the following command:

$ mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test-pet-clinic \
          -Dversion=1.0

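Now we have a project inside the test-pet-clinic folder. After adding our benchmark class to it (com.test.spring.RestImplementationsBenchmark in the output below), we can build and start it: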

$ cd test-pet-clinic/
$ mvn clean install
$ java -jar target/benchmarks.jar
# JMH 1.9.3 (released 667 days ago, please consider updating!)
# VM invoker: C:\Program Files\Java\jre1.8.0_111\bin\java.exe
# VM options: <none>
# Warmup: 20 iterations, 1 s each
# Measurement: 20 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.test.spring.RestImplementationsBenchmark.owners
# Parameters: (path = /owners, port = 8080, server = localhost)

# Run progress: 0.00% complete, ETA 00:06:40
# Fork: 1 of 10
# Warmup Iteration   1: 7.829 ops/s
...
# Warmup Iteration  19: 91.628 ops/s
# Warmup Iteration  20: 93.504 ops/s
Iteration   1: 92.267 ops/s
Iteration   2: 34.738 ops/s
Iteration   3: 92.347 ops/s

Result "owners":
  98.147 ±(99.9%) 3.851 ops/s [Average]
  (min, avg, max) = (34.738, 98.147, 113.290), stdev = 15.129
  CI (99.9%): [94.296, 101.998] (assumes normal distribution)

# Run complete. Total time: 00:06:08

Benchmark  (path)   (port)  (server)   Mode  Cnt   Score   Error  Units
RestImpl   /owners  8080    localhost  thrpt  173  98.147 ± 3.851  ops/s
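The Error column is the half-width of the 99.9% confidence interval around the mean score, computed as critical value × stdev / √n. As far as we can tell, JMH takes the critical value from the Student t-distribution; the hypothetical ScoreError helper below simply receives it as a parameter instead of computing it.

```java
/** Hypothetical helper reproducing the shape of JMH's "Score ± Error" figures. */
public final class ScoreError {

    private ScoreError() {
    }

    /** Sample mean. */
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Sample standard deviation (divides by n - 1). */
    public static double stdev(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    /** Confidence-interval half-width for the given two-sided critical value. */
    public static double error(double[] xs, double criticalValue) {
        return criticalValue * stdev(xs) / Math.sqrt(xs.length);
    }
}
```

Plugging in the summary values above (stdev = 15.129, n = 173) with a critical value of about 3.35, roughly the two-sided 99.9% Student t quantile for 172 degrees of freedom, gives 3.35 × 15.129 / √173 ≈ 3.85, consistent with the reported error of 3.851.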

Examples of JMH in use include TubeMogul's Java 8 benchmarks and Pavel Samolysov's comparison, which concludes that "EJB is up to 15% faster than the Spring Framework while CDI is up to 19% slower".

In his Elasticsearch benchmarking presentation, Daniel Mitterdorfer identified seven deadly sins of benchmarking that need to be addressed when performing benchmarks:

Sin #1: Not paying attention to system setup
Sin #2: No warmup
Sin #3: No bottleneck analysis
Sin #4: The divine benchmarking script - question everything (tools, API calls, measurement overhead, return codes)
Sin #5: Denying Statistics - distribution, t-test, multiple runs, percentiles
Sin #6: Vague Metrics
Sin #7: Treating Performance as One-Dimensional - handle interference (e.g. software and hardware caching)

XRebel

The XRebel trial comes as a .zip file containing the xrebel.jar Java agent, the license and the license agreement. Integration is straightforward: start your application server with the JVM option -javaagent:[path/to/]xrebel.jar, which enables the XRebel toolbar. Once the web application is started with XRebel enabled, the toolbar is displayed in the bottom left corner. The toolbar has the following menus:

Find exceptions - contains all logged exceptions
Access application profiling - time spent servicing the request
IO Queries - database queries regardless of the data access layer implementation - JDBC based or NoSQL
Logs - detects and scopes to sessions debug logs

It tracks events from sources such as HTTP, Quartz, JMS, Periodic Task, RabbitMQ and Unidentified.

It supports the following frameworks: Spark, Spring MVC, Spring Boot, JSF, Vaadin, Spark Framework, Grails, Struts and Jersey, and it can aggregate data from other XRebel-enabled microservices used by the current frontend/backend application. The supported application servers are GlassFish, JBoss, Jetty, Tomcat, TomEE, WebLogic, WebSphere, WebSphere Liberty Profile and WildFly. Among the supported NoSQL databases are Cassandra, Couchbase Server, MongoDB and Redis, and among the supported relational databases are Apache Derby, H2, HSQLDB, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SAP MaxDB and SQLite.

JProfiler

JProfiler is a commercial product developed by EJ Technologies. It supports all the basic features:

Live profiling of local and remote sessions
Offline profiling and triggers
Memory analysis
CPU profiling including method statistics
Threads and monitor profiling
VM telemetry
Events logging

Some of the advanced features that we liked are:

Timeline bookmarking
Export report to HTML
Leaked database connections detection
RMI, web service and remote EJB calls logging
Database events reporting including JDBC and JPA/Hibernate queries
Snapshot comparison

The UI is intuitive and easy to use and, in a short amount of time, we were able to make use of its full power.

JMeter

JMeter provides load statistics from the user's perspective (how the application feels to a user), and, through its extensions, it can also be used to test internal methods. JMeter terminology defines the following items:

Test plan - this is a script that contains the steps performed by JMeter and is the starting point of a test (can contain one or many thread groups)
Thread group - this simulates a user's work; all other elements need to be added to it
Sampler - responsible for making the request - it could be a JUnit method run, java class calls (must implement org.apache.jmeter.protocol.java.sampler.JavaSamplerClient interface), a Groovy script call, a local process call or a network request (HTTP, TCP, JDBC, SMTP, RPC etc.)
Timer - adds delays
Pre- and Post-Processors - run before and after the sampler's request, e.g. to prepare data or to extract values from the response

HTTP requests can be added manually or recorded through the JMeter proxy server, usually with Firefox configured to use it. If your app relies on specific headers, such as an x-csrf-token header used to prevent cross-site request forgery, JMeter can extract tokens or IDs from previous responses into variables that later requests reference (e.g. ${my_csrf}). JMeter supports the following extractors:

Regular Expression Extractor
XPath Extractor
CSS/JQuery Extractor
JSON Path Extractor (available via JMeter Plugins)
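The idea behind the regular expression extractor can be illustrated in plain Java: apply a pattern with one capture group to a previous response, store the match under a variable name, then substitute ${name} references in later requests. This is a hypothetical sketch of the concept, not JMeter's API; the meta tag markup and header used in the usage example are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical plain-Java illustration of extractor-based correlation (not JMeter's API). */
public final class TokenCorrelation {

    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    private TokenCorrelation() {
    }

    /** Returns capture group 1 of the first match in the response, or defaultValue if none. */
    public static String extract(String response, String regex, String defaultValue) {
        Matcher m = Pattern.compile(regex).matcher(response);
        return m.find() ? m.group(1) : defaultValue;
    }

    /** Replaces ${name} references with stored values; unknown names are left unchanged. */
    public static String substitute(String template, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = variables.getOrDefault(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For instance, extracting `name="_csrf" content="([^"]+)"` from a page containing `<meta name="_csrf" content="abc123"/>` and storing it as my_csrf turns `X-CSRF-TOKEN: ${my_csrf}` into `X-CSRF-TOKEN: abc123`.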

The measurement contains the following performance-relevant items:

Elapsed time - measured from just before sending the request to just after the last response has been received
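Latency - measured from just before sending the request to just after the first part of the response has been received, so it includes the processing needed to assemble the request and the first part of the response
Connect Time - time needed to establish the connection, including the SSL handshake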
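The three measurements can be approximated with a plain HttpURLConnection, taking every value from the same starting point so that connect time ≤ latency ≤ elapsed time, as in the sample result below. This is a hypothetical sketch, not how JMeter is implemented; here "latency" ends once the status line and headers have been read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper approximating JMeter-style connect time, latency and elapsed time. */
public final class RequestTimer {

    /** Timings in milliseconds, all measured from the same start, plus the HTTP status. */
    public static final class Timings {
        public final long connectMs;
        public final long latencyMs;
        public final long elapsedMs;
        public final int status;

        Timings(long connectMs, long latencyMs, long elapsedMs, int status) {
            this.connectMs = connectMs;
            this.latencyMs = latencyMs;
            this.elapsedMs = elapsedMs;
            this.status = status;
        }
    }

    private RequestTimer() {
    }

    public static Timings measure(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        long start = System.nanoTime();
        conn.connect();                       // TCP connect (plus TLS handshake for https)
        long connected = System.nanoTime();
        int status = conn.getResponseCode();  // sends the request, waits for status line and headers
        long firstResponse = System.nanoTime();
        InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (body != null) {
            try (InputStream in = body) {
                byte[] buffer = new byte[8192];
                while (in.read(buffer) != -1) {
                    // drain the body: elapsed time ends after the last byte is read
                }
            }
        }
        long end = System.nanoTime();
        conn.disconnect();
        return new Timings(toMs(connected - start), toMs(firstResponse - start),
                toMs(end - start), status);
    }

    private static long toMs(long nanos) {
        return nanos / 1_000_000;
    }
}
```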

A typical JMeter HTTP sample result looks like this:

Thread Name: Thread Group 1-1
Sample Start: 2017-03-26 13:33:01 EET
Load time: 1559
Connect Time: 1
Latency: 1559
…
Sample Count: 1
Error Count: 0
…

Response headers:
HTTP/1.1 200 OK
…
Set-Cookie: JSESSIONID=11D10081A56BB9FA911E7350E57429A1; Path=/owners
Content-Type: application/json;charset=utf-8
…

JMeter is a nice surprise, and its integration with the most important continuous integration tools (Bamboo JMeter Aggregator, JMeter plugin for TeamCity, CircleCI JMeter package or JMeter Jenkins Plugin) makes it a tool worth considering.

YourKit

YourKit profiler is a standard profiler offering the basics:

CPU monitoring
Thread and deadlock analysis
Memory information
Exceptions, database or filesystem events count

Among its advanced features, YourKit offers:

Export reports to CSV, HTML, TXT, ZIP
SQL queries

Overall, the interaction was a pleasant one. To get the high-level statistics, the app needs to run with CPU tracing or CPU sampling enabled.

Grades

Note: The grades should be taken lightly - they represent our own user-driven perspective. Each grade should be taken in a [0.5] convergence interval. The final grade was calculated as an average of grades (where Yes/No correspond to 0/5).

Table 1 - Java profilers evaluation

Conclusion

Knowing as much as possible about the JVM and your app will help you understand the profiler's results. One of the best sources on the HotSpot lifecycle and the optimizations it performs is Doug Hawkins's JVM Mechanics presentation. For the curious, there are also tools that show the Java bytecode and the generated native assembly code (javap, and JITWatch with the hsdis disassembler enabled).
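A small class is enough to try these tools. In the sketch below (class and method names are hypothetical), the class comment lists standard JDK commands for viewing the bytecode, seeing when HotSpot compiles the method, and printing the generated assembly; the last one only works once the hsdis library is installed. JITWatch itself works from the log written with -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation.

```java
/**
 * Hypothetical example for inspecting bytecode and JIT output:
 *   javac HotLoop.java
 *   javap -c HotLoop                                                 (bytecode)
 *   java -XX:+PrintCompilation HotLoop                               (JIT compilation events)
 *   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly HotLoop   (assembly, needs hsdis)
 */
public class HotLoop {

    /** Sum of i * i for i = 1..n, using long arithmetic to avoid int overflow. */
    public static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Many calls make the method "hot", so the JIT compiles it.
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);
        }
        // Using the result keeps the JIT from discarding the computation as dead code.
        System.out.println(total);
    }
}
```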

GC hiccups can also cause downtime (although GC gets the blame most of the time). If the GC is the culprit, there is always the option of switching to a different implementation: several are available for the Oracle JVM and for OpenJDK, the most notable being Shenandoah. GC tuning is explained here.
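Before switching collectors, it helps to check how much time the GC actually takes. A rough, hypothetical sketch using the standard GarbageCollectorMXBean counters compares accumulated collection time with wall-clock time around a workload. The counters are cumulative and, for some collectors, include concurrent work that does not pause the application, so the result is only a first hint; GC logs give the detailed pause picture.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: rough share of wall-clock time reported as GC time. */
public final class GcShare {

    private GcShare() {
    }

    /** Accumulated collection time in ms across all collectors (undefined values, -1, are skipped). */
    public static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    /** Runs the workload and returns GC time divided by elapsed wall-clock time. */
    public static double gcTimeFraction(Runnable workload) {
        long gcBefore = totalCollectionTimeMs();
        long start = System.nanoTime();
        workload.run();
        long wallMs = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        return (double) (totalCollectionTimeMs() - gcBefore) / wallMs;
    }

    public static void main(String[] args) {
        double share = gcTimeFraction(() -> {
            byte[][] keep = new byte[16][];
            long checksum = 0;
            for (int i = 0; i < 2_000_000; i++) {
                keep[i & 15] = new byte[1024];   // short-lived arrays, quickly replaced
                checksum += keep[i & 15].length;
            }
            System.out.println("allocated bytes: " + checksum);
        });
        System.out.printf("GC time share: %.1f%%%n", share * 100);
    }
}
```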

Instrumenting an app alters the measurements, sometimes by an order of magnitude. This happens because the profiled code is modified during instrumentation, which makes the JVM behave differently (different optimized areas, different safepoints, different code layout and added overhead). This is not an isolated case, since most profilers use the Java Virtual Machine Tool Interface (JVM TI); in most cases, the main difference lies in the way each profiler performs calibration.

Sampling works by observing the system from the outside at regular intervals, usually through a thread that takes thread dumps and analyzes the results. This approach consumes CPU itself, and since thread dumps can only be taken at safepoints, the data will be biased. Smarter profilers, like Richard Warburton's Honest Profiler, take parts from both worlds: they sample call stacks regularly through AsyncGetCallTrace, an internal OpenJDK API that does not wait for a safepoint.
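The thread-dump approach can be sketched in a few lines of plain Java: a sampler periodically captures a target thread's stack with Thread.getStackTrace() and counts the top frames. This hypothetical StackSampler is only an illustration; because the JVM captures another thread's stack only when that thread reaches a safepoint poll, its samples are biased toward those locations, which is the problem AsyncGetCallTrace-based profilers such as Honest Profiler avoid.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sampling-profiler sketch: counts the top stack frame of one thread. */
public final class StackSampler {

    private StackSampler() {
    }

    /**
     * Takes up to `samples` stack snapshots of target, intervalMs apart, and returns
     * how often each method (class.method) was on top of the stack.
     */
    public static Map<String, Integer> sample(Thread target, int samples, long intervalMs)
            throws InterruptedException {
        Map<String, Integer> topFrames = new HashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            // The stack is captured when the target reaches a safepoint poll, so samples
            // land on poll locations rather than exactly where time is spent (safepoint bias).
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                topFrames.merge(frame, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return topFrames;
    }
}
```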

Measuring the Java app performance from outside can also be done by using the logs, JMX monitoring, application actuators or simply by checking the system performance counters.

Another great source on performance methodologies and tools for CPU issues is Aleksey Shipilëv's performance mind map.