This is our second article on patterns for microservices-based systems. See also our previous article on (Micro)service Discovery using Netflix Eureka. This article introduces Hystrix, an open-source Java library from Netflix. Hystrix is a production-ready implementation of the Circuit Breaker pattern for reducing the impact of failure and latency in distributed systems.
A particular characteristic of systems built using microservices is that they feature a large number of distributed components. As the number of synchronous interactions over the network increases, the impact of a single misbehaving service dependency can become more severe.
The following are typical cases of abnormal service behaviour:
The service is down
A service call is taking too long
Without proper mechanisms in place, errors and, in particular, latencies will trickle up to the calling clients where they will potentially exhaust limited resources (e.g., web server thread pools). When cascading failures occur, the overall system availability is significantly affected: the entire system can grind to a halt from a single unhealthy dependency even if all other service dependencies are healthy.
A Circuit Breaker is used to wrap network operations which may fail. It monitors and detects when the downstream service dependency is misbehaving, and temporarily rejects calls to it until it becomes healthy again. By failing immediately instead of waiting on the unhealthy dependency, it prevents resource exhaustion in the calling client process. At the same time, it reduces the load on the downstream service, thus increasing the chances for it to recover from the error condition it is experiencing.
In the following sections, we will describe the Hystrix implementation of the Circuit Breaker pattern.
Let's assume a client invokes a service. The client isolates points of access to the service by wrapping all network call invocations within a Hystrix Circuit Breaker (this is achieved at the code level via commands or annotations, more details below). The Circuit Breaker continuously intercepts and monitors all invocations and acts upon certain erroneous conditions.
When the service dependency is healthy and no issues are detected, the Circuit Breaker is in the closed state. All invocations are passed through to the service.
The Circuit Breaker considers the following invocations as failed and factors them in when deciding whether to trip the circuit open:
An exception thrown (e.g., cannot connect, or service returns HTTP error 500)
The call takes longer than the configured timeout (by default 1 second)
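As a rough illustration of the two failure criteria above, the following plain-Java sketch classifies a single invocation as a success, a failure (exception) or a timeout. It does not use Hystrix: the InvocationClassifier class and its Outcome enum are hypothetical names chosen for this example, and the executor merely stands in for whatever thread pool runs the call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hystrix.
class InvocationClassifier {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    private final ExecutorService executor;
    private final long timeoutMillis;

    InvocationClassifier(ExecutorService executor, long timeoutMillis) {
        this.executor = executor;
        this.timeoutMillis = timeoutMillis;
    }

    Outcome classify(Callable<?> call) {
        // Run the call on a separate thread so the caller can stop waiting.
        Future<?> future = executor.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller is released
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            return Outcome.FAILURE; // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILURE;
        }
    }
}
```

Running the call on a separate thread is what allows the caller to give up after the timeout instead of blocking for as long as the dependency takes to answer.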
The circuit opens as soon as Hystrix determines that the error threshold over a statistical time window has been reached (by default 50% errors over a 10-second window). In the open state, the Circuit Breaker will reject invocations by either:
Throwing an exception (also termed "fail fast", this is the default behavior)
Returning a fallback result supplied by the client (see the getFallback() method and the fallbackMethod attribute below)
To be able to recover from the error condition, when the Circuit Breaker is in the open state, it periodically lets one invocation through at a configurable interval (by default 5 seconds) - this is the half-open state. If the invocation succeeds, the circuit is closed again; if it fails, the circuit stays open until the next interval elapses.
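The closed, open and half-open transitions described above can be sketched as a small state machine in plain Java. This is an illustration only, not how Hystrix is implemented: SimpleCircuitBreaker and all of its members are hypothetical, the statistical window is a simple tumbling window rather than a rolling one, the clock is injected so that time can be controlled, and the code is not hardened for heavy concurrency.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative only: a hypothetical, simplified breaker, not Hystrix's implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum calls before the error rate counts
    private final int errorThresholdPercentage; // e.g. 50
    private final long windowMillis;            // statistical window, e.g. 10_000
    private final long sleepWindowMillis;       // wait before a trial call, e.g. 5_000
    private final LongSupplier clock;           // injected time source (milliseconds)

    private State state = State.CLOSED;
    private long windowStart;
    private long openedAt;
    private int total;
    private int errors;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long windowMillis, long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.windowMillis = windowMillis;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized State state() { return state; }

    <T> T call(Supplier<T> invocation) {
        if (!allowRequest()) {
            throw new IllegalStateException("Circuit open: failing fast");
        }
        try {
            T result = invocation.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            throw e;
        }
    }

    private synchronized boolean allowRequest() {
        if (state == State.CLOSED) return true;
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // let a single trial invocation through
            return true;
        }
        return false; // open, or a trial invocation is already in flight
    }

    private synchronized void record(boolean failed) {
        long now = clock.getAsLong();
        if (state == State.HALF_OPEN) {
            if (failed) { // trial failed: stay open for another sleep window
                state = State.OPEN;
                openedAt = now;
            } else {      // trial succeeded: close and start with fresh statistics
                state = State.CLOSED;
                windowStart = now;
                total = 0;
                errors = 0;
            }
            return;
        }
        if (state == State.OPEN) return; // late result of a call started before tripping
        if (now - windowStart >= windowMillis) { // start a new statistical window
            windowStart = now;
            total = 0;
            errors = 0;
        }
        total++;
        if (failed) errors++;
        if (total >= requestVolumeThreshold
                && errors * 100 >= errorThresholdPercentage * total) {
            state = State.OPEN;
            openedAt = now;
        }
    }
}
```

With a volume threshold of 4 and an error threshold of 50%, four consecutive failing calls trip this sketch open. Further calls are then rejected immediately until the sleep window has passed, at which point one successful trial call closes the circuit again.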
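We present two main options for using the library:
Directly using the Hystrix API - this requires wrapping each service call in Hystrix API commands.
Using the Spring Cloud Netflix / Javanica support - this requires only annotating the methods that perform the service calls.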
To use the Hystrix library directly, one needs to add the dependency to the Maven project:
<dependency>
<groupId>com.netflix.hystrix</groupId>
<artifactId>hystrix-core</artifactId>
<version>1.3.20</version>
</dependency>
To wrap a service call into a Circuit Breaker, one needs to extend the HystrixCommand class. The example below uses a fictitious products service:
import java.util.Arrays;
import java.util.List;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

public class FindAllProductsCommand extends HystrixCommand<List<Product>> {

    private final RestTemplate restTemplate;

    public FindAllProductsCommand(RestTemplate restTemplate) {
        super(HystrixCommandGroupKey.Factory.asKey("ProductGroup"));
        this.restTemplate = restTemplate;
    }

    @Override
    protected List<Product> run() throws Exception {
        // HTTP service call
        ResponseEntity<Product[]> responseEntity =
                restTemplate.getForEntity("http://host/products", Product[].class);
        Product[] products = responseEntity.getBody();
        return Arrays.asList(products);
    }
}
To invoke the command, construct it and call execute():
new FindAllProductsCommand(restTemplate).execute();
To return a default result instead of throwing an exception when the call fails or the Circuit Breaker is open, implement the getFallback() method in the command.
public class FindAllProductsCommand extends HystrixCommand<List<Product>> {
    ...
    @Override
    protected List<Product> getFallback() {
        return Collections.emptyList();
    }
}
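Stripped of the Hystrix machinery, the fallback idea amounts to "try the primary call, and on failure return a default instead of propagating the error". The sketch below shows that in plain Java. FallbackSketch and executeWithFallback are hypothetical names, and the sketch deliberately ignores timeouts and circuit state.

```java
import java.util.function.Supplier;

// Hypothetical illustration of fallback semantics; not a Hystrix API.
class FallbackSketch {
    // Runs the primary call; if it throws, returns the fallback's value instead.
    static <T> T executeWithFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }
}
```

A fallback should be cheap and should not itself depend on the network, otherwise it can fail in the same way as the call it is meant to protect.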
When a certain exception represents expected behavior (e.g., business logic validation) rather than a symptom of the service dependency misbehaving, it should be wrapped in a HystrixBadRequestException. Such exceptions are not counted as failures and do not trigger the fallback.
public class FindAllProductsCommand extends HystrixCommand<List<Product>> {
    ...
    @Override
    protected List<Product> run() throws Exception {
        try {
            // HTTP service call
            ...
        } catch (IllegalArgumentException e) {
            // A HystrixBadRequestException is not counted as a failure,
            // so it will not cause the Circuit Breaker to open
            throw new HystrixBadRequestException("Bad request.", e);
        }
    }
}
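The distinction between "bad request" exceptions and genuine failures can be illustrated with a small plain-Java sketch. The BadRequestSketch class and its nested BadRequest exception are hypothetical stand-ins used only to show the counting rule; they are not Hystrix types.

```java
import java.util.function.Supplier;

// Hypothetical illustration; BadRequest stands in for HystrixBadRequestException.
class BadRequestSketch {
    static class BadRequest extends RuntimeException {
        BadRequest(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private int countedFailures;

    int countedFailures() { return countedFailures; }

    <T> T call(Supplier<T> invocation) {
        try {
            return invocation.get();
        } catch (BadRequest e) {
            throw e; // caller error: propagated, but not counted against the dependency
        } catch (RuntimeException e) {
            countedFailures++; // genuine failure: feeds the error statistics
            throw e;
        }
    }
}
```

In both cases the exception still reaches the caller; only the second kind contributes to the statistics that can trip the circuit.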
To configure specific properties (timeouts, thread pool sizes, error thresholds, etc.), one can set them programmatically at the time the command is instantiated.
// Assumes FindAllProductsCommand declares an additional constructor that accepts
// a HystrixCommand.Setter and a RestTemplate and passes the Setter to super(...)
new FindAllProductsCommand(
        HystrixCommand.Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("ProductGroup"))
                .andCommandPropertiesDefaults(
                        HystrixCommandProperties.Setter()
                                .withCircuitBreakerRequestVolumeThreshold(20)
                                .withCircuitBreakerErrorThresholdPercentage(50)
                                .withExecutionIsolationThreadTimeoutInMilliseconds(1000)
                                .withMetricsRollingStatisticalWindowInMilliseconds(10000)
                                .withMetricsRollingStatisticalWindowBuckets(10))
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter()
                                .withCoreSize(10)),
        restTemplate)
        .execute();
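To make the interplay of these values more concrete: a 10000 ms window split into 10 buckets gives 1000 ms per bucket, and the circuit is only considered for tripping once at least 20 requests have been seen in the window with at least 50% of them failing. The plain-Java sketch below models such a bucketed window. RollingErrorWindow is a hypothetical class written for this article, not Hystrix's internal implementation, and it assumes the window length is evenly divisible by the number of buckets.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a bucketed rolling window; not Hystrix's actual code.
class RollingErrorWindow {
    private final long bucketMillis;
    private final long[] bucketStart; // start time of the slice held by each slot
    private final int[] totals;
    private final int[] errors;
    private final LongSupplier clock; // assumed to return non-negative milliseconds

    RollingErrorWindow(long windowMillis, int numBuckets, LongSupplier clock) {
        this.bucketMillis = windowMillis / numBuckets; // e.g. 10000 / 10 = 1000 ms
        this.bucketStart = new long[numBuckets];
        this.totals = new int[numBuckets];
        this.errors = new int[numBuckets];
        this.clock = clock;
    }

    synchronized void record(boolean failed) {
        long now = clock.getAsLong();
        long start = now - now % bucketMillis;
        int slot = (int) ((now / bucketMillis) % totals.length);
        if (bucketStart[slot] != start) { // slot holds an expired slice: reuse it
            bucketStart[slot] = start;
            totals[slot] = 0;
            errors[slot] = 0;
        }
        totals[slot]++;
        if (failed) errors[slot]++;
    }

    synchronized boolean shouldTrip(int requestVolumeThreshold, int errorThresholdPercentage) {
        long now = clock.getAsLong();
        long currentStart = now - now % bucketMillis;
        // The window covers the current bucket plus the (numBuckets - 1) before it.
        long oldestAllowed = currentStart - bucketMillis * (totals.length - 1);
        int total = 0;
        int failed = 0;
        for (int i = 0; i < totals.length; i++) {
            if (bucketStart[i] >= oldestAllowed) {
                total += totals[i];
                failed += errors[i];
            }
        }
        return total >= requestVolumeThreshold
                && failed * 100 >= errorThresholdPercentage * total;
    }
}
```

Note that ten failures out of ten requests do not trip this sketch when the volume threshold is 20, since the request volume check has to pass before the error percentage is taken into account.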
Alternatively, one can use the Netflix Archaius configuration support.
We have already introduced the Spring Cloud library in our previous article. Spring Cloud is built on top of Spring Boot and provides abstractions for Netflix OSS technologies. Support for Hystrix is provided via the third-party library Javanica.
To use the Spring Cloud Netflix / Javanica support, one needs to add the following dependency to the Maven project:
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-hystrix</artifactId>
<version>1.0.0.RELEASE</version>
</dependency>
Additionally, one needs to add the @EnableCircuitBreaker annotation to the main Spring Boot application configuration class.
@EnableCircuitBreaker
public class HystrixClientDemoApp {
...
}
To wrap a service call into a Circuit Breaker, annotate the corresponding method with the @HystrixCommand annotation:
@HystrixCommand
public List<Product> findAllProducts() {
    // HTTP service call
    ResponseEntity<Product[]> responseEntity =
            restTemplate.getForEntity("http://host/products", Product[].class);
    Product[] products = responseEntity.getBody();
    return Arrays.asList(products);
}
To return a default result instead of throwing an exception when the call fails or the Circuit Breaker is open, add a reference to the fallback method in the annotation:
@HystrixCommand(fallbackMethod = "defaultProducts")
public List<Product> findAllProducts() {
    // HTTP service call
    ...
}

public List<Product> defaultProducts() {
    return Collections.emptyList();
}
If a certain exception class should not be counted as an error by the Circuit Breaker, it should be listed in the annotation:
@HystrixCommand(ignoreExceptions = {IllegalArgumentException.class})
public List<Product> findAllProducts() {
    // HTTP service call
    ...
}
To configure specific properties (timeouts, thread pool sizes, error thresholds, etc.), one can use the standard Spring Boot application.yml configuration mechanism:
hystrix:
  command:
    findAllProducts:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 1000
      circuitBreaker:
        requestVolumeThreshold: 20
        errorThresholdPercentage: 50
      metrics:
        rollingStats:
          timeInMilliseconds: 10000
          numBuckets: 10
  threadpool:
    ProductService:
      coreSize: 10
Hystrix provides out-of-the-box support for visualizing and monitoring the current state of the Circuit Breakers by streaming metrics data to a dashboard web application, the Hystrix Dashboard. In a multi-server (cluster) scenario, Hystrix can stream metrics to an intermediary aggregator, Turbine, which sits in front of the dashboard.
The screenshots below show the Hystrix Dashboard in action:
The following metrics are shown and updated in real-time on the web dashboard:
Health and traffic volume
Request rate (at server and cluster levels)
Error percentage and counters (successes, rejected, thread timeouts, thread-pool rejections, failures/exceptions) for the current rolling time window (in the example screenshots above, for the last 20 seconds)
Circuit Breaker status
Latency percentiles for the last minute
For complete documentation on how to read the diagrams and counters, please refer to the documentation on the Hystrix Dashboard wiki.
Hystrix is a mature implementation of the Circuit Breaker pattern, with finely-tunable configuration and great visualization and monitoring support. The Spring Cloud Netflix / Javanica libraries offer an annotation-driven alternative to the direct Hystrix API which is less intrusive on the codebase.