Issue 36

Fault-Tolerant Microservices with Netflix Hystrix

Radu Butnaru
Senior Developer @ SDL
PROGRAMMING

This is our second article on patterns for microservices-based systems. See also our previous article on (Micro)service Discovery using Netflix Eureka. This article introduces Hystrix, an open-source Java library from Netflix. Hystrix is a production-ready implementation of the Circuit Breaker pattern, which reduces the impact of failure and latency in distributed systems.

Problem

A particular characteristic of systems built using microservices is that they feature a large number of distributed components. As the number of synchronous interactions over the network increases, the impact of a single misbehaving service dependency can become more severe.

The following are typical cases of abnormal service behaviour:

Without proper mechanisms in place, errors and, in particular, latencies will trickle up to the calling clients where they will potentially exhaust limited resources (e.g., web server thread pools). When cascading failures occur, the overall system availability is significantly affected: the entire system can grind to a halt from a single unhealthy dependency even if all other service dependencies are healthy.

Solution

A Circuit Breaker is used to wrap network operations which may fail. It monitors and detects when the downstream service dependency is misbehaving, and temporarily rejects calls to it until it becomes healthy again. By returning an exception immediately, it prevents resource exhaustion in the calling client process. At the same time, it reduces the load on the downstream service, thus increasing the chances for it to recover from the error condition it is experiencing.
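
The resource-exhaustion argument can be made concrete with a small sketch. It is not Hystrix code: it uses only java.util.concurrent, and the class name BoundedCall and its parameters are hypothetical. It shows one way a caller might cap both the number of threads a dependency can occupy and the time it waits for an answer, returning a fallback value instead of blocking.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs calls to one dependency on a small, dedicated pool. */
final class BoundedCall {
    private final ExecutorService pool;

    BoundedCall(int poolSize) {
        // No work queue: when all threads are busy, new calls are rejected
        // immediately instead of piling up behind a slow dependency.
        this.pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>());
    }

    /** Returns the action's result, or the fallback on rejection, timeout or failure. */
    <T> T call(Callable<T> action, long timeoutMillis, T fallback) {
        Future<T> future;
        try {
            future = pool.submit(action);
        } catch (RejectedExecutionException e) {
            return fallback;                 // pool saturated: fail fast
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);             // try to interrupt the slow call
            return fallback;
        } catch (ExecutionException e) {
            return fallback;                 // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With a pool of size 1 and a 100 ms timeout, a call that sleeps for several seconds returns the fallback after roughly 100 ms, so the calling thread is never held for the full duration of the slow dependency.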

In the following sections, we will describe the Hystrix implementation of the Circuit Breaker pattern.

Hystrix Circuit Breaker Overview

Let's assume a client invokes a service. The client isolates points of access to the service by wrapping all network call invocations within a Hystrix Circuit Breaker (this is achieved at the code level via commands or annotations, more details below). The Circuit Breaker continuously intercepts and monitors all invocations and acts upon certain erroneous conditions.

Closed State

When the service dependency is healthy and no issues are detected, the Circuit Breaker is in the closed state. All invocations are passed through to the service.

Open State

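The Circuit Breaker counts an invocation as failed, and factors it in when deciding whether to trip the circuit open, when it throws an exception (other than HystrixBadRequestException), exceeds the configured timeout, or is rejected because the command's thread pool is full.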

The circuit opens as soon as Hystrix determines that the error threshold over a statistical time window has been reached (by default, 50% errors over a 10-second window). In the open state, the Circuit Breaker rejects invocations immediately, either by throwing an exception or, if a fallback is defined, by returning the fallback result.

Half-Open State

To be able to recover from the error condition, the Circuit Breaker, while in the open state, periodically lets one invocation through at a configurable interval (by default, 5 seconds); this is the half-open state. If the invocation succeeds, the circuit is closed again; if it fails, the circuit stays open until the next interval.
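
The three states can be summarised in a small state machine. The sketch below is a simplified illustration, not Hystrix's implementation: the class SimpleCircuitBreaker and its parameters are hypothetical, it trips on a number of consecutive failures rather than on an error percentage over a rolling window, and it serialises all calls for simplicity. The clock is injected so that the transitions can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative three-state breaker; a teaching sketch, not Hystrix's implementation. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures that trip the circuit
    private final long sleepWindowMillis;  // time to stay open before a trial call
    private final LongSupplier clock;      // injected time source, in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the action, or returns the fallback if the circuit is open or the action fails. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (state == State.OPEN) {
            if (now - openedAt < sleepWindowMillis) {
                return fallback.get();       // open: reject without calling the service
            }
            state = State.HALF_OPEN;         // sleep window elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;            // success closes the circuit (or keeps it closed)
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;          // trip, or re-open after a failed trial call
                openedAt = now;
            }
            return fallback.get();
        }
    }
}
```

A caller would wrap each remote invocation in call(...), passing the real operation and a fallback supplier. While the circuit is open, the real operation is never invoked.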

Usage

We present two main options for using the library: the direct Hystrix API, and the Spring Cloud Netflix / Javanica annotation support.

Direct Hystrix API

To use the Hystrix library directly, one needs to add the dependency to the Maven project:

<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.3.20</version>
</dependency>

To wrap a service call in a Circuit Breaker, one needs to extend the HystrixCommand class. The example below uses a fictitious products service:

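import java.util.Arrays;
import java.util.List;

import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class FindAllProductsCommand extends HystrixCommand<List<Product>> {

    private final RestTemplate restTemplate;

    public FindAllProductsCommand(RestTemplate restTemplate) {
        // Commands in the same group share a thread pool by default
        super(HystrixCommandGroupKey.Factory.asKey("ProductGroup"));
        this.restTemplate = restTemplate;
    }

    @Override
    protected List<Product> run() throws Exception {
        // HTTP service call
        ResponseEntity<Product[]> responseEntity =
            restTemplate.getForEntity("http://host/products", Product[].class);

        Product[] products = responseEntity.getBody();
        return Arrays.asList(products);
    }
}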

To invoke the command, construct it and call execute():

List<Product> products = new FindAllProductsCommand(restTemplate).execute();

To return a default result instead of throwing an exception when the Circuit Breaker is open, implement the getFallback() method in the command.

public class FindAllProductsCommand extends HystrixCommand<List<Product>> {
    ...
    @Override
    protected List<Product> getFallback() {
        // Returned when run() fails, times out, or the circuit is open
        return Collections.emptyList();
    }
}

When a certain exception is expected behavior (e.g., business logic validation) rather than a symptom of the service dependency misbehaving, it should be wrapped in a HystrixBadRequestException.

public class FindAllProductsCommand extends HystrixCommand<List<Product>> {
    ...
    @Override
    protected List<Product> run() throws Exception {
        try {
            // HTTP service call
            ...
        } catch (IllegalArgumentException e) {
            // A HystrixBadRequestException is not counted as a failure,
            // so it will not cause the Circuit Breaker to open
            throw new HystrixBadRequestException("Bad request.", e);
        }
    }
}

To configure specific properties (timeouts, thread pool sizes, error thresholds, etc.), one can set them programmatically at the time the command is instantiated.

// Assumes an additional constructor
// FindAllProductsCommand(HystrixCommand.Setter, RestTemplate)
// that passes the setter on to super(...)
new FindAllProductsCommand(
    HystrixCommand.Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("ProductGroup"))
        .andCommandPropertiesDefaults(
            HystrixCommandProperties.Setter()
                .withCircuitBreakerRequestVolumeThreshold(20)
                .withCircuitBreakerErrorThresholdPercentage(50)
                .withExecutionIsolationThreadTimeoutInMilliseconds(1000)
                .withMetricsRollingStatisticalWindowInMilliseconds(10000)
                .withMetricsRollingStatisticalWindowBuckets(10))
        .andThreadPoolPropertiesDefaults(
            HystrixThreadPoolProperties.Setter()
                .withCoreSize(10)),
    restTemplate)
    .execute();

Alternatively, one can use the Netflix Archaius configuration support.
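
As a rough illustration of the Archaius route, the same values could be expressed as properties, for example in a config.properties file on the classpath, which Archaius loads by default. The fragment assumes that the command key and the thread pool key were left at their defaults (the command class name and the group key, respectively); the key segments need adjusting if a command sets them explicitly.

```properties
# Per-command settings (assumed command key: FindAllProductsCommand)
hystrix.command.FindAllProductsCommand.execution.isolation.thread.timeoutInMilliseconds=1000
hystrix.command.FindAllProductsCommand.circuitBreaker.requestVolumeThreshold=20
hystrix.command.FindAllProductsCommand.circuitBreaker.errorThresholdPercentage=50
hystrix.command.FindAllProductsCommand.metrics.rollingStats.timeInMilliseconds=10000
hystrix.command.FindAllProductsCommand.metrics.rollingStats.numBuckets=10

# Thread pool settings (assumed thread pool key: ProductGroup)
hystrix.threadpool.ProductGroup.coreSize=10
```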

Spring Cloud Netflix / Javanica

We have already introduced the Spring Cloud library in our previous article. Spring Cloud is built on top of Spring Boot and provides abstractions for Netflix OSS technologies. Support for Hystrix is provided via the third-party library Javanica.

To use the Spring Cloud Netflix / Javanica support, one needs to add the following dependency to the Maven project:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-hystrix</artifactId>
    <version>1.0.0.RELEASE</version>
</dependency>

Additionally, one needs to add the EnableCircuitBreaker annotation to the main Spring Boot application configuration class.

@EnableCircuitBreaker
public class HystrixClientDemoApp {
...
}

To wrap a service call into a Circuit Breaker, annotate the corresponding method with the HystrixCommand annotation:

@HystrixCommand
public List<Product> findAllProducts() {
    // HTTP service call
    ResponseEntity<Product[]> responseEntity =
        restTemplate.getForEntity("http://host/products", Product[].class);
    Product[] products = responseEntity.getBody();
    return Arrays.asList(products);
}

To return a default result instead of throwing an exception when the Circuit Breaker is open, add a reference to the fallback method in the annotation:

@HystrixCommand(fallbackMethod = "defaultProducts")
public List<Product> findAllProducts() {
    // HTTP service call
    ...
}

// Fallback: must be declared in the same class as the annotated method
public List<Product> defaultProducts() {
    return Collections.emptyList();
}

If a certain exception class should not be counted as an error by the Circuit Breaker, it should be listed in the annotation:

@HystrixCommand(ignoreExceptions = {IllegalArgumentException.class})
public List<Product> findAllProducts() {
    // HTTP service call
    ...
}

To configure specific properties (timeouts, thread pool sizes, error thresholds, etc.), one can use the standard Spring Boot application.yml configuration mechanism:

hystrix:
    command:
        findAllProducts:
            execution:
                isolation:
                    thread:
                        timeoutInMilliseconds: 1000
            circuitBreaker:
                requestVolumeThreshold: 20
                errorThresholdPercentage: 50
            metrics:
                rollingStats:
                    timeInMilliseconds: 10000
                    numBuckets: 10
    threadpool:
        ProductService:
            coreSize: 10
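
To make the interplay between requestVolumeThreshold, errorThresholdPercentage and the rollingStats settings more concrete, here is a simplified, single-threaded sketch of a bucketed rolling window. It is an illustration under stated assumptions, not Hystrix's implementation: the class RollingErrorWindow and its method names are hypothetical, and timestamps are passed in explicitly as non-negative milliseconds. With the values above, the 10000 ms window is split into 10 buckets of 1000 ms each, and the circuit would trip only once at least 20 requests were seen in the window and at least 50% of them failed.

```java
import java.util.Arrays;

/** Illustrative rolling error statistics; a sketch, not Hystrix's implementation. */
final class RollingErrorWindow {
    private final long bucketMillis;
    private final long[] bucketStart;  // start time of the data held in each slot
    private final int[] successes;
    private final int[] failures;

    RollingErrorWindow(long windowMillis, int numBuckets) {
        this.bucketMillis = windowMillis / numBuckets;
        this.bucketStart = new long[numBuckets];
        this.successes = new int[numBuckets];
        this.failures = new int[numBuckets];
        Arrays.fill(bucketStart, Long.MIN_VALUE);  // marks every slot as empty
    }

    /** Records the outcome of one invocation that finished at time now. */
    void record(boolean success, long now) {
        int slot = (int) ((now / bucketMillis) % bucketStart.length);
        long start = now - (now % bucketMillis);
        if (bucketStart[slot] != start) {          // slot holds an older bucket: recycle it
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    /** True if enough requests were seen and the error percentage reached the threshold. */
    boolean shouldTrip(long now, int requestVolumeThreshold, int errorThresholdPercentage) {
        long oldest = now - bucketMillis * bucketStart.length;
        int ok = 0;
        int failed = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (bucketStart[i] > oldest) {         // ignore buckets that fell out of the window
                ok += successes[i];
                failed += failures[i];
            }
        }
        int total = ok + failed;
        if (total < requestVolumeThreshold) {
            return false;                          // too few requests to judge health
        }
        return failed * 100 / total >= errorThresholdPercentage;
    }
}
```

The volume threshold is what prevents a single failed request on an otherwise idle service from opening the circuit: with fewer than 20 requests in the window, the error percentage is not evaluated at all.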

Monitoring with Hystrix Dashboard / Turbine

Hystrix provides out-of-the-box support for visualizing and monitoring the current state of the Circuit Breakers by streaming metrics data to a dashboard web application, Hystrix Dashboard. In a multi-server (cluster) scenario, Hystrix can stream metrics to an intermediary aggregator, Turbine, which sits in front of the dashboard.

The screenshots below show the Hystrix Dashboard in action:

Circuit Breaker Closed

Circuit Breaker Open

The following metrics are shown and updated in real-time on the web dashboard:

For complete documentation on how to read the diagrams and counters, please refer to the documentation on the Hystrix Dashboard wiki.

Conclusion

Hystrix is a mature implementation of the Circuit Breaker pattern, with finely tunable configuration and strong visualization and monitoring support. The Spring Cloud Netflix / Javanica libraries offer an annotation-driven alternative to the direct Hystrix API, one that is less intrusive on the codebase.

References

  1. Circuit Breaker pattern, by Martin Fowler

  2. Hystrix project

  3. Hystrix Wiki

  4. Spring Cloud Netflix

  5. Javanica Library

  6. Hystrix Dashboard Project

  7. Hystrix Dashboard Wiki

  8. Turbine

  9. Archaius

  10. JavaOne Presentation on Hystrix by Ben Christensen
