As part of a new internal web project, one of the QA team's goals was to design and run a reliable and fast regression suite as part of the CD pipeline. This was meant to raise confidence levels for each build by running a full set of tests instead of a select set of sanity tests. However, for this to be a viable option, we needed the tests to execute relatively fast (a maximum of ten minutes per run). But when you have a large number of tests, reducing the runtimes is easier said than done. Right from the start we had three major challenges that needed to be overcome.
A typical deployment consists of running tests on at least two nodes and a VIP. Each of these runs entails a total of 450 front-end tests and another 70 backend tests. This means that at least three regression runs are needed and, to keep our deployment times under 30 minutes, each test run has to be kept under 10 minutes.
For regression runs to be a viable solution, we needed to make sure that the tests are reliable and that random false positives, such as dependency failures caused by environment hiccups, do not make the suite fail and halt the entire deployment process.
We also needed to consider scalability, so that runtimes would stay low even as additional tests were added.
Using the following guidelines, we successfully built a test suite that was not only fast, but also reliable enough for our Continuous Delivery goals. We managed to consistently run our regression suite in approximately 7 minutes with no build failures.
The tests are based on Selenium's RemoteWebDriver and run against a remote Selenium grid with multiple nodes for better scalability, while a local hub with a single node is used for local test runs.
Running hundreds of tests fast is not possible without some kind of parallelization. Simply put, the more parallel-friendly your tests are, the faster a suite can run. However, building a parallel-friendly test project is not that simple: each test method needs to be designed with multi-threading in mind, which means it must be an individual module, independent of all other tests.
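The parallel execution itself is driven by TestNG's suite configuration. As a rough sketch (the suite name, package, and thread count below are illustrative, not our actual configuration), a suite file that runs individual test methods in parallel looks like this:

```xml
<!-- Illustrative testng.xml: run individual test methods in parallel on a pool of threads -->
<suite name="regression-suite" parallel="methods" thread-count="20">
    <test name="front-end-regression">
        <packages>
            <!-- hypothetical package name -->
            <package name="com.example.regression.*"/>
        </packages>
    </test>
</suite>
```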
To adhere to the strategy mentioned above, we used the following method to create RemoteWebDriver instances:
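The original listing is not reproduced here, but a minimal sketch of such a factory method looks like the following; the hub URL and the MyDriver constructor signature (mirroring RemoteWebDriver's URL/Capabilities constructor) are assumptions:

```java
import java.net.MalformedURLException;
import java.net.URL;

import org.openqa.selenium.chrome.ChromeOptions;

public class DriverFactory {

    // Every call returns a fresh, independent driver instance, so test methods
    // running on different threads never share WebDriver state.
    public static MyDriver getDriverInstance() throws MalformedURLException {
        ChromeOptions options = new ChromeOptions();
        // Illustrative hub URL: it points either to the remote grid or to the local hub.
        URL hubUrl = new URL("http://localhost:4444/wd/hub");
        return new MyDriver(hubUrl, options);
    }
}
```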
MyDriver, in our case, is a custom class that extends RemoteWebDriver; we use it to override some default Selenium methods, e.g. getScreenshotAs.
Modular tests allow you to scale your suite as you wish: TestNG's parallel capabilities, combined with a large enough Selenium grid, allow a great number of methods to run at the same time, drastically reducing suite run times.
Here is an example of a test class with a couple of tests using the above method:
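The original class is not shown here, so the following is only a sketch of the structure; the URLs, locators, and assertions are hypothetical. Each test creates its own driver and attaches it to the current ITestResult so the listeners described below can clean it up:

```java
import org.openqa.selenium.By;
import org.testng.Assert;
import org.testng.Reporter;
import org.testng.annotations.Test;

public class HomePageTest extends AbstractTest {

    @Test
    public void homePageTitleIsDisplayed() throws Exception {
        // Each test method owns its driver instance, so methods can run in parallel safely.
        MyDriver driver = DriverFactory.getDriverInstance();
        // Attach the driver to the current result so the listener can quit it after the test.
        Reporter.getCurrentTestResult().setAttribute("driver", driver);

        driver.get("http://internal-app.example.com/home");
        Assert.assertEquals(driver.getTitle(), "Home");
    }

    @Test
    public void searchReturnsResults() throws Exception {
        MyDriver driver = DriverFactory.getDriverInstance();
        Reporter.getCurrentTestResult().setAttribute("driver", driver);

        driver.get("http://internal-app.example.com/search?q=widgets");
        Assert.assertFalse(driver.findElements(By.cssSelector(".search-result")).isEmpty());
    }
}
```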
Note:
As you probably noticed, I'm not using @BeforeMethod to set up the test method or @AfterMethod to tear down the driver instances. This is due to some thread-safety issues when running a large number of tests in parallel, especially when using data providers with all instances running in parallel. Instead, we chose to use custom listeners and override the onTestFailure and onTestSuccess methods to clean up after each test; I will get into this in more detail a bit later.
Within your web app's UI there are areas that your tests interact with. A Page Object simply models these as objects within the test code. This reduces the amount of duplicated code and means that if the UI changes, the fix need only be applied in one place.
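As a small illustration of the pattern (the class name and locators are hypothetical):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// Models the login area of the application; tests interact with it only through these
// methods, so a UI change only requires updating the locators in this one class.
public class LoginPage {

    private static final By USERNAME_FIELD = By.id("username");
    private static final By PASSWORD_FIELD = By.id("password");
    private static final By LOGIN_BUTTON = By.id("login");

    private final WebDriver driver;

    public LoginPage(WebDriver driver) {
        this.driver = driver;
    }

    public void logInAs(String user, String password) {
        driver.findElement(USERNAME_FIELD).sendKeys(user);
        driver.findElement(PASSWORD_FIELD).sendKeys(password);
        driver.findElement(LOGIN_BUTTON).click();
    }
}
```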
Custom listeners extend TestNG's TestListenerAdapter to make use of the ITestResult object. These listeners offer very flexible management of post-test actions, which is especially useful when WebDriver instances need to be cleaned up after each test, so that the grid is not saturated with redundant sessions and has resources available as soon as possible.
The following code snippets show how we handle ramp down on test failure and test success.
Custom onTestFailure
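The original snippet is not reproduced here; a sketch along the same lines, assuming the driver was attached to the ITestResult as shown in the test class above, would be:

```java
import org.openqa.selenium.OutputType;
import org.testng.ITestResult;
import org.testng.TestListenerAdapter;

public class TestListener extends TestListenerAdapter {

    @Override
    public void onTestFailure(ITestResult result) {
        MyDriver driver = (MyDriver) result.getAttribute("driver");
        if (driver != null) {
            try {
                // Capture evidence of the failure before releasing the session
                // (MyDriver overrides getScreenshotAs, as mentioned above).
                driver.getScreenshotAs(OutputType.FILE);
            } finally {
                // Quit immediately so the grid node is freed for the next test.
                driver.quit();
            }
        }
        super.onTestFailure(result);
    }
}
```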
Custom onTestSuccess
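Again as a sketch, continuing the same TestListener class:

```java
    @Override
    public void onTestSuccess(ITestResult result) {
        // Same ramp-down as on failure, minus the screenshot: release the grid session right away.
        MyDriver driver = (MyDriver) result.getAttribute("driver");
        if (driver != null) {
            driver.quit();
        }
        super.onTestSuccess(result);
    }
```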
Note: AbstractTest in the above examples is a test setup class used by all our tests; it handles the Spring context loading and the listener initialization.
One of the most important requirements for the regression step in the Continuous Delivery process is for the tests to be stable and to run consistently, without false positives and with a 100% pass rate. However, this is actually very hard to achieve, especially for front-end tests. There are a lot of things that can go wrong: dependency issues, slow page loading, actual bugs, etc. What we did on our test project to increase regression stability was to use dynamic waits and a conditional retry mechanism.
Selenium has a very useful wait mechanism: the implicit wait tells WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements that are not immediately available. The default setting is 0. Once set, the implicit wait applies for the life of the WebDriver instance. Unfortunately, this is not always a bulletproof solution, especially in the case of JS-heavy pages. To cover these shortcomings, conditional waits are the answer.
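(For reference, the implicit wait itself is configured once per driver instance; the 10-second timeout below is illustrative.)

```java
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.WebDriver;

public class WaitConfig {

    // Poll the DOM for up to 10 seconds whenever an element is not immediately
    // available; applies for the life of the driver instance.
    public static void setImplicitWait(WebDriver driver) {
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    }
}
```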
Examples:
This method waits for jQuery to finish its active requests and for document.readyState to be complete:
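The original method isn't reproduced here; a sketch with an illustrative 30-second timeout could look like this:

```java
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

public class PageLoadWaits {

    public static void waitForPageLoad(WebDriver driver) {
        WebDriverWait wait = new WebDriverWait(driver, 30);

        wait.until(new ExpectedCondition<Boolean>() {
            @Override
            public Boolean apply(WebDriver webDriver) {
                JavascriptExecutor js = (JavascriptExecutor) webDriver;
                // No active jQuery requests (or jQuery not present at all)...
                boolean jqueryDone = (Boolean) js.executeScript(
                        "return window.jQuery == undefined || jQuery.active == 0;");
                // ...and the document itself is fully loaded.
                boolean documentReady = "complete".equals(
                        js.executeScript("return document.readyState;"));
                return jqueryDone && documentReady;
            }
        });
    }
}
```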
The second example waits until the target element is clickable:
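Again as a sketch, with an illustrative timeout:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ClickableWaits {

    public static WebElement waitForClickable(WebDriver driver, By locator) {
        WebDriverWait wait = new WebDriverWait(driver, 30);
        // Returns the element once it is both visible and enabled, i.e. safe to click.
        return wait.until(ExpectedConditions.elementToBeClickable(locator));
    }
}
```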
Note: Both examples use the wait.until() method, which polls the given condition until it is met or the timeout expires.
As mentioned earlier, another key instrument for increasing the stability of your tests is the conditional retry mechanism. Simply retrying failed tests up to a maximum number of times will make your pass rate rise, but this has some major flaws; for instance, actual defects may go unnoticed because they only show up once in a while.
Our solution to this problem was to retry only when a certain condition is met, which is done by checking the exception of the failing test. So when, for example, a test fails because the setup of its test data fails, the test will be rerun, but if it fails for any other reason it will be marked as failed.
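TestNG supports this through the IRetryAnalyzer interface; the sketch below retries only when the failure's exception indicates a test-data setup problem (TestDataException and the retry limit are hypothetical):

```java
import org.testng.IRetryAnalyzer;
import org.testng.ITestResult;

public class ConditionalRetryAnalyzer implements IRetryAnalyzer {

    private static final int MAX_RETRIES = 2; // illustrative limit
    private int attempts = 0;

    @Override
    public boolean retry(ITestResult result) {
        Throwable cause = result.getThrowable();
        // Rerun only when the failure comes from test-data setup; any other
        // exception marks the test as failed so real defects are not hidden.
        if (attempts < MAX_RETRIES && cause instanceof TestDataException) {
            attempts++;
            return true;
        }
        return false;
    }
}
```

The analyzer can then be attached to the tests, for example with @Test(retryAnalyzer = ConditionalRetryAnalyzer.class) or through an annotation transformer.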
One of the more annoying issues with Selenium and Chrome is the "Element is not clickable at point" bug. Selenium's click method tries to click the target element at its center point, but sometimes, even though the page and its elements are loaded, an additional rendering pass repositions the element, which causes this error. A workaround is to click the element using JavaScript or the Actions class.
jsClickElement
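A sketch of the JavaScript-based click:

```java
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class ClickHelpers {

    // Dispatches the click through the browser's JS engine instead of Selenium's
    // coordinate-based click, sidestepping the "not clickable at point" error.
    public static void jsClickElement(WebDriver driver, WebElement element) {
        ((JavascriptExecutor) driver).executeScript("arguments[0].click();", element);
    }
}
```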
Actions
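And the Actions-based variant:

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.interactions.Actions;

public class ActionsClickHelper {

    // Moves the mouse to the element first, then clicks, which avoids the
    // repositioning problem described above.
    public static void actionsClickElement(WebDriver driver, WebElement element) {
        new Actions(driver).moveToElement(element).click().perform();
    }
}
```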
Sometimes you will need to run your tests on IE, and even though newer versions of the browser are comparable with the more mainstream browsers, some issues remain. One of them is the problem with SSL certificates: the ACCEPT_SSL_CERTS capability doesn't work in later IE versions. The only workaround I have found so far is to use the following method to accept the certificate for the target page.
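The original method isn't reproduced here, but a commonly used workaround is to click IE's certificate-override link via a javascript: navigation; the page-title check below is an assumption and may need adjusting for your IE version:

```java
import org.openqa.selenium.WebDriver;

public class IeCertificateHelper {

    public static void acceptCertificate(WebDriver driver, String url) {
        driver.navigate().to(url);
        // IE's certificate warning page exposes an "overridelink" element;
        // clicking it through a javascript: URL accepts the certificate and continues.
        if (driver.getTitle().contains("Certificate Error")) {
            driver.navigate().to("javascript:document.getElementById('overridelink').click();");
        }
    }
}
```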
While working on this project we learned a lot about Selenium's capabilities and how it can be customized to fit your needs. For us this meant being able to integrate our tests into our Continuous Delivery solution, providing a fast and reliable way of testing our product.