Impact Testing: Stop waiting for tests you do not need to run

Test Impact Analysis (TIA) is a modern way of speeding up the test automation phase of a build. It works by determining the minimum set of tests that need to be run after a change to production code.

Test Impact Analysis has received a lot of attention in the context of monorepos, where big teams keep all of their application services and common utilities in the same repository. Contributing to those repositories is very time-consuming, particularly because developers need to wait for CI, which runs all the tests, to validate that the contribution does not break anything else.

Not all tests have the same impact in terms of performance or scope. For this reason, they are commonly classified, from fastest to slowest, as: unit tests, which are fast to execute because they do not rely on external services (e.g. a database or an API); integration tests, which validate the communication layer among services; and end-to-end tests, which validate an entire use case. A widely accepted rule of thumb is:

“The more time it takes to execute a test, the more complex it is to fix a bug.”

Running tests that take a long time to execute hurts developers' productivity. Therefore, it is recommended to write more unit tests, fewer integration tests, and a minimum set of end-to-end tests. The test pyramid is a graphical model that summarizes this.

One approach to enforcing this test pyramid, where most of the tests are unit tests, is to generate and compare test coverage from unit tests only.

Besides this, in order to promote agile code reviews, it is highly recommended to work with small pull requests. Otherwise, many – and probably off-topic – discussions appear, and merging the pull request takes forever. After a pull request is created, several automatic checks are executed. The most common are a CI check that runs all (or most of) the tests, a code coverage check, and a static code analysis (SCA) check. Automatic checks help to protect the master branch and ensure that it is always ready to be released. However, does it make sense to run all the tests in all circumstances?

There are three main strategies to optimize test automation, mainly extracted from this article by Martin Fowler:

Executing a Test Subset

Historically, teams would give up on making their tests infinitely fast and use suites or tags to target a subset of tests. Teams might choose to have CI jobs that run a smaller suite per commit, plus a nightly build job that runs all tests. Obviously, that delays bad news and defeats the aims of Continuous Integration.

Components to execute for a suite/tag/testing target
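The selection logic behind suites and tags can be sketched in a few lines. This is a minimal, self-contained illustration of the idea (test names and tags are hypothetical, not from any real project); in practice a framework such as JUnit 5 does this filtering for you when a CI job requests a given tag.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of tag-based test selection: each CI job runs only the tests
// carrying a given tag (e.g. a "fast" suite per commit, everything nightly).
public class TagFilter {

    // Hypothetical mapping of test classes to their tags.
    public static final Map<String, Set<String>> TESTS_BY_TAG_EXAMPLE = Map.of(
        "CartTest", Set.of("fast"),
        "PaymentIT", Set.of("slow", "integration"),
        "CheckoutE2E", Set.of("slow", "e2e"));

    // Return (sorted) the tests whose tag set contains the requested tag.
    public static List<String> selectByTag(Map<String, Set<String>> testsByTag,
                                           String tag) {
        return testsByTag.entrySet().stream()
            .filter(e -> e.getValue().contains(tag))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Per-commit job: only the fast suite.
        System.out.println(selectByTag(TESTS_BY_TAG_EXAMPLE, "fast"));   // prints [CartTest]
        // Nightly job: the slow suites as well.
        System.out.println(selectByTag(TESTS_BY_TAG_EXAMPLE, "slow"));
    }
}
```

The catch, as noted above, is that the mapping from tests to suites is maintained by hand, so the slow suites still only run nightly.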

By Explicit Mappings

Google’s internal build system, Blaze, has been copied into a few open-source technologies over the years. The most notable are Buck from Facebook, Bazel from Google, and Pants from Twitter, Foursquare and Square.

Inside Google, Blaze navigates a single directed graph across the entire monorepo. Blaze has a mechanism of direct association from test to production code: a fine-grained directory tree of production sources and associated test sources, with explicit dependency declarations via BUILD files that are checked in too. For example:
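A hedged sketch of what such a BUILD file might look like, using Bazel-style syntax (the target and path names here are hypothetical):

```
# BUILD file for a hypothetical "cart" package.
java_library(
    name = "cart",
    srcs = glob(["src/main/java/**/*.java"]),
)

java_test(
    name = "cart_test",
    srcs = glob(["src/test/java/**/*.java"]),
    deps = [":cart"],  # explicit declaration: these tests depend on :cart
)
```

Because the `deps` edges are declared explicitly, the build tool can walk the graph backwards from any changed target to the tests that depend on it.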

Thus, for a given directory/package/namespace, the developer can kick off a subset of the tests quite easily – but only the ones reachable via the directed graph built from the BUILD files. The real time saving comes when the tool automatically selects the subset of tests to run per commit based on this baked-in intelligence.

Dependency graph between 2 tests and the sources

By Test Impact Analysis

The key idea is that not all tests instantiate every production source file (or the classes declared in that source file). By instrumenting the tests while they run, this intelligence can be gained: every time a class is loaded or used from a test, the mappings are updated. This is similar to the approach that code coverage tools follow to generate their reports. That intelligence begins as a map of which production sources each test exercises, and ends up as the inverse map of production sources to the tests that would exercise them.

One test (from many) instantiates a subset of the production sources.

One prod source is exercised by a subset of the tests (whether unit, integration or end-2-end)
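These two directions of the mapping can be sketched as plain data structures. The class and test names below are hypothetical; a real tool would populate the forward map via JVM instrumentation while the tests run, rather than hard-coding it.

```java
import java.util.*;

// Sketch of the impact map: the forward map (which production classes each
// test loads, gathered during the test run) and its inversion (which tests
// exercise each production class).
public class ImpactMap {

    // Forward map: test -> production classes it instantiated (hypothetical data).
    public static final Map<String, Set<String>> TEST_TO_CLASSES = Map.of(
        "CartTest", Set.of("Cart", "Item"),
        "PaymentTest", Set.of("Payment", "Cart"));

    // Invert the forward map: production class -> tests that exercise it.
    public static Map<String, Set<String>> invert(Map<String, Set<String>> forward) {
        Map<String, Set<String>> inverse = new TreeMap<>();
        forward.forEach((test, classes) ->
            classes.forEach(cls ->
                inverse.computeIfAbsent(cls, k -> new TreeSet<>()).add(test)));
        return inverse;
    }

    public static void main(String[] args) {
        // Cart is exercised by both tests; Item and Payment by one each.
        System.out.println(invert(TEST_TO_CLASSES));
    }
}
```

The inverse map is the artifact worth persisting between builds: given a changed class, it answers directly which tests must run.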

The mappings can only really be used for changes versus a reference point. This can be as simple as the work the developer is about to commit or has committed. It could also be a set of commits: say, everything that was committed today (a nightly build), since the last release, or versus the master branch.

The simplistic approach is to create the list of changes at the level of production source files, but ideally the tool would infer which methods/functions have changed and narrow the subset further to only the tests that exercise those. Usually, test impact reports are generated on every master build. During the build of a pull request, the tool reads the previous master impact report and infers the minimum set of affected tests to run – the ones that use code parts affected by our changes.
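The file-level selection step described above reduces to a lookup against the stored report. A minimal sketch, assuming an impact report keyed by production source file (file and test names are hypothetical):

```java
import java.util.*;

// Sketch of pull-request test selection: given the impact report from the
// last master build (production source -> tests that exercise it) and the
// list of sources changed in the pull request, pick the minimum set of
// tests to run.
public class TestSelector {

    public static Set<String> affectedTests(Map<String, Set<String>> impactReport,
                                            Collection<String> changedSources) {
        Set<String> selected = new TreeSet<>();
        for (String source : changedSources) {
            // A source absent from the report (e.g. a brand-new file) selects
            // no tests here; a cautious tool might fall back to running all.
            selected.addAll(impactReport.getOrDefault(source, Set.of()));
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> report = Map.of(
            "Cart.java", Set.of("CartTest", "CheckoutE2E"),
            "Payment.java", Set.of("PaymentIT"));
        // Only Cart.java changed in this pull request.
        System.out.println(affectedTests(report, List.of("Cart.java")));  // prints [CartTest, CheckoutE2E]
    }
}
```

Note that the selection is only as fresh as the last master report: a pull request that changes which classes a test loads will only be reflected once master is rebuilt.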

Usually, pull request checks (e.g. Codecov) require generating code coverage reports from unit tests. However, there is no real benefit in filtering the subset of unit tests affected by our changes, because they are the fastest to execute. The real value of impact testing tools lies in the time saved by filtering integration and e2e tests. Consequently, pull request checks and impact testing tools can coexist without conflicts.

Right now, though, the only current instrumentation-based implementation for JVM projects is junit4git. This is an open-source library (created by a member of Engprod) that implements JUnit 4 and JUnit 5 extensions to generate test impact reports by running the JUnit tests with JVM instrumentation mechanisms. When tests run on the master branch, impact reports are stored as git notes (without requiring a server to download/store them). Otherwise, the tool calculates the minimum subset of tests to run based on the last test impact report. Its current limitation is that it only tracks changes in source files (Java, Scala, Kotlin, ...) and does not yet detect changes in other types of files (e.g. configuration files, resources) referenced from the tests.

Other ready-to-go technologies come from Microsoft (Visual Studio) and Red Hat (smart-testing). Both work at the source-file level.