A New Approach to CI

September 5, 2023
bazel monorepo tech

In one of the groups I’m part of, someone recently asked “Does anyone have opinions on monorepos? It seems like they should be beneficial to CI in the same way that trunk-based development is, but I’ve never used one in anger.” This is an edited version of my response.

I’m a huge fan of co-locating code as a driver for making feedback loops tighter. Taking that co-location to its logical extreme naturally leads you towards a monorepo. Very few people go that far, but it does suggest moving to fewer, larger repos.

There’s a whole bunch of issues that surround larger repos, but my view is that such repos can transform CI entirely. Why is that?

The main question we’re trying to answer with CI is “is this change safe to land into production?” If we believe the change is safe, we can push ahead. If we don’t, then we need to weigh our options, but typically we won’t push to production.

Let’s think about how traditional CI attempts to answer the question of “is this safe to land into production?” It takes a “belt and braces” approach: we can’t prove what has been impacted by this change, so instead we run a series of pipelines to try and make sure all the bases are covered.

The traditional CI pipeline looks something like the following (there’s a shell sketch after the list):

  1. Run formatters or linters; these finish almost instantly.
  2. Run the small tests (or unit tests, if you prefer that terminology). This gives fast feedback.
  3. Run the medium tests (or integration tests, if you prefer).
  4. Fan out and run the large tests in buckets.
  5. Fan back in, and build artifacts for deployment.
  6. Profit!
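
To make that concrete, here’s a minimal sketch of such a pipeline as a plain shell script. Every script name here is hypothetical, and a real setup would live in your CI system’s own configuration format, but the shape is the same:

    #!/usr/bin/env bash
    set -euo pipefail

    ./ci/lint.sh              # 1. near-instant formatters and linters
    ./ci/small_tests.sh       # 2. fast unit tests
    ./ci/medium_tests.sh      # 3. integration tests

    pids=()                   # 4. fan out the large tests into buckets
    for bucket in 0 1 2 3; do
      ./ci/large_tests.sh --bucket "$bucket" &
      pids+=("$!")
    done
    for pid in "${pids[@]}"; do
      wait "$pid"             # 5. fan back in; any failure aborts the run
    done

    ./ci/build_artifacts.sh   # ...then build the deployment artifacts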

Of course, there are almost as many ways of creating this as there are projects, but the general pattern is to front-load fast tests, and then fan out to run slower tests in parallel where possible.

What’s notable is that we’re generally running everything. That’s because we don’t really know what has been affected by a change, so for the sake of safety we run everything we can, hoping that it’ll catch any problems. For a small repo, this approach is probably fine, but as the repo grows, it lengthens the feedback loops, and the likelihood increases that something in a project unrelated to yours will cause your builds to fail.

Now, I’ve seen plenty of teams attempt to write sophisticated tooling that uses machine learning (or just good old-fashioned statistics) to figure out which tests need to be run for which change. The results are never completely reliable, so there’s always the fallback of running everything.

The problem is that the repo is too large, and has become unwieldy to work on with the tools we’ve grown used to. You definitely need the right tooling to make a monorepo (or larger repo) work, and my tool of choice at the moment is Bazel.

That’s not because Bazel is an amazing tool (it has a wickedly steep learning curve, and its demand that you completely enumerate inputs is deeply frustrating), but because it’s great at handling larger repos in a way that other tools just can’t, and of the new generation of build tools out there, it’s the one with momentum (meaning that you can find help on Stack Overflow).

One thing Bazel gives you is the ability to query the build graph, and you can do some really nice things with that. For example, using a tool like Target Determinator, you can identify every test that needs to be re-run, and every library or binary that needs to be rebuilt, for each change.
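
To give a flavour of what querying the graph looks like, here’s a one-liner asking for every test affected by a change to one library. The //lib/parser target is invented for illustration; rdeps and kind are real Bazel query functions:

    # All test targets in the repo that depend, directly or transitively,
    # on //lib/parser, and so need re-running when it changes.
    bazel query 'kind(test, rdeps(//..., //lib/parser))'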

So your CI build stops being “run everything” and starts being “run just what needs to be run”, and that can save astonishing amounts of time, if you can determine what to run reliably and at speed. At a high level, your CI run becomes the two steps below:

  1. Use Target Determinator to identify the targets to rebuild and test.
  2. Rebuild and test those targets.
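
As a sketch of how those two steps might look in a CI script, assuming the target-determinator binary from the bazel-contrib project is on the path (its exact flags vary between releases, so treat this as illustrative rather than definitive):

    # 1. List the targets affected since we diverged from the main branch.
    targets=$(target-determinator "$(git merge-base origin/main HEAD)")

    # 2. Rebuild and test just those targets. (A real script would split
    #    build-only targets from test targets before calling bazel.)
    if [ -n "$targets" ]; then
      echo "$targets" | xargs bazel test
    fi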

One nice side-effect of this is that there’s no need to keep the entire tree green all the time. We all know that flaky tests sometimes creep in, or a test starts failing because some external system is down. Using target determination allows us to know that our change is fine, even if the rest of the repo is on fire.

Better yet, that irksome habit Bazel has of requiring you to list all your inputs has the handy side-effect of making remote builds far simpler, and not constrained to a single language (as distcc is). After all, a build is just taking inputs, laying them out on disk, running a command, and collecting outputs. If those inputs are specified in enough detail, there’s no reason to be constrained to a single machine.
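
For example, a BUILD file spells out exactly which sources and dependencies an action may see (the targets below are invented for illustration), and that complete description is what lets the action run anywhere:

    # Nothing beyond srcs and deps is visible to this compile action, so
    # Bazel can lay these files out on any machine and run it there.
    java_library(
        name = "parser",
        srcs = glob(["src/main/java/**/*.java"]),
        deps = ["//third_party:guava"],
    )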

Being able to do distributed builds, either on locally managed infrastructure such as BuildBarn, or using a “build as a service” provider such as EngFlow or BuildBuddy, is another way of tightening feedback loops by scaling the build horizontally (though this relies on builds being broad, rather than a single, narrow critical path). You’ve got 300 tests to run? Just run them all at the same time. It takes the same amount of CPU, but the wall clock time drops dramatically.
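
Pointing Bazel at such a service is mostly configuration. Here’s a sketch of the relevant .bazelrc lines, with placeholder endpoints standing in for whatever your provider gives you:

    # Ship actions to a remote executor and share results via a remote
    # cache; raise --jobs so Bazel actually exploits the extra machines.
    build --remote_executor=grpcs://remote.example.com
    build --remote_cache=grpcs://cache.example.com
    build --jobs=200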

Combine distributed caches, builds, and target determination, and your CI pipeline becomes a lot easier to manage. In many cases, it will look like the pipelines of the old days: just a straight list of steps that are carried out in sequence, without any fan-out or fan-in.

Of course, there are a host of problems that come with the approach of using larger repos. I’ve blogged a little about the two that I see people get most incensed by here, and a little more about the cost savings that monorepos can represent here, but going into those issues in depth will need to wait for another day.

In short, however, I strongly believe that monorepos are beneficial to CI in the same way that trunk-based development is.

Sotto voce

I should really blog about the single version thing. It’s a pain, but largely because it surfaces incompatibilities and makes more visible the amount of work that needs to be done to make an update stick everywhere it should. I liken it to how Agile used to be compared to other methodologies (at least, how they were compared when I was at ThoughtWorks back before 2010).

All software development projects start in a relatively chaotic way, with uneven progress and unforeseen hiccups. After a while, they settle down into their own rhythms and become more predictable. The problem was that if a project reports progress every week or two, that initial chaos is far more visible than it is on a project that reports progress every month or longer. It’s not that the underlying progress is different; it’s that the visibility is far higher (and I hope most people believe that visibility is ultimately useful and leads to better outcomes). I think the same applies to the single-version thing: it surfaces incompatibilities so much sooner, and front-loads a pile of engineering effort that would otherwise have to be spent (with interest!) later in the process, where change is harder.
