Sunday, 19 November 2017

Some Useful Monorepo Definitions

The concept of a monorepo seems so self-evident that there is little need to define it. Just co-locate all your code in one place, and you’re done, right?

The problem is that this doesn’t capture lots of the nuance of the term. After all, if all you have is a single project, then, by this definition, you have a monorepo. While technically correct (the best kind of correct!) this doesn’t feel right. There has to be more to it than that.

Monorepo
Summary: A monorepo represents the body of code and supporting digital assets owned by an organisation. Within that body of code, it’s possible to draw logical boundaries around certain areas, such as shared libraries, individual projects, or other groupings.

Previously, I’ve written that a monorepo is “a unified source code repository used by an organisation to host as much of its code as possible.” That does the job, but I think it describes one implementation of the pattern rather than succinctly capturing the goals of a monorepo. Oh well, exploration of an idea is an iterative process, with each iteration being able to use the insights from previous iterations. Let’s iterate again!

Cell
Summary: A cell is an atomic unit representing a single logical piece within the monorepo.

When we were working on Buck, we struggled for a long time to come up with the best name for the logical areas within the monorepo. Initially, they were formed from the individual repositories we were coalescing into the monorepo. However, “repository” was an overloaded term, and so one we wanted to avoid. Similarly, “module” already has an established meaning in some of the languages we wanted to support.

In the end, we settled on using a biological metaphor. Because a monorepo represents a body of code, and these logical groupings represent the atomic units that the monorepo is constructed from, we called them cells. In many organisations, pre-monorepo, a cell represents a single repository.

Because of this mapping to a conceptual repository, a cell is a great candidate for Open Sourcing. Should this happen, it’s entirely possible that there needs to be some tooling to map file structure from the shape used within the monorepo to the shape expected by the OSS library. Ideally, that tooling would allow code to be both imported and exported to and from the monorepo, rather than only allowing a push in a single direction.
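
To make the idea of two-way mapping concrete, here is a minimal sketch in Python. It assumes the simplest possible case, where a cell lives under a single directory prefix in the monorepo and the open source repository keeps the same files at its root. The prefix `third_party/widgets` and the function names are made up for illustration; real tooling would likely need to handle renames, exclusions, and rewritten imports too.

```python
from pathlib import PurePosixPath

# Hypothetical layout: the cell lives under this prefix inside the monorepo,
# while the open source repository keeps the same files at its root.
CELL_PREFIX = PurePosixPath("third_party/widgets")


def export_path(monorepo_path: str) -> str:
    """Map a path inside the monorepo to its path in the OSS repository.

    Raises ValueError if the path is not inside the cell.
    """
    return str(PurePosixPath(monorepo_path).relative_to(CELL_PREFIX))


def import_path(oss_path: str) -> str:
    """Map a path in the OSS repository back to its monorepo location."""
    return str(CELL_PREFIX / oss_path)


original = "third_party/widgets/src/button.py"
exported = export_path(original)
round_tripped = import_path(exported)
```

The property worth checking is the round trip: exporting a path and importing it again should land back where it started, which is what allows changes to flow in both directions.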

Projected Monorepo
Summary: A set of repositories presented as if they were a monorepo, typically via additional tooling.

Monorepos may be classified by the way that the code within is organised, but there is another approach: the projected monorepo. This isn’t a monorepo in the (umm…) traditional sense, where all the code is in the same code repository, but something that acts as if it were a monorepo through external tooling. An example would be the Android Open Source Project, which uses “repo” to stitch together multiple separate repositories into something that acts as a single cohesive whole. To a lesser extent, things like git submodules also fulfill the same role of creating projected monorepos.

In a projected monorepo, it is clear where the cells lie: they’re the individual repositories that are being stitched together to form the new whole.
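
As a rough illustration, a projected monorepo can be thought of as a table mapping checkout paths to the repositories that provide them. The sketch below is in Python and uses a made-up manifest with placeholder URLs; it is not the format used by “repo” or by git submodules, only the shape of the idea. Given a file path in the stitched-together tree, it finds the cell that owns it.

```python
# Hypothetical manifest: checkout path -> repository that provides it.
# The paths and URLs are placeholders, not taken from any real project.
MANIFEST = {
    "platform/build": "https://example.com/platform/build.git",
    "platform/frameworks": "https://example.com/platform/frameworks.git",
    "external/zlib": "https://example.com/external/zlib.git",
}


def cell_for(path: str):
    """Return the checkout path (the cell) that owns `path`, or None."""
    matches = [
        cell for cell in MANIFEST
        if path == cell or path.startswith(cell + "/")
    ]
    # Prefer the longest match so a nested checkout resolves to the inner cell.
    return max(matches, key=len, default=None)
```

With this manifest, `cell_for("external/zlib/inflate.c")` resolves to `"external/zlib"`, while a path that no repository provides resolves to `None`.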

Target
Summary: The individual units addressable by the build tool, which are used to declare dependencies.

Within a monorepo there are targets. These are units that are addressable by the build tool, and are also typically used to declare dependencies. They typically have concrete outputs, such as libraries or binaries. Targets are human-readable, and are most commonly given as a path within the repository.

A cell is typically composed of many targets. As an example, perhaps a cell consists of a single library. There might be targets within that cell that allow the library to be built, the tests for that library to be built, and (perhaps) another to allow those tests to be run.
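
The single-library cell described above could be modelled along these lines. This is a sketch in Python rather than a real build file: the `Target` class, the `kind` values, and the `widgets` names are all hypothetical, and the labels only loosely follow the `//path:name` style used by Buck and Bazel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    label: str        # human-readable address, given as a path in the repo
    kind: str         # hypothetical rule kind, e.g. "library"
    deps: tuple = ()  # labels of the targets this one depends on


# A hypothetical cell containing a single library: one target builds the
# library, one builds its tests, and one runs those tests.
CELL = {
    t.label: t
    for t in (
        Target("//widgets:widgets", "library"),
        Target("//widgets:tests", "test_library", deps=("//widgets:widgets",)),
        Target("//widgets:run_tests", "test_runner", deps=("//widgets:tests",)),
    )
}


def undeclared_deps(cell):
    """Return labels that are depended on but not defined in the cell."""
    return sorted(
        dep for t in cell.values() for dep in t.deps if dep not in cell
    )
```

Because dependencies are declared by label, a tool can check that every dependency refers to a target that actually exists before it builds anything.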

Graph-based build tool
Summary: A build tool designed for use within a monorepo where build files are located throughout the source tree and used in a non-recursive manner.

It's common to use a graph-based build tool with monorepos. These are tools that are natively designed for a monorepo, and operate on the directed acyclic graph of dependencies between targets. They typically provide the ability to build polyglot projects, and the ability to query the build graph. The two major examples are Google’s Bazel and Facebook’s Buck. Both of these tools can trace their user-facing design to Google’s “Blaze” build tool.
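
To give a feel for what “querying the build graph” means, here is a small Python sketch. The graph and its labels are invented for illustration, and the two functions are my own names for a forward query and a reverse query over it, not the query languages that Bazel or Buck actually provide.

```python
# Hypothetical build graph: each target label maps to its direct dependencies.
GRAPH = {
    "//app:app": ["//lib/ui:ui", "//lib/net:net"],
    "//lib/ui:ui": ["//lib/core:core"],
    "//lib/net:net": ["//lib/core:core"],
    "//lib/core:core": [],
}


def transitive_deps(graph, target):
    """Return every target that `target` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph[target])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def reverse_deps(graph, target):
    """Return the targets that depend directly on `target`."""
    return {label for label, deps in graph.items() if target in deps}
```

The forward query answers “what do I need in order to build this?”, and the reverse query answers “what might break if I change this?”.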

Admittedly, behind the scenes almost every build tool makes use of basic graph theory in order to work: after all, most tools do a topological sort of targets in order to work their magic, and they frequently have commands that allow that graph to be queried. The major difference between these other tools and what I’m terming a “graph-based build tool” is the use of build files throughout the tree that are used in a non-recursive way. This encourages the creation of relatively small compilation units.
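
For anyone who hasn’t seen one, the topological sort mentioned above can be sketched in a few lines of Python. This version uses Kahn’s algorithm over a made-up three-target graph. It assumes every dependency also appears as a key in the graph and that no dependency is listed twice; it isn’t how any particular build tool implements it.

```python
from collections import deque

# Hypothetical graph: each target label maps to its direct dependencies.
TARGETS = {
    "//app:app": ["//lib:lib"],
    "//lib:lib": ["//base:base"],
    "//base:base": [],
}


def build_order(graph):
    """Return targets ordered so each comes after all of its dependencies."""
    # Number of dependencies each target is still waiting on.
    remaining = {label: len(deps) for label, deps in graph.items()}
    # Inverted edges: which targets are waiting on a given target.
    dependents = {label: [] for label in graph}
    for label, deps in graph.items():
        for dep in deps:
            dependents[dep].append(label)

    ready = deque(sorted(l for l, n in remaining.items() if n == 0))
    order = []
    while ready:
        label = ready.popleft()
        order.append(label)
        for dependent in dependents[label]:
            remaining[dependent] -= 1
            if remaining[dependent] == 0:
                ready.append(dependent)

    if len(order) != len(graph):
        raise ValueError("dependency cycle detected")
    return order


order = build_order(TARGETS)
```

The cycle check at the end is the reason the graph has to be acyclic: if two targets depend on each other, neither is ever ready to build.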

Hopefully these terms, and the various ways of organising a monorepo, give us a common language to discuss monorepos in a meaningful way.

My thanks to Kent Beck, Nathan Fisher, Josh Graham, Paul Hammant, Will Robertson, and Chris Stevenson for their comments and feedback while writing this post. The conversations have definitely helped clarify and improve this post.