Wednesday, 22 November 2017

Organising a Monorepo

How should a monorepo be organised? It only takes a moment to come up with many competing models, but the main ones to consider are “by language”, “by project”, “by functional area”, and “nix style”. Of course, it’s entirely possible to blend these approaches together. As an example, my preference is “primarily language-based, secondarily by functional area”, but perhaps I should explain the options.

Language-based monorepos
These repos contain a top-level directory per language. For languages that are typically organised into parallel test and source trees (I’m looking at you, Java), there might be two top-level directories: one for the source, and one for the tests.

Within the language specific tree, code is structured in a way that is unsurprising to “native speakers” of that language. For Java, that means a package structure based on fully-qualified domain names. For many other languages, it makes sense to have a directory per project or library.

Third party dependencies can either be stored within the language-specific directories, or in a separate top-level directory, segmented in the same language specific way.
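
As a rough sketch (every name here is invented purely for illustration), such a tree might look like this:

  java/com/example/crypto/
  java/com/example/net/
  javatests/com/example/crypto/
  python/deploytool/
  python/netutils/
  third_party/java/guava/
  third_party/python/requests/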

This approach works well when there aren’t too many languages in play. Organisation-wide standards, such as those at Google, may limit the number of languages. Once there are too many languages, it becomes hard to determine where to start looking for the code you may want to depend on.

Project-based monorepos
One drawback with a language-based monorepo is that it’s increasingly common to use more than one language per project. Rather than spreading code across multiple locations, it’s nice to co-locate everything needed for a particular project in the same directory, with common code being stored “elsewhere”. In this model, therefore, there are multiple top-level directories, one per project.

The advantage with this approach is that creating a sparse checkout is incredibly simple: just clone the top-level directory that contains the project, et voila! Job done! It also makes removing dead code simple --- just delete the project directory once it’s no longer needed, and everything is gone. This same advantage means that it’s easy to export a cell as an Open Source project.

The disadvantage with project-based monorepos is that the top level can quickly become bloated as more and more projects are added. Worse, there's the question of what to do with projects that are mostly retired, or that have been refactored down to slivers of their former glory.

Functional area-based monorepos
A key advantage of monorepos is “discoverability”. It’s possible to organise a monorepo to enhance this, by grouping code into functional areas. For example, there might be a directory for “crypto” related code, another for “testing”, another for “networking” and so on. Now, when someone is looking for something they just need to consider the role it fulfills, and look at the tree to identify the target to depend on.

One way to make this approach fail miserably is to make extensive use of code names. “Loki” may seem like a cool project name (it’s not), but I’ll be damned if I can tell what it actually does without asking someone. Being developers, we need snazzy code names at all times, and by all means organise teams around those, but the output of those projects should be named in as vanilla a way as possible: the output of “loki” may be a “man in the middle ssl proxy”, so stick that in “networking/ssl/proxy”. Your source tree should be painted beige --- the least exciting colour in the world.

Another problem with functional area-based monorepos is that considerable thought has to be put into their initial structure. Moving code around later is possible (and possible to do atomically), but as the repo grows larger the structure tends to ossify, and considerable social pressure needs to be overcome to make those changes.

Nix-style monorepos
Nix is damn cool, and offers many capabilities that are desirable for a monorepo being run in a low-discipline (or high-individuality) engineering environment, one that can’t manage to keep to (something close to) a single version of each dependency. Specifically, a nix-based monorepo actively supports multiple versions of dependencies, with projects depending on specific versions and making this clear in their build files.

This differs from a regular monorepo that carries a few alternate versions of dependencies which are particularly taxing to get onto a single version (*cough* ICU *cough*): here, multiple versions of things are actively encouraged, so dependencies need to be managed far more actively.

There are serious maintainability concerns when using the nix-style monorepo, especially for components that need to be shared between multiple projects. Clean up of unused cells, mechanisms for migrating projects as dependencies update, and stable and fast constraint solving all need to be in place. Without those, a nix-style monorepo will rapidly become an ungovernable mess.

The maintainability issue is enough to make this a particularly poor choice. Consider this the “anti-pattern” of monorepo organisation.

Blended monorepos
It’s unlikely that any monorepo would be purely organised along a single one of these lines; a hybrid approach is typically simpler to work with. These “blended monorepos” attempt to address the weaknesses of each approach with the strengths of another.

As an example, project-based monorepos rapidly develop a cluttered top-level directory. However, splitting first by functional area, or by language and then functional area, keeps the top level less cluttered and makes it easier to navigate.

For projects or dependencies that are primarily in one language, but with support libraries for other languages, take a case-by-case approach. For something like MySQL, it may make sense to just shovel everything into “c/database/mysql”, since the Java library (for example) isn’t particularly large. For other tools, it may make more sense to separate the trees and stitch everything together using the build tool.

Third party dependencies
There is an interesting discussion to be had about where and how to store third party code. Do you take binary dependencies, or pull in the source? Do you store the third party code in a separate third party directory, or alongside first party code? Do you store the dependencies in your repository at all, or push them to something like a Maven artifact repository?

The risk when checking in the source is that it becomes very easy to accidentally start maintaining a fork of whichever dependency it is. After all, you find a bug, and it’s sooo easy to fix it in place and then forget (or not be allowed) to upstream the fix. The advantage of checking in the source is that you can build from source, allowing you to optimise it along with the rest of the build. Depending on your build tool, it may be possible to rely on only those parts of the library that are actually necessary for your project.

Checking in the binary artifacts has the disadvantage that source control tools are seldom optimised for storing binaries, so any changes will cause the overall size of the repository to grow (though not that of a snapshot at a single point in time). The advantage is that build times can be significantly shorter, as all that needs to be done is link the dependency in.

Binary dependencies pulled from third parties can be significantly easier to update. Tools such as maven, nuget, and cocoapods can describe a graph of dependencies, and these graphs can be reified by committing them to your monorepo (giving you stable, repeatable historical builds) or left where they lie and pulled in at build time. As one of the reviewers of this post pointed out, this requires the community the binaries are being pulled from to be well managed: releases must not be overwritten (which can be verified by simple hash checks), and snapshots should be avoided.

Putting labels on these, there are in-tree dependencies and externally managed dependencies, and both come in source and binary flavours.

Thanks

My thanks to Nathan Fisher, Josh Graham, Will Robertson, and Chris Stevenson for their feedback while writing this post. Some of the side conversations are worth a post all of their own!

Monday, 20 November 2017

Tooling for Monorepos

One argument against monorepos is that you need special tooling to make them work. This argument commonly gets presented in a variety of ways, but the most frequent boil down to:


  1. Code size: a single repo would be too big for our source control system!
  2. Requirement for specialised tooling: we're happy with what we have!
  3. Reduces the ability of teams to move fast and independently
  4. Politics and fiefdoms


Let’s take each of these in turn.


Code size
Most teams these days are using some form of DVCS, with git being the most popular. Git was designed for use with the Linux kernel, so it initially scaled nicely for that use-case, but starts to get painful for anything much larger. That means that we start with some pretty generous limits: a fresh clone of the Linux repo at depth 1 takes just shy of 1GB of code spread across over 60K files (here’s how they make it work!). Even without modifying stock git, Facebook was able to get their git repo up to 54GB (admittedly, with only 8GB of code). Microsoft have scaled Git to the entire Windows codebase: that’s 300GB spread across 3.5M files and hundreds of branches. Their git extensions are now coming to GitHub and non-Windows platforms.


Which is good news! Your source control system of choice can cope with the amount of code a monorepo contains. Hurrah!


But how long does that take to check out? I’ll be honest, checking out a repo that’s 1GB in size can take a while. That is, if you check out the whole 1GB. Git, Mercurial, Perforce, and Subversion all support “sparse” working copies, where you only clone those directories you need. The sparse checkout declarations can either be kept in files stored in source control, or they can be computed. They likely follow cell boundaries within the monorepo. It should be clear that in the ideal case, the end result is a working copy exactly the same size as a hand-crafted repository containing just what’s needed, and nothing more. As a developer moves from project to project, or area to area, they can expand or contract their current clone to exactly match their needs.
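
For example, with stock Git (the repository URL and directory names here are made up), a sparse working copy can be built up along these lines:

  # Clone without populating the working copy
  git clone --no-checkout https://git.example.com/monorepo.git
  cd monorepo

  # Turn on sparse checkout and list only the directories needed
  git config core.sparseCheckout true
  echo "networking/ssl/proxy/" >> .git/info/sparse-checkout
  echo "third_party/java/" >> .git/info/sparse-checkout

  # Populate the working copy with just those paths
  git checkout master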


So your checkouts don’t necessarily get larger. They may even get smaller.


But, what if you do have everything checked out? Your source control tool needs to know which files have changed, and as the size of the repository grows, those operations become slower, hurting developer productivity. Except both Git and Mercurial have support for filesystem watching daemons (notably “watchman”). These allow file checking operations to scale linearly with the number of files changed, rather than with the number of files in the repository (I’d hope that even those using a “normal” large checkout would consider using this).
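
As an example, Mercurial ships with a built-in extension that talks to watchman; assuming watchman is installed and on the path, enabling it is a one-line addition to your hgrc:

  # .hg/hgrc (or your user-level hgrc)
  [extensions]
  fsmonitor =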


So everything is fine with the raw tooling. But what about your IDE?


I mean, yeah, if you’ve checked out the entire source tree, surely your IDE will grind to a halt? First of all, don’t do that --- use a sparse clone --- but if you insist on doing it, update your tooling. Facebook spent a chunk of resources making IntelliJ more efficient when dealing with large projects, and upstreamed those changes to JetBrains, who accepted the patches. It was possible to pull in the source code for every Facebook Android app at the same time in IntelliJ. You may have a lot of code, but it’s unlikely to be that much. Other editors can also happily work with large source trees.


So, code size isn’t the problem you might imagine it is.


Requirement for specialised tooling


Quite often when people talk about monorepos, they also talk about the exotic tooling they use: custom build systems, tricked-out source control servers, and custom CI infrastructure. Perhaps a giant company has the time and resources to build all of that, but you’re too busy doing your own work.


Except a monorepo doesn’t require you to do any of those things. Want to use a recursive build tool you’re already familiar with? Go ahead. Paul Hammant has done some interesting work demonstrating how it’s possible to use maven (and, by extension, gradle and make) in a monorepo.


Switching to a build tool such as buck or bazel does make using a monorepo simpler, because these tools provide mechanisms to query the build graph, and can be simply configured to mark various parts of the tree as being visible or not to particular rules, but using one of these isn’t required. One nice thing? You don’t need to write buck or bazel yourself --- they’re both already out there and available for you to use.


Similarly, if you’re comfy with jenkins or travis, continue using them. Admittedly, you’ll need to configure the CI builds to watch not just a repo, but a subdirectory within a repo, but that’s not too hard to do. If you’re using a graph-based build tool, then you can even use jenkins or buildbot to identify the minimal set of items to rebuild and test, but, again, there’s no need to do that. Just keep on trucking the way you do now.


Reduces the ability of teams to move fast and independently


Having a repository per-project or per-team allows them to operate entirely independently of one another. Except that’s not true unless you’re writing every single line of code yourself. It’s likely you have at least a few first and third party dependencies. At some point, those dependencies really should be updated. Having your own repo means that you can pick the timing, but it also means you have to do the work.


Monorepos naturally lead people to minimise the number of versions of each third party dependency, ideally down to one, if only to avoid nasty diamond dependency issues, but there’s no technical reason why there can’t be more than one version of a library in the tree. Of course, only a narcissist would check in a new version of a library without making an effort to remove the old ones. There are a pile of ways to do this, but my preferred way is to say that the person wanting the update manages the update, and asks for help from the teams that are impacted by the change. I’ll cover the process in a later post. No matter how it’s done, making a single atomic change amortises the cost over every affected project, reducing the cost of software development across the entire organisation by front-loading the cost of making the change.


But perhaps it’s not the dependencies you enjoy freedom on. Perhaps it’s the choice of language and tooling? There’s no reason a properly organised monorepo can’t support multiple languages (pioneers such as Google and Facebook have mixed-language repos). Reducing the number of choices may be an organisation-level goal, in order to allow individuals to cycle quickly and easily between teams (which is why we have code style guidelines, right?), but there’s nothing about using a monorepo that prevents you from using many different tool chains.


As a concrete example of this, consider Mozilla. They’re a remote-first, distributed team of iconoclasts and lovely folks (the two aren’t mutually exclusive :) ). Mozilla-central houses a huge amount of code, from the browser, through extensions, to testing tools, and a subset of the web-platform-tests. A host of different languages are used within that tree, including Python, C/C++, Rust, JavaScript, Java, and Go, and I’m sure there are others too. Each team has picked what’s most appropriate and run with it.


Politics and fiefdoms


There’s no getting away from politics and fiefdoms. Sorry folks. Uber have stated that one of the reasons they prefer many separate repositories is to help reduce the amount of politics. However, hiding from things is seldom the best way to deal with them, and the technical benefits of using a monorepo can be compelling, as Uber have found.


If an organisation enthusiastically embraces the concept of collective code ownership, nothing more than purely social constructs are needed to prevent egos being bruised and fiefdoms being encroached upon. The only gateways to contribution become the technical gateways placed to ensure code quality, such as code review.


Sadly, not many companies embrace collective code ownership to that extent. The next logical step is to apply something like GitHub’s “code owners”, where owners are notified of changes, ideally before they are committed (using post-commit hooks for after-the-fact notification isn’t as effective). A step further along are OWNERS files (as seen in Chromium’s source tree), which list the individuals and team aliases whose permission is required to land code.
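
As an illustration (the paths and team names are invented), a GitHub CODEOWNERS file simply maps paths in the tree to the people or teams who should review changes to them:

  # .github/CODEOWNERS
  /networking/ssl/  @example-org/security-team
  /crypto/          @example-org/security-team
  /testing/         @example-org/test-infra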


If there is really strong ownership of code, then your source control system may be able to help. For example, perforce allows protection levels to be set for individual directories within a tree, and pre-commit hooks can be used for a similar purpose with other source control systems.

Getting the most out of a monorepo


Having said that you don't need to change much to start using a monorepo, there are patterns that allow one to be used efficiently. These suggestions can also be applied to any large code repository: after all, as Chris Stevenson said, “any sufficiently complicated developer workspace contains an ad-hoc, informally specified, bug-ridden implementation of half a monorepo”.


Although it’s entirely possible to use recursive build tools with a monorepo (Google’s early monorepo still used make), moving to a graph-based build tool is one of the best ways to take advantage of a monorepo.


The first reason is simply logistical. The two major graph-based build tools (Buck and Bazel) both support the concept of “visibility”. This makes it possible to segment the tree, marking public-facing APIs as such, whilst allowing teams to limit who can see the implementations. Who can depend on a particular target is defined by the target itself, not by its consumers, preventing uncontrolled growth in access to internal details. An OOP developer is already familiar with the concept of visibility, and the same ideas apply, scaled out to the entire tree of code.
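
As a sketch of how that looks in a Bazel-style build file (the package and target names are invented for the example):

  # networking/ssl/proxy/BUILD
  java_library(
      name = "api",
      srcs = glob(["src/api/**/*.java"]),
      # Anyone in the tree may depend on the public-facing API.
      visibility = ["//visibility:public"],
  )

  java_library(
      name = "impl",
      srcs = glob(["src/impl/**/*.java"]),
      deps = [":api"],
      # Only code under networking/ssl may depend on the implementation.
      visibility = ["//networking/ssl:__subpackages__"],
  )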


The second reason is practical. The graph-based build tools frequently have a query language that can be used to quickly identify targets given certain criteria. One of those criteria might be “given this file has changed, identify the targets that need to be rebuilt”. This simplifies the process of building a sensible, scalable CI system from building blocks such as buildbot or GoCD.
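
For instance (the file and target names are hypothetical), both tools can answer that question directly from the command line:

  # Bazel: everything that transitively depends on one source file
  bazel query 'rdeps(//..., //networking/ssl/proxy:Handler.java)'

  # Buck: the same idea, mapping a changed file to its owning targets first
  buck query "rdeps(//..., owner('networking/ssl/proxy/Handler.java'))"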


Another pattern that’s important for any repository with many developers hacking on it simultaneously is having a mechanism to serialise commits to the tree. Facebook have spoken about this publicly, and do so with their own tooling, but something like gerrit, or even a continuous build, could handle this. Within a monorepo, this tooling doesn’t need to be in place from the very beginning, and may never be needed, but be aware that it eases the problem of commits failing to land in areas of high churn.


A final piece in the tooling puzzle is to have a continuous build tool that’s capable of watching individual directories rather than the entire repository. Alternatively, using a graph-based build tool allows a continuous build that watches the entire repository to at least target the minimal set of targets that need rebuilding. Of course, it’s entirely possible to place the continuous build before the tooling that serialises the commits, so you always have a green HEAD of master….


Thanks

My thanks to Nathan Fisher, Josh Graham, Paul Hammant, Will Robertson, and Chris Stevenson for their feedback and comments while writing and editing this post. Without their help, this would have rambled across many thousands of words.

Sunday, 19 November 2017

Some Useful Monorepo Definitions

The concept of a monorepo seems so self-evident that there is little need to define it. Just co-locate all your code in one place, and you’re done, right?

The problem is that this doesn’t capture lots of the nuance of the term. After all, if all you have is a single project, then, by this definition, you have a monorepo. While technically correct (the best kind of correct!) this doesn’t feel right. There has to be more to it than that.

Monorepo
Summary: A monorepo represents the body of code and supporting digital assets owned by an organisation. Within that body of code, it’s possible to draw logical boundaries around certain areas, either shared libraries, individual projects, or other groupings.

Discussion:
Previously, I’ve written that a monorepo is “a unified source code repository used by an organisation to host as much of its code as possible.” That does the job, but I think it falls short of succinctly describing the goals of a monorepo in favour of an implementation of the pattern. Oh well, exploration of an idea is an iterative process, with each iteration being able to use the insights from previous iterations. Let’s iterate again!

Cell
Summary: A cell is an atomic unit representing a single logical piece within the monorepo.

Discussion:
When we were working on Buck, we struggled for a long time to come up with the best name for the logical areas within the monorepo. Initially, they were formed from the individual repositories we were coalescing into the monorepo. However, “repository” was an overloaded term, and so one we wanted to avoid. Similarly, “module” already has an established meaning in some of the languages we wanted to support.

In the end, we settled on using a biological metaphor. Because a monorepo represents a body of code, and these logical groupings represent the atomic units that the monorepo is constructed from, we called them cells. In many organisations, pre-monorepo, a cell represents a single repository.

Because of this mapping to a conceptual repository, a cell is a great candidate for Open Sourcing. Should this happen, it’s entirely possible that there needs to be some tooling to map file structure from the shape used within the monorepo to the shape expected by the OSS library. Ideally, that tooling would allow code to be both imported and exported to and from the monorepo, rather than only allowing a push in a single direction.

Projected Monorepo
Summary: A set of repositories presented as if they were a monorepo, typically via additional tooling.

Discussion:
Monorepos may be classified by the way that the code within is organised, but there is another approach: the projected monorepo. This isn’t a monorepo in the (umm…) traditional sense, where all the code is in the same code repository, but something that acts as if it were a monorepo through external tooling. An example would be the Android Open Source project, which uses “repo” to stitch together multiple separate repositories into something that acts as a single cohesive whole. To a lesser extent, things like git submodules also fulfill the same role of creating projected monorepos.

In a projected monorepo, it is clear where the cells lie --- they’re the individual repositories that are being stitched together to form the new whole.

Target
Summary: The individual units addressable by the build tool, which are used to declare dependencies.

Discussion:
Within a monorepo there are targets. These are units that are addressable by the build tool, and are also typically used to declare dependencies. They typically have concrete outputs, such as libraries or binaries. Targets are human-readable, and are most commonly given as a path within the repository.

A cell is typically composed of many targets. As an example, perhaps a cell consists of a single library. There might be targets within that cell that allow the library to be built, the tests for that library to be built, and (perhaps) another to allow those tests to be run.
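
To make that concrete (the names are invented), the build file for such a cell might define a handful of targets, addressed as //crypto/hashing:hashing and //crypto/hashing:tests:

  # crypto/hashing/BUCK (or BUILD)
  java_library(
      name = "hashing",
      srcs = glob(["src/**/*.java"]),
  )

  java_test(
      name = "tests",
      srcs = glob(["test/**/*.java"]),
      deps = [":hashing"],
  )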

Graph-based build tool
Summary: A build tool designed for use within a monorepo where build files are located throughout the source tree and used in a non-recursive manner.

Discussion:
It's common to use a graph-based build tool with monorepos. These are tools that are natively designed for a monorepo, and operate on the directed acyclic graph of dependencies between targets. They typically provide the ability to build polyglot projects, and the ability to query the build graph. The two major examples are Google’s bazel and Facebook’s buck. Both of these tools can trace their user-facing design to Google’s “Blaze” build tool.

Admittedly, behind the scenes almost every build tool makes use of basic graph theory in order to work: after all, most tools do a topological sort of targets in order to work their magic, and they frequently have commands that allow that graph to be queried. The major difference between these other tools and what I’m terming a “graph-based build tool” is the use of build files throughout the tree that are used in a non-recursive way. This encourages the creation of relatively small compilation units.

Hopefully these terms, and the various ways of organising a monorepo, give us a common language to discuss monorepos in a meaningful way.

Thanks
My thanks to Kent Beck, Nathan Fisher, Josh Graham, Paul Hammant, Will Robertson, and Chris Stevenson for their comments and feedback while writing this post. The conversations have definitely helped clarify and improve this post.

A Month in Selenium - October

Another month, another update, you lucky people.  The highlights:

W3C TPAC
I attended the W3C's TPAC meeting in California. This is the main get-together for many of the "working groups" that are working on standards as part of the W3C. It's also where the Browser Tools and Testing Working Group met to discuss progress on the WebDriver spec.

Good news! Once we clean up the implementation report, we're ready to move to "Proposed Recommendation", which is the last step before becoming a standard (or "Recommendation" in W3C parlance).

More good news! The "Level 2" version of WebDriver will have a new logging infrastructure added. This will make it easier for you (yes, you!) to figure out where failures have occurred. Better insight should lead to more stable tests.

Even more good news! Some of the folks from Sauce Labs attended the face-to-face meeting. They help bring an additional perspective to the design and use cases of the protocol. Until now, the group has been mostly composed of browser implementors and people from the Selenium project. The more people involved with the spec, the better it's going to be.

The minutes for this face-to-face session are available, as are the minutes for the other face-to-face sessions.

Hacking on Selenium
Last month, we were closing in on the Selenium 3.7 release. This month we shipped 3.7.0 and then, because of a small oversight where we missed a jar file in the downloadable artefacts, 3.7.1. There are some nice changes in there. As mentioned last month, one of the areas of focus has been improving how we handle the New Session command when dealing with a local end that might speak both the W3C and JSON Wire Protocol dialects of the webdriver protocol. One of the things that the spec says we're meant to do is pass through additional top-level fields in the new session payload. 3.7.1 now does this (hurrah!).

One of the nice things from the work in 3.7 is that we've laid the groundwork for a clean up of the Selenium Grid code. As part of that, we restored a behaviour where a Grid Node configured with a path to the Firefox or Chrome binary would have that path injected into any capabilities when starting a session. Making the nodes even more configurable is something that's on the road map for a later release.

More next month!

Wednesday, 18 October 2017

A Month in Selenium - September

I realise that this blog has been pretty quiet. Part of the reason for that is that I'm terrible at sitting down and just writing. What I really need is an incentive. That incentive arrived this month in the form of the Selenium Fellowship, which takes the form of a stipend to fund work hacking on Selenium. Part of the agreement is a monthly blog post. So, you all have the Software Freedom Conservancy to thank :)

So, what contributions have I been making to the Selenium project this month?

There are two major highlights. The first of these is Selenium Conf, which was in Berlin. I gave the State of the Union keynote (so called because the first one was an update on how the merger of the Selenium and WebDriver projects was going). Over the past few Selenium Conferences, the theme has slowly been building that Open Source Software depends on people to move it forward. This time, the message was far starker, as I counted the number of people who contribute to key parts of the project --- for some pieces, we depend on one person alone. I also covered the various moving pieces in the project, using Kent Beck's "3X" model as a framework to hold the talk together.

As well as being part of the show at SeConf, I also had the pleasure of helping out Jim Evans in the "Fix a Bug, Become a Committer" workshop. He did a great job explaining how the pieces fit together, and by the end of the workshop, we had everyone building Selenium and running tests in their IDEs of choice (provided that choice wasn't "Eclipse"), which is a testament to the hard work he'd put into preparing the session. It did highlight that the "getting started" docs probably need a bit of a polish to become usable. I was also invited to do a Q&A with the folks in the "Selenium Grid" workshop, where I broke from theme to talk about the role of QA in a team. Thanks for being patient, everyone!

In terms of code, as I write this, I've landed 57 commits since September 17th. Part of this was to help shape the 3.6 release. For Java, the theme of this release was the slow deprecation of the amorphous blob of data that is "DesiredCapabilities" in favour of the more strongly-typed "*Options" classes (eg. FirefoxOptions, ChromeOptions, etc). The idea behind the original WebDriver APIs was to lead people in the right direction: if they could hit the "autocomplete" keyboard combination in their IDE of choice, then they'd be able to figure out what to do next. The strong typing is a continuation of this concept, and is something that all the main contributors are fans of.

One implementation decision we made in the Java tree is that each of the Options classes is also a Capabilities instance. I made this choice for two reasons. The first is philosophical. We don't know ahead of time what new features will land in browsers (headless running for Chrome and Firefox is one example), so we'll always need an "escape hatch" to allow people to set additional settings and capabilities we're not aware of. The second is pragmatic. The internals of Selenium's Java code are set up to deal with Capabilities, and people extending the framework have been dealing with them as an implicit contract of the code.
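
A minimal sketch of how that plays out (the capability name and server URL are invented, so treat this as illustrative rather than copy-and-paste ready):

  import java.net.URL;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.firefox.FirefoxOptions;
  import org.openqa.selenium.remote.RemoteWebDriver;

  public class OptionsExample {
    public static void main(String[] args) throws Exception {
      // The strongly-typed entry point...
      FirefoxOptions options = new FirefoxOptions();
      options.addArguments("-headless");

      // ...is still a Capabilities instance, so the escape hatch remains
      // for settings the typed API does not yet know about.
      options.setCapability("example:someNewFeature", true);

      // Because the options are Capabilities, RemoteWebDriver accepts them directly.
      WebDriver driver = new RemoteWebDriver(
          new URL("http://localhost:4444/wd/hub"), options);
      driver.quit();
    }
  }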

In the wild, there are two major, and one very minor, "dialects" of the JSON-based protocol spoken by the various implementations. The first is the original "JSON Wire Protocol", and the second is the version of that protocol that has been standardised as part of the W3C "WebDriver" specification. We took pains when standardising to make sure that a JSON Wire Protocol response is almost always a valid W3C response (technical note: because all values are returned as a JSON Object with a "value" entry, which contains the return value), but there are two areas where the dialects diverge wildly.
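
As a small illustration (using a made-up session id and a "Get Title" call), here is the same successful response in each dialect:

  // JSON Wire Protocol
  {"sessionId": "abc123", "status": 0, "value": "My Page Title"}

  // W3C WebDriver
  {"value": "My Page Title"}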

One area is around the "Advanced User Interactions" APIs. The end point offered by the W3C spec is significantly more flexible and nifty than the original version in the Selenium project, but it is also a lot more complex to implement.

The other area is around "New Session", which is the command used to create a new Selenium session. The JSON Wire Protocol demands that the user place the set of features that they're interested in using into a "desiredCapabilities" JSON blob. This was originally designed as part of a "resource acquisition is initialisation" pattern --- you'd load up the blob with everything you might want (a chrome profile, an equivalent firefox profile, the proxy you'd like to use), mashing together items that theoretically only belonged to one browser into a single unit. The remote end was then to make a "best effort" attempt to meet those requirements, and then report back what it had provided. The local end (the driver code) was then to test whether or not the returned driver was suitable for whatever it was that users wanted to do. Which is why they were called desired capabilities --- you made a wish, and then could look to see if it came true. If nothing matched, it was legit for a selenium implementation to just start up any driver and give you that.

The W3C protocol is a lot more structured. It provides for an ordered series of matches that can be made, with capabilities that must be present in all cases. For our example above, the proxy would be used for any driver, and then there'd be an ordered set of possible matches for chrome and then firefox (or vice versa). Each driver provider gets a chance to fulfill that request, and if it can, then we use that driver. If nothing matches, then we fail to initialise the session and return an exception to the users.
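
Roughly, and with invented capability values, the two New Session payloads look like this:

  // JSON Wire Protocol: one blob of wishes
  {
    "desiredCapabilities": {
      "browserName": "chrome",
      "proxy": {"proxyType": "manual", "httpProxy": "proxy.example.com:3128"}
    }
  }

  // W3C WebDriver: requirements that always apply, plus an ordered list of alternatives
  {
    "capabilities": {
      "alwaysMatch": {
        "proxy": {"proxyType": "manual", "httpProxy": "proxy.example.com:3128"}
      },
      "firstMatch": [
        {"browserName": "chrome"},
        {"browserName": "firefox"}
      ]
    }
  }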

The more structured data used by the W3C New Session command is sent in a different key in the JSON blob, and this is by design. In theory, it's possible to map a JSON Wire Protocol "New Session" payload to the W3C one, and to map the W3C structure to something close to the JSON Wire Protocol payload. Sadly, this process is complex and error prone, and there are language bindings that have been released that get this wrong to one degree or another (and, indeed, some that don't even make the effort). All this means that the Selenium Server has to try and discern the user's intent from the blob of data sent across the wire. Getting this right, and flexible, has been the focus of the forthcoming 3.7 release. It's fiddly work, but it'll be worth it in the end.

Another common problem we see is that some servers out there speak the W3C protocol natively (eg. IEDriverServer, geckodriver, the Selenium Server) and others don't yet (eg. safaridriver, chromedriver, and services such as Sauce Labs). A big part of the 3.5 release was the "pass through" mode, which means that if the Selenium Server detects that both ends speak the same "dialect" of the wire protocol, it'll just shuttle data backwards and forwards. However, if it detects that the two ends don't speak the same protocol, it'll do "protocol conversion", mapping JSON Wire Protocol calls to and from W3C ones. This has been made easier by the fact that the W3C spec is congruent with the JSON Wire Protocol -- the two have identical end points for many commands.

But not all commands. The main ones that have been causing grief have been the advanced user interaction commands, particularly when a local end speaks the JSON Wire Protocol and the remote end speaks the W3C one. Just such a situation arises for users of some cloud-based Selenium servers, and it's been a constant source of questions from users. To help address this, I've landed some code that emulates the common JSON Wire Protocol advanced user interaction commands (things like "moveTo"). Hopefully this will address the majority of headaches that people are experiencing when using this new functionality.

Let's see what the next month brings. Hopefully, we'll ship 3.7 :)