Nix is the ultimate DevOps toolkit

April 9, 2021

At Channable we use Nix to build and deploy our services and to manage our development environments. This was not always the case: in the past we used a combination of ecosystem-specific tools and custom scripts to glue them together. Consolidating everything with Nix has helped us standardize development and deployment workflows, eliminate “works on my machine” problems, and avoid unnecessary rebuilds. In this post we want to share what problems we encountered before adopting Nix, how Nix solves those, and how we gradually introduced Nix into our workflows.

Back story

About 1.5 years ago we decided to gradually adopt Nix. First only for development environments and CI, later also for deploys and production use. Up until then we had used a variety of tools to build, package, and deploy our software:

  • pip and virtualenv for Python (and later also pipenv)
  • yarn for JavaScript
  • stack for Haskell
  • apt for packaging and deploying our software

For building APT packages we used a mix of make, docker, and a home-grown build tool called channabuild. This worked well enough for a while, but our list of build requirements kept growing and we quickly discovered that we did not have the time to build and maintain a high-quality build system [1]. This is why we decided to give Nix a try.

We started by dipping one tiny toe in the water. One day our CI build broke because of a backwards incompatible change in a new release of Haskell’s build tool Stack, and CI always installed the latest version. Nix enables pinning all programs, including Stack, so this was a nice opportunity for a couple of Nix advocates in the company to try it out in a single repository.

In the next step, Nix was used to build a Python environment with the right libraries for some benchmark scripts. These small steps turned out to be very successful: a month later, plans were being made to migrate our biggest Haskell repository to Nix. Over time, more and more projects were migrated to Nix. First by moving over the local development setup, then the continuous integration setup, and lastly the software packages that get deployed to production.

Having a couple of Nix advocates was crucial to kickstart the enthusiasm for Nix in the rest of the company. Their approach to start small and migrate piece by piece turned out to be successful. We soon realized that Nix solves many of the problems that we had, and saves us quite a bit of problem solving time in the long run.

Problems with our previous build and deploy pipeline

Our main motivation to switch to Nix was its promise to solve a number of thorny issues with our builds, and with our deploy process. It is worth looking at these problems in detail, since many other organizations writing software face similar issues, and we hope that this experience report will help you make a more informed decision if you are considering adopting Nix in your organization.

Let's see how we used Nix to solve three concrete problems in our builds.

Problems with our Python and APT setup

For our largest Python projects, we used a combination of Pip, Virtualenv, and APT. We would install our dependencies in a virtualenv with Pip, and package that together with our code as an APT package to be installed in production. While it served us well for many years, it had its pain points:

  • It was difficult to update a single dependency while keeping the others pinned (including transitive dependencies). In the end we created some scripts around pip freeze to generate two requirements.txt files: one to pin direct dependencies, one to pin transitive dependencies.
  • This did not leave any room to distinguish development dependencies from runtime dependencies, so we either installed the development dependencies in production, or asked developers to pip install the remaining dependencies manually.
  • After somebody changes requirements.txt, all other developers need to run pip install, but how do they know? “Did you run pip install?” used to be our most common troubleshooting response in the development chatroom.
  • Installing packages with APT is not atomic: APT updates files in-place one by one. Because we start Python processes many times per second, if one started during a deployment, it could import a mix of the old and new code. This regularly led to puzzling bugs.
  • Installing the APT package could take minutes, because aside from the code it contained several large resources. These had to be packaged, downloaded, and extracted for every release, even though these files changed infrequently.

We briefly explored Pipenv in some of our smaller projects, but quickly discovered that it does not solve any of these problems, and it creates some problems of its own, so we moved on.

Caching expensive build steps

This second problem shows how Nix can be used as a universal caching system.

Products loaded in the Channable tool can be exported to a wide range of marketplaces. Each marketplace has their own set of categories to categorize products in. To support this, we periodically import data models from marketplaces, and save them in files in a standardized text-based format. These files are easy to generate and review, but not efficient to use at runtime, so we convert them into SQLite databases at build time. Generating these databases for all marketplaces we support takes about 30 minutes. This impacted our CI times and the local developer experience.

We tried to be smart about re-generating databases only when the inputs changed, but there were always subtle cases that led to incorrect results, and every developer still had to generate all artifacts to get a local development environment running. Another pain point was switching Git branches: if you switch to a branch that has a change in one of the inputs, you need to re-generate the corresponding database … but once you switch back, you need to do that again!

We would therefore like some additional features from our caching system:

  • Keep multiple versions of the category databases around, useful when switching between branches
  • Share category databases between systems, so they only have to be built once
  • Build category databases on CI, so no developer has to wait for the database to be built

Instead of making our own caching infrastructure more advanced, we saw this as an opportunity to try and use Nix to build the category databases. The Nix store would automatically keep multiple versions of the category databases available. We could easily share the databases between machines once we set up a cache for our build outputs. Building category databases on CI and pushing them to the cache became a possibility.
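
To make the idea concrete, here is a minimal Python sketch of a build step whose output location is derived from its input. It is not how Nix is implemented and not our actual build code: the tab-separated input format, the function name build_category_db, and the layout of the store directory are all assumptions made for illustration.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def build_category_db(source_text: str, store: Path) -> tuple[Path, bool]:
    """Convert "id<TAB>name" lines (a hypothetical format) into SQLite.

    The output path is derived from a hash of the input, so every distinct
    input gets its own database, and an existing database is reused as-is.
    Returns the database path and whether a build actually ran.
    """
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()[:32]
    out_path = store / f"{digest}-categories.sqlite"
    if out_path.exists():
        return out_path, False  # already built for exactly this input

    # Build under a temporary name and rename at the end, so a half-written
    # database is never visible under the final path.
    tmp_path = out_path.with_name(out_path.name + ".tmp")
    tmp_path.unlink(missing_ok=True)
    conn = sqlite3.connect(tmp_path)
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
        rows = [line.split("\t", 1) for line in source_text.splitlines() if line]
        conn.executemany(
            "INSERT INTO categories VALUES (?, ?)",
            [(int(category_id), name) for category_id, name in rows],
        )
    conn.close()
    tmp_path.replace(out_path)
    return out_path, True


# Simulate switching between two branches with different inputs.
store = Path(tempfile.mkdtemp())
main_branch = "1\tShoes\n2\tBags\n"
feature_branch = "1\tShoes\n2\tBags\n3\tHats\n"

path_main, built_main = build_category_db(main_branch, store)           # builds
path_feature, built_feature = build_category_db(feature_branch, store)  # builds
path_again, built_again = build_category_db(main_branch, store)         # reused
```

Both databases stay on disk side by side, so going back to the first input does not trigger a rebuild. Sharing the store directory between machines would cover the remaining two wishes from the list above.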

Breaking changes in system dependencies

Another problem that we ran into when relying on the system packaging workflow using APT was backwards-incompatible changes in system-packaged libraries. For example, libicu has incompatible sonames across different Ubuntu versions. This meant that for upgrading the Ubuntu version on our servers, we had to separately build and package our application for each Ubuntu version that we intended to use it with. It also meant that we would sometimes develop against different versions of third-party libraries on our own machines than what we ran in production.

What is Nix?

Much like Haskell, Nix is based on a number of revolutionary ideas, that, when put together, enhance the state-of-the-art of how we build our software. Unlike Haskell, it is much less polished and comes with plenty of sharp edges. Although almost 20 years old, the project is currently still evolving at a rapid (and accelerating!) pace, and seems to have reached escape velocity. Yet, at its core it has also reached a level of maturity and stability that it can be recommended for production use.

To fully understand the advantages that Nix offers, it is necessary to have a cursory understanding of the different projects in the Nix ecosystem. We will give a very short overview of the different parts here.

We like to think of Nix as the ultimate DevOps toolkit. It comes with a number of power tools:

  1. The Nix programming language — The core idea of Nix is that build instructions and dependency management are best done with a fully fledged declarative programming language.
  2. The Nix CLI tools — Programs that evaluate Nix programs to build or install packages, debug Nix code, and manage installed packages.
  3. Nixpkgs, the Nix package archive — One of the biggest package repositories in the world, in the shape of a git repository containing Nix code. Also contains the standard library for the Nix programming language.
  4. NixOS — A full-blown Linux distribution, built and managed entirely through Nix. We will ignore it in this article, since we are not using it at the time.

Nix the language allows us to define packages for our services. For every package we declare:

  • Any other packages it depends on, either our own, or packages from Nixpkgs. These can be runtime dependencies or build-time dependencies.
  • The source code of our application.
  • The commands to run to turn the source code into a binary or other build output.

Nix stores the build outputs in a directory derived from the package declaration and its inputs. This means that if any of the inputs change, the output will be stored in a different path, which allows multiple build results to co-exist.

Furthermore, Nix can compute the path of the output directory before it starts building anything. Binary caches can cache these directories. If a path exists in a binary cache, then Nix can download it from there instead of running the command locally to produce the build output.
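
The following Python sketch models these two properties in a simplified way. It is only an illustration: the real hashing scheme and store path format in Nix are different, the Package class and the realise function are hypothetical names, and the sketch ignores that a downloaded output also needs its runtime dependencies.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    source: str                          # stand-in for the source code
    build_command: str
    dependencies: tuple["Package", ...] = ()

    def store_path(self) -> str:
        """Derive the output path from the declaration and all its inputs."""
        description = json.dumps({
            "name": self.name,
            "source": self.source,
            "build_command": self.build_command,
            # Dependencies enter through their own paths, so a change deep
            # in the dependency tree changes every path that builds on it.
            "dependencies": [dep.store_path() for dep in self.dependencies],
        }, sort_keys=True)
        digest = hashlib.sha256(description.encode("utf-8")).hexdigest()[:32]
        return f"/nix/store/{digest}-{self.name}"


def realise(package: Package, local_store: set, binary_cache: set, log: list) -> str:
    """Make sure the output exists locally, preferring the cache over a build."""
    path = package.store_path()          # known before anything is built
    if path in local_store:
        return path
    if path in binary_cache:
        log.append(f"download {package.name}")
    else:
        for dep in package.dependencies:
            realise(dep, local_store, binary_cache, log)
        log.append(f"build {package.name}")
    local_store.add(path)
    return path


# Two variants of a service that differ only in one dependency's source.
lib_v1 = Package("examplelib", "lib-source-v1", "make install")
lib_v2 = Package("examplelib", "lib-source-v2", "make install")
service_a = Package("service", "our-code", "make install", (lib_v1,))
service_b = Package("service", "our-code", "make install", (lib_v2,))

# CI builds service_a once and pushes everything it built to the cache.
ci_store, cache, ci_log = set(), set(), []
realise(service_a, ci_store, cache, ci_log)
cache |= ci_store

# A developer machine then gets the same output without building.
dev_store, dev_log = set(), []
realise(service_a, dev_store, cache, dev_log)
```

In this model service_a and service_b end up at different paths even though their own source is identical, and the developer machine only records a download, because the path it computed was already present in the cache.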

When using Nixpkgs, it is possible to pin a specific revision of Nixpkgs. In a sense, this acts like a global lockfile for the entire software ecosystem. Usually Nixpkgs provides a single version of a package, but in complex situations a single Nixpkgs revision can provide multiple versions of a package in parallel, like Python 3.8 and 3.9. The packages from Nixpkgs are all available from the default Nix binary cache.

Why Nix

As anybody who ever had to generate a 10,000 line makefile, or template an enormous sea of YAML, will attest, having a full-blown programming language to specify your builds is a honking great idea. Having fully declarative builds is a nice ideal, but it is a chimera. Any sufficiently complex build will at some point require some build steps that can no longer be expressed declaratively. This is the point at which developers will usually resort to hacks like shelling out to bash scripts, to circumvent the limitations of the build system.

Furthermore, having a single open-source mono-repo for all external packages comes with a number of major advantages. It makes it very easy to pull in any external dependencies, not just the ones from your programming language. Say, for example, you want to have Postgres and Redis available on CI to run some integration tests. With most other systems, you would have to set this up yourself with some bash and duct-tape. With Nix you get one unified build environment in which you can set up anything you need (and pin it to the right version, so that your test still passes in the future).

Finally, having a mono-repo for all packages is also a kind of social technology that enables contributions from everywhere, while providing a central place for testing and quality control. Nixpkgs contains more than 55,000 projects, with only Arch's AUR having more projects available.

How Nix solves our problems

Let’s go over our packaging and development problems once more, and see how Nix solves these.

  • Updating single dependencies. With Nix we pin the Nixpkgs version, which in turn pins the packages we use from there. On top of that, Nix makes it easy to override a package to exactly the version we need. We do this for the packages that we depend on directly, to have more control.

  • Separating runtime and development dependencies. With pkgs.buildEnv, it is easy to build a development environment that includes all runtime dependencies plus development packages, and in production we don’t reference this environment.

  • Avoiding manual installation steps. With Nix, we either prefix the commands we run with nix run -c, or we enter a development shell with nix run -c $SHELL. Nix then extends the PATH to make the packages we need available. Nix builds default.nix when it runs, so it always uses the latest version. If default.nix changes due to a git pull, the next nix run will pull in the new dependencies.

  • Atomic installation. Nix installs packages at a path that depends on all of the build inputs, so a new version of a package can be installed alongside the old version. We first ensure that all the required files are present, and then we can atomically replace one systemd unit file, or symlink in /usr/bin, and make it point to the new path.

  • Instant rollbacks. Because the old version does not get removed when we install a new version of a package, rolling back is as simple as making the symlink or systemd unit point to the previous path again. Old versions do get garbage collected at some point, but we explicitly keep the past 3 versions around.

  • Sharing large resources between versions. By packaging large resources as separate Nix packages, they can be shared between multiple versions of our services. Because multiple versions of the resources can be installed alongside each other, our services reference the exact version they need, and atomic deploys still work, which would not be possible with APT packages.

  • Avoiding rebuilds when switching branches. Nix puts build outputs in a subdirectory of /nix/store, in a path that depends on the build description. If the path already exists, because it was built previously, there is no need to build it again, so a build after switching back and forth between branches is a no-op.

  • Sharing build outputs between machines. Because Nix computes the path where it is going to store a build output before building it, it can also query a binary cache to get the output from there, instead of building it. Cachix [3] makes it easy to set up such a cache for private use. After a first build we push to this cache, and then nobody else needs to build the output again.

  • Reusing build outputs from CI. The binary cache can be filled by any trusted user, which can be CI. So now we build every target only once, and developers, but also other CI runs, can reuse the output from the cache.

  • Breaking changes in system dependencies. Packages built with Nix depend only on other packages managed by Nix, not on system libraries, so our packages are now independent of the host OS; we can deploy the same package to Ubuntu 18.04 and 20.04.
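
The atomic installation and instant rollback items above both come down to repointing a single symlink. Here is a minimal Python sketch of that switch, assuming a POSIX system. The directory names and the activate function are made up for illustration, and this is not our actual deploy tooling.

```python
import os
import tempfile
from pathlib import Path


def activate(version_dir: Path, current_link: Path) -> None:
    """Atomically point `current_link` at `version_dir`."""
    tmp_link = current_link.with_name(current_link.name + ".new")
    if tmp_link.is_symlink():
        tmp_link.unlink()  # left over from an interrupted earlier switch
    os.symlink(version_dir, tmp_link)
    # Renaming over the existing link is atomic on POSIX systems, so a
    # process resolving the link sees either the old or the new version,
    # never a mix of both.
    os.replace(tmp_link, current_link)


# Two complete versions installed side by side, as in the Nix store.
root = Path(tempfile.mkdtemp())
for version in ("v1", "v2"):
    version_dir = root / f"{version}-service"
    version_dir.mkdir()
    (version_dir / "main.py").write_text(f"VERSION = '{version}'\n")

current = root / "current"
activate(root / "v1-service", current)          # initial install
activate(root / "v2-service", current)          # deploy the new version
deployed = (current / "main.py").read_text()
activate(root / "v1-service", current)          # roll back
rolled_back = (current / "main.py").read_text()
```

Because the old directory is still there after the deploy, the rollback is the same operation as the deploy, just with the previous path as its target.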

How we migrated to Nix

Whenever a large project gets settled on one way of doing things, it starts to become comfortable. So much so that the peculiarities of this one way of doing things become interwoven into the very nature of the project. In our case, all the code, workflows and systems were built on top of Haskell's stack or Python's virtualenv, Ansible, APT packages and servers running Ubuntu. Nix is a different approach to what all these tools provide, one that at first glance seems incompatible with the existing tools. The first challenge was finding out how we could start using Nix in an otherwise established workflow.

Fortunately, Nix can take over parts of the above mentioned workflow. In a first step for a Haskell project, we replaced the code in a Python script that called stack for compiling, and dpkg-deb for creating an APT package, with some Nix code that compiled the Haskell project and put it in an APT package instead. This change was invisible to the rest of the packaging and deployment process, as the output was still an APT package. Besides, building the APT package with Nix already gave us build reproducibility. After swapping out more components of the build/deploy process, we found that each individual step gave us more of the benefits that Nix offers, despite the rest of the process still being done the old way. This boosted our motivation to continue.

Swapping out one piece at a time gives relative peace of mind, except for the pieces that run on production. With Channable, an outage is very likely to frustrate customers. More importantly, it is even more likely to cause intense stress for the developers involved. We don't like stress, so we made sure to make the switch as safe and well tested as is reasonably possible. This was done by first switching over staging servers. For services that are load balanced on several servers, we could switch one server at a time. Installing both the APT version and Nix version simultaneously was also possible for some packages. In the end most of the packages were switched over without problems. The damage was very much limited for the ones that did cause trouble.

With a bit of creativity, one can continue swapping out bits and pieces, slowly building a sense of comfort for the new way. It is important that this new sense of comfort is also built among colleagues working on different parts of the system. To prevent surprises, they will need to be aware of the most important changes, why they are happening and how to work with them. To achieve this, we gave presentations to the development team, introducing them to Nix. Later this was followed by an in-depth workshop. Every direct change to the workflow was accompanied by a development announcement mail to explain the change. We have found that enthusiasm grew amongst our colleagues, and that many of them have become enthusiasts themselves.

Conclusion

At Channable we use Nix to build and deploy our services and to manage our development environments. Over the course of more than a year, we gradually adopted Nix, replacing language-specific package managers (for Python), or using Nix to pin them (for Haskell), so we have a uniform way to manage development environments across repositories, OS versions, and execution environments. We use Nix to package and distribute our services to our production servers, which brought us fast and atomic deploys, and instant rollbacks. Having a domain-specific programming language at our disposal, and a package collection maintained by a vibrant community, enabled us to do this.


With special thanks to Laurens Duijvesteijn, Wesley Bowman, Sayyed Naqi, the entire DevOps team, and everybody else who joined in this effort.


1: As anyone who has ever written a build system will attest to, it is easy to get started, but gets complex very quickly. Dependency-handling? Sure, let's add a DAG execution system. Isolation? Of course, let's use chroot or Linux namespaces. Caching? Let's just hash all inputs and put that in a GCS bucket. Incremental builds? Let's only recompute the build steps that changed. etc.

2: In principle, this is of course true of all build systems. But Nix makes it much easier to keep all build inputs around for the long term, by having all build instructions for all external packages in a giant mono-repo. The important part in Nix is to pin the exact version of nixpkgs that you depend on, and additionally to take care that all build inputs outside of nixpkgs are part of your own repo.

3: Shout out to the amazing service of Cachix. It has saved us a lot of trouble, and the few times something went wrong, it was often already fixed by the time we noticed.

Ruud van Asseldonk, Developer
Robert Kreuzer, Co-founder & CTO
Reinier Maas, Developer
Falco Peijnenburg, Developer
Fabian Thorand, Development Team Lead