author: Lars Wirzenius <liw@liw.fi> 2020-04-06 13:25:13 +0300
committer: Lars Wirzenius <liw@liw.fi> 2020-04-06 13:25:13 +0300
commit: 7075f18cfa3d3c9db8dd437c8486e89114100d4f
tree: 0bbdff602a6137ab701ab48ae4b222be27480157 /contractor.md
parent: 969948eb422efccfdd78dbbea9d8e57632aaf3a5
Change: rewrite wording to be scarier
Diffstat (limited to 'contractor.md')
-rw-r--r-- contractor.md | 276
1 files changed, 176 insertions, 100 deletions
diff --git a/contractor.md b/contractor.md
index f331a7f..b72f764 100644
--- a/contractor.md
+++ b/contractor.md
@@ -1,56 +1,92 @@
----
-title: "Contractor: running CI builds securely"
-author: "Lars Wirzenius"
-bindings: contractor.yaml
-functions: contractor.py
-...
-
-
-# Status of this document
-
-This document is in its very early stages, as is the whole Contractor
-project.
-
-## Open questions
-
-* How should the command line tool communicate with the manager
- processes in the outer VM?
-
- For now, I'll assume the outer VM is running under libvirt and
- communication with the manager is via ssh. This will probably need
- to change later, but it gets me started.
+<!-- meta data block is at the end of the file, because Emacs gets -->
+<!-- less confused that way -->
# Introduction
-A continuous integration engine (CI) takes the source code for a
-software project and ensures it works. In less abstract terms, it
-builds it, and runs any automated tests it may have. The exact steps
-for that depend heavily on the CI engine and the project, but can be
-thought of as follows (with concrete examples of possible commands):
-
-* retrieve the desired revision of the source code (git clone, git
- checkout)
-* install build dependencies (dpkg-checkbuilddeps, apt install)
-* build (./configure, make)
-* test (make check)
-
-This is dangerous, risky stuff. In the specific case of an open,
-hosted CI service, it's especially dangerous: anyone can submit any
-build, and that build can do anything, including attack computers
-anywhere on the Internet. However, even in a CI engine that only
-builds projects for in-house developers, it's risky: most attacks on
-IT are done by insiders.
-
-Apart from actual attacks, building software is dangerous also due to
-accidents: a mistake in the way software is built, or automatically
-tested, can result in what looks and behaves like an attack. An
-infinite loop can use excessive amounts of CPU resources, or block
-other projects from getting built.
+Software development is a security risk.
+
+Building software from source code and running it is a core activity
+of software development. Software developers do it on the machine they
+work on. Continuous integration engines do it on a server. These are
+fundamentally the same. The process is roughly as follows:
+
+* install any dependencies
+* build the software
+* run the software, perhaps as part of unit testing
+
+When the software is run, even if only a small unit of it, it can do
+anything that the person running the build can do, in principle:
+
+* delete files
+* modify files
+* log into remote hosts using SSH
+* decrypt or sign files with PGP
+* send email
+* delete email
+* commit to version control repositories
+* do anything with the browser that the person could do
+* run things as sudo
+* in general, cause mayhem and chaos
+
+Normally, a software developer can assume that the code they wrote
+themselves doesn't do any of that. They can even assume that people
+they work with don't do any of that. In both cases, they may be wrong:
+mistakes happen. It's a well-guarded secret among programmers that
+they, sometimes, even if rarely, make catastrophic mistakes.
+
+**FIXME**: reference the bug in Debian that removed the ld.so symlink
+
+Accidents aside, mayhem and chaos may be intentional. Your own project
+may not have malware, and you may have vetted all your dependencies,
+and you trust them. But your dependencies have dependencies, which
+have further dependencies. You'd need to vet the whole dependency
+tree. Even decades ago, in the 1990s, this could easily be hundreds of
+thousands of lines of code, and modern systems make it worse. Note
+that build tools are themselves dependencies, as is the whole
+operating system.
+
+How certain are you that you can spot malicious code that's
+intentionally hidden and obfuscated?
+
+Are you prepared to vet any changes to any transitive dependencies?
+
+Does this really matter? Maybe it doesn't. If you can't ever do
+anything on your computer that would affect you or anyone else in a
+negative way, it probably doesn't matter. Most software developers are
+not in that position.
+
+This risk affects every operating system and every programming
+language. The degree to which it exists varies a lot. Some
+programming language ecosystems seem more vulnerable than others: the
+nodejs/npm one, for example, values tiny and highly focused packages,
+which leads to immense dependency trees. The more direct or indirect
+dependencies there are, the higher the chance that one of them turns
+out to be bad.
+
+The risk also exists for more traditional languages, such as C. Few C
+programs have no dependencies. They all need a C compiler and an
+operating system, at least.
+
+The risk is there for both free software systems, and non-free ones.
+As an example, the Debian system is entirely free software, but it's
+huge: the Debian 10 (buster) release has over 50 thousand packages,
+maintained by about 2000 people. While it's probable that none of
+those packages contains actual malware, it's not certain. Even if
+everyone who helps maintain it is completely trustworthy, the amount of
+software in Debian is much too large for all code to be
+comprehensively reviewed.
+
+This is true for all operating systems that are not mere toys.
+
+The conclusion here is that to build software securely, we can't
+assume all code involved in the build to be secure. We need something
+stronger.
## Threat model
This section collects a list of specific threats to consider.
+* accessing or modifying files not part of the build
* excessive use of build host resources
* e.g., CPU, GPU, RAM, disk, etc
* this might happen to make unauthorized use of the resources, or to
@@ -69,35 +105,22 @@ This section collects a list of specific threats to consider.
# Requirements
This chapter discusses the requirements for the Contractor solution.
-The requirements are divided into two sections: one that presents a
+The requirements are divided into two parts: one that's based on the
threat model, and another for requirements that aren't about security.
-## Non-security requirements
-
-* **AnyBuildOS**: Builds should be able to run in any operating system
- that can be run as a virtual machine guest of the host operating
- system. The host is likely to be Linux, using Qemu and KVM for
- virtualization.
-
-* **NoRoot**: Running the Contractor should not require root
- privileges. It's OK to require sufficient privileges to use
- virtualisation.
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in [RFC 2119][].
-* **DefaultBuilder**: The Contractor should be easy to set up and to
- use. It should not require extensive configuration. Running a build
- should be as easy as running **make**(1) on the command line. It
- should be feasible to expect developers to use the Contractor for
- their normal development work.
+[RFC 2119]: https://tools.ietf.org/html/rfc2119
## Security requirements
-[RFC 2119]: https://tools.ietf.org/html/rfc2119
-
These requirements stem from the threat model above.
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
-"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
-document are to be interpreted as described in [RFC 2119][].
+* **FilesystemIsolation**: The Contractor MUST prevent the build from
+ accessing or modifying any files outside the build. Build tools and
+ libraries outside the source tree MUST be usable.
* The Contractor MUST prevent the build from using more than the
user-specified amount of CPU time (**HardCPULimit**), disk space
@@ -121,6 +144,23 @@ document are to be interpreted as described in [RFC 2119][].
build. This includes vulnerabilities of the host's operating system
kernel, virtualisation solution, and hardware.
+## Non-security requirements
+
+* **AnyBuildOS**: Builds should be able to run in any operating system
+ that can be run as a virtual machine guest of the host operating
+ system. The host is likely to be Linux, using Qemu and KVM for
+ virtualization.
+
+* **NoRoot**: Running the Contractor should not require root
+ privileges. It's OK to require sufficient privileges to use
+ virtualisation.
+
+* **DefaultBuilder**: The Contractor should be easy to set up and to
+ use. It should not require extensive configuration. Running a build
+ should be as easy as running **make**(1) on the command line. It
+ should be feasible to expect developers to use the Contractor for
+ their normal development work.
+
# Architecture
@@ -128,10 +168,11 @@ This chapter discusses the architecture of the solution, with
particular emphasis on threat mitigation.
The overall solution is to use nested virtual machines, running on a
-developer's host. The outer VM runs the Contractor itself. The inner
-VM runs the build. The outer VM controls the inner VM, proxies its
+developer's host. The outer VM runs the Contractor itself, and is
+called "the manager VM". The inner VM runs the build, and is called
+the "worker VM". The manager VM controls the worker VM, proxies its
external access, and prevents it from doing anything nefarious. The
-outer VM is managed by a command line tool. Developers only interact
+manager VM is managed by a command line tool. Developers only interact
directly with the command line tool.
~~~dot
@@ -147,10 +188,10 @@ digraph "arch" {
contractor [label="Contractor CLI"];
artifacts [shape=tab label="Artifact store \n (directory)"];
subgraph cluster_contractor {
- label="Contractor VM \n (defence force)";
+ label="Manager VM \n (defence force)";
manager;
subgraph cluster_builder {
- label="Build VM \n (here be dragons)";
+ label="Worker VM \n (here be dragons)";
style=filled;
fillcolor="#dd0000";
guestos [label="Guest OS"];
@@ -173,18 +214,18 @@ This high-level design is chosen for the following reasons:
* it allows the build to happen in any operating system
(**AnyBuildOS**)
* the Contractor is a VM and running it doesn't require root
- privileges; it has root inside the VM if needed; all the complexity
- of setting things up so the builder VM works correctly are contained
- the outer VM, and the user need do only minimal configuration
- (**NoRoot**)
+ privileges; it has root inside both VMs if needed; all the
+ complexity of setting things up so the worker VM works correctly is
+ contained in the manager VM, and the user need do only minimal
+ configuration (**NoRoot**)
* the command line tool for using the Contractor can be made to be as
easy as any build tool so that developers actually use it by default
(**DefaultBuilder**)
-* the manager in the outer VM can monitor and control the build
- (**HardCPULimit**, **HardBandwidthLimit**)
-* the manager can supply the inner VM with only a specified amount of
+* the manager VM can monitor and control the build (**HardCPULimit**,
+ **HardBandwidthLimit**)
+* the manager can supply the worker VM with only a specified amount of
RAM and disk space (**HardRAMLimit**, **HardDiskLimit**)
-* the manager can set up routing and firewalls so that the inner VM
+* the manager can set up routing and firewalls so that the worker VM
cannot access the network, except via proxies provided by the manager
VM (**ConstrainNetworkAccess**)
* the nested VMs provide a smaller attack surface than the Linux
@@ -194,40 +235,42 @@ This high-level design is chosen for the following reasons:
## Build process
-The architecture leads to a build process that would work like this:
+The architecture leads to a build process that would work roughly like
+this:
* developer runs command line tool to do a build
-* command line tool boots the outer VM, which starts any services and
- proxies running in the outer VM, and configures networking and
+* command line tool boots the manager VM, which starts any services
+ and proxies running in the manager VM, and configures networking and
firewalls
* command line tool copies the source code and build recipe into the
- outer VM
-* outer VM retrieves the system image for the inner VM
-* outer VM boots inner VM
-* outer VM provides source code to inner VM
-* outer VM instructs inner VM to perform each build step in the build
- recipe, while monitoring network access and CPU use; if the outer VM
+ manager VM
+* manager VM retrieves the system image for the worker VM
+* manager VM boots worker VM
+* manager VM provides source code to worker VM
+* manager VM instructs worker VM to perform each build step in the build
+ recipe, while monitoring network access and CPU use; if the manager VM
notices any limits being exceeded, or attempts to access network
resources other than ones allowed by developer, it will stop the
- inner VM, and report failur to the developer
-* outer VM will retrieve build artifacts from the inner VM and put
+ worker VM, and report failure to the developer
+* manager VM will retrieve build artifacts from the worker VM and put
them in an artifact directory so the developer can access them
* command line tool reports to the developer build success or failure
and where build log and build artifacts are
## Implementation sketch
-The outer VM runs Debian stable, and has libvirt to run guest VMs. The
-host and the outer VM are configured to support nested VMs, if the
-host hardware supports it. The outer VM has its networking configured
-so that it can connect to hosts outside itself, but the inner VM can
-only connect to services inside the outer VM. The services provided to
-the inner VM are an artifact store, and an HTTP proxy. (That's the
-only protocol I know of right now; more can be added if need be.)
+The manager VM runs Debian stable, and has libvirt to run guest VMs.
+The host and the manager VM are configured to support nested VMs, if
+the host hardware supports it. The manager VM has its networking
+configured so that it can connect to hosts outside itself, but the
+worker VM can only connect to services provided by the manager VM. The
+services provided to the worker VM are an artifact store, and an HTTP
+proxy. (That's the only protocol I know of right now; more can be
+added if need be.)
-The artifact store is mounted to the outer VM using 9p. A web service
-in the outer VM serves the files to the inner VM. The developer can
-access the artifact store via their local file system.
+The artifact store is mounted to the manager VM using 9p. A web
+service in the manager VM serves the files to the worker VM. The
+developer can access the artifact store via their local file system.
# Acceptance criteria
@@ -355,3 +398,36 @@ language specific package management, or more.
### Can build an Ubuntu guest VM image (FIXME)
### Can build a FreeBSD guest VM image (FIXME)
+
+
+
+---
+title: "Contractor: build software securely"
+author: "Lars Wirzenius"
+bindings: contractor.yaml
+functions: contractor.py
+documentclass: report
+abstract: |
+ Building software typically requires running code downloaded from
+ the Internet. Even when you're building your own software, you
+ usually depend on libraries and tools, which in turn may depend on
+ further things. It is becoming infeasible to vet the whole set of
+ software running during a build. If a build includes running local
+ tests (unit tests, some integration tests), the problem gets worse
+ in magnitude, if not quality.
+
+ Some software ecosystems are especially vulnerable to this (nodejs,
+ Python, Ruby, Go, Rust), but it's true for anything that has
+ dependencies on any code from outside its own code base, and even if
+ all the dependencies come from a trusted source, such as the
+ operating system vendor.
+
+ The Contractor is an attempt to be able to build software securely,
+ by leveraging virtual machine technology. It attempts to be
+ secure, convenient, and reasonably efficient.
+
+ The Contractor is not a replacement for a Continuous Integration
+ engine, and its technology will hopefully one day become part of the
+ Ick CI engine.
+
+...
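
The "Build process" steps in contractor.md above describe the manager VM running each build step in the worker VM and stopping it when a limit is exceeded or a disallowed network resource is accessed. A minimal sketch of that control loop, which is not part of this commit, might look like the following. Every name here (`Limits`, `StepReport`, `run_build`) is hypothetical, and each step is modelled as a plain callable that reports what it used, rather than as a command run inside a real VM. The document says the manager monitors the build while it runs; this sketch only checks after each step, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Limits:
    """Hypothetical developer-specified limits for one build."""
    cpu_seconds: float        # stands in for HardCPULimit
    allowed_hosts: frozenset  # stands in for ConstrainNetworkAccess


@dataclass
class StepReport:
    """What one build step used, as observed by the manager (assumed shape)."""
    cpu_seconds: float
    hosts_contacted: frozenset
    ok: bool


Step = Tuple[str, Callable[[], StepReport]]


def run_build(steps: List[Step], limits: Limits) -> Tuple[bool, str]:
    """Run steps in order; stop at the first violation or failed step."""
    cpu_used = 0.0
    for name, step in steps:
        report = step()
        cpu_used += report.cpu_seconds
        # CPU time is accumulated over the whole build, not per step.
        if cpu_used > limits.cpu_seconds:
            return False, f"{name}: CPU limit exceeded"
        forbidden = report.hosts_contacted - limits.allowed_hosts
        if forbidden:
            return False, f"{name}: forbidden network access to {sorted(forbidden)}"
        if not report.ok:
            return False, f"{name}: step failed"
    return True, "build succeeded"
```

The function returns a success flag and a message, which corresponds to the last step of the build process: the command line tool reports build success or failure to the developer.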