Change: rewrite wording to be scarier

author: Lars Wirzenius <liw@liw.fi> 2020-04-06 13:25:13 +0300
committer: Lars Wirzenius <liw@liw.fi> 2020-04-06 13:25:13 +0300
commit: 7075f18cfa3d3c9db8dd437c8486e89114100d4f (patch)
tree: 0bbdff602a6137ab701ab48ae4b222be27480157 /contractor.md
parent: 969948eb422efccfdd78dbbea9d8e57632aaf3a5 (diff)
download: ick-contractor-7075f18cfa3d3c9db8dd437c8486e89114100d4f.tar.gz
1 files changed, 176 insertions, 100 deletions
diff --git a/contractor.md b/contractor.md
index f331a7f..b72f764 100644
--- a/contractor.md
+++ b/contractor.md
@@ -1,56 +1,92 @@
----
-title: "Contractor: running CI builds securely"
-author: "Lars Wirzenius"
-bindings: contractor.yaml
-functions: contractor.py
-...
-
-
-# Status of this document
-
-This document is in its very early stages, as is the whole Contractor
-project.
-
-## Open questions
-
-* How should the command line tool communicate with the manager
-  processes in the outer VM?
-
-  For now, I'll assume the outer VM is running under libvirt and
-  communication with the manager is via ssh. This will probably need
-  to change later, but it gets me started.
+<!-- meta data block is at the end of the file, because Emacs gets -->
+<!-- less confused that way -->
 
 # Introduction
 
-A continuous integration engine (CI) takes the source code for a
-software project and ensures it works. In less abstract terms, it
-builds it, and runs any automated tests it may have. The exact steps
-for that depend heavily on the CI engine and the project, but can be
-thought of as follows (with concrete examples of possible commands):
-
-* retrieve the desired revision of the source code (git clone, git
-  checkout)
-* install build dependencies (dpkg-checkbuilddeps, apt install)
-* build (./configure, make)
-* test (make check)
-
-This is dangerous, risky stuff. In the specific case of an open,
-hosted CI service, it's especially dangerous: anyone can submit any
-build, and that build can do anything, including attack computers
-anywhere on the Internet. However, even in a CI engine that only
-builds projects for in-house developers, it's risky: most attacks on
-IT are done by insiders.
-
-Apart from actual attacks, building software is dangerous also due to
-accidents: a mistake in the way software is built, or automatically
-tested, can result in what looks and behaves like an attack. An
-infinite loop can use excessive amounts of CPU resources, or block
-other projects from getting built.
+Software development is a security risk.
+
+Building software from source code and running it is a core activity
+of software development. Software developers do it on the machine they
+work on. Continuous integration engines do it on a server. These are
+fundamentally the same. The process is roughly as follows:
+
+* install any dependencies
+* build the software
+* run the software, perhaps as part of unit testing
+
+When the software is run, even if only a small unit of it, it can do
+anything that the person running the build can do, in principle:
+
+* delete files
+* modify files
+* log into remote hosts using SSH
+* decrypt or sign files with PGP
+* send email
+* delete email
+* commit to version control repositories
+* do anything with the browser that the person could do
+* run things as sudo
+* in general, cause mayhem and chaos
+
+Normally, a software developer can assume that the code they wrote
+themselves doesn't do any of that. They can even assume that people
+they work with don't do any of that. In both cases, they may be wrong:
+mistakes happen. It's a well-guarded secret among programmers that
+they, sometimes, even if rarely, make catastrophic mistakes.
+
+**FIXME**: reference the bug in Debian that removed the ld.so symlink
+
+Accidents aside, mayhem and chaos may be intentional. Your own project
+may not have malware, and you may have vetted all your dependencies,
+and you trust them. But your dependencies have dependencies, which
+have further dependencies. You'd need to vet the whole dependency
+tree. Even decades ago, in the 1990s, this could easily be hundreds of
+thousands of lines of code, and modern systems make it worse. Note
+that build tools are themselves dependencies, as is the whole
+operating system.
+
+How certain are you that you can spot malicious code that's
+intentionally hidden and obfuscated?
+
+Are you prepared to vet any changes to any transitive dependencies?
+
+Does this really matter? Maybe it doesn't. If you can't ever do
+anything on your computer that would affect you or anyone else in a
+negative way, it probably doesn't matter. Most software developers are
+not in that position.
+
+This risk affects every operating system and every programming
+language. The degree to which it exists varies a lot. Some
+programming language ecosystems seem more vulnerable than others: the
+nodejs/npm one, for example, values tiny and highly focused packages,
+which leads to immense dependency trees. The more direct or indirect
+dependencies there are, the higher the chance that one of them turns
+out to be bad.
+
+The risk also exists for more traditional languages, such as C. Few C
+programs have no dependencies. They all need a C compiler and an
+operating system, at least.
+
+The risk is there for both free software systems, and non-free ones.
+As an example, the Debian system is entirely free software, but it's
+huge: the Debian 10 (buster) release has over 50 thousand packages,
+maintained by about 2000 people. While it's probable that none of
+those packages contains actual malware, it's not certain. Even if
+everyone who helps maintain it is completely trustworthy, the amount of
+software in Debian is much too large for all code to be
+comprehensively reviewed.
+
+This is true for all operating systems that are not mere toys.
+
+The conclusion here is that to build software securely, we can't
+assume all code involved in the build to be secure. We need something
+stronger.
 
 ## Threat model
 
 This section collects a list of specific threats to consider.
 
+* accessing or modifying files not part of the build
 * excessive use build host resources
   * e.g., CPU, GPU, RAM, disk, etc
   * this might happen to make unauthorized use of the resources, or to
@@ -69,35 +105,22 @@ This section collects a list of specific threats to consider.
 # Requirements
 
 This chapter discusses the requirements for the Contractor solution.
-The requirements are divided into two sections: one that presents a
+The requirements are divided into two parts: one that's based on the
 threat model, and another for requirements that aren't about security.
 
-## Non-security requirements
-
-* **AnyBuildOS**: Builds should be able to run in any operating system
-  that can be run as a virtual machine guest of the host operating
-  system. The host is likely to be Linux, using Qemu and KVM for
-  virtualization.
-
-* **NoRoot**: Running the Contractor should not require root
-  privileges. It's OK to require sufficient privileges to use
-  virtualisation.
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in [RFC 2119][].
 
-* **DefaultBuilder**: The Contractor should be easy to set up and to
-  use. It should not require extensive configuration. Running a build
-  should be as easy as running **make**(1) on the commadnd line. It
-  should be feasible to expect developers to use the Contractor for
-  their normal development work.
+[RFC 2119]: https://tools.ietf.org/html/rfc2119
 
 ## Security requirements
 
-[RFC 2119]: https://tools.ietf.org/html/rfc2119
-
 These requirements stem from the threat model above. 
 
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
-"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
-document are to be interpreted as described in [RFC 2119][].
+* **FilesystemIsolation**: The Contractor MUST prevent the build from
+  accessing or modifying any files outside the build. Build tools and
+  libraries outside the source tree MUST be usable.
 
 * The Contractor MUST prevent the build from using more than the
   user-specified amount of CPU time (**HardCPULimit**), disk space
@@ -121,6 +144,23 @@ document are to be interpreted as described in [RFC 2119][].
   build. This includes vulnerabilities of the host's operating system
   kernel, virtualisation solution, and hardware.
 
+## Non-security requirements
+
+* **AnyBuildOS**: Builds should be able to run in any operating system
+  that can be run as a virtual machine guest of the host operating
+  system. The host is likely to be Linux, using Qemu and KVM for
+  virtualization.
+
+* **NoRoot**: Running the Contractor should not require root
+  privileges. It's OK to require sufficient privileges to use
+  virtualisation.
+
+* **DefaultBuilder**: The Contractor should be easy to set up and to
+  use. It should not require extensive configuration. Running a build
+  should be as easy as running **make**(1) on the command line. It
+  should be feasible to expect developers to use the Contractor for
+  their normal development work.
+
 
 # Architecture
 
@@ -128,10 +168,11 @@ This chapter discusses the architecture of the solution, with
 particular emphasis on threat mitigation.
 
 The overall solution is use of nested virtual machines, running on a
-developer's host. The outer VM runs the Contractor itself. The inner
-VM runs the build. The outer VM controls the inner VM, proxies its
+developer's host. The outer VM runs the Contractor itself, and is
+called "the manager VM". The inner VM runs the build, and is called
+the "worker VM". The manager VM controls the worker VM, proxies its
 external access, and prevents it from doing anything nefarious. The
-outer VM is managed by a command line tool. Developers only interact
+manager VM is managed by a command line tool. Developers only interact
 directly with the command line tool.
 
 ~~~dot
@@ -147,10 +188,10 @@ digraph "arch" {
     contractor [label="Contractor CLI"];
     artifacts [shape=tab label="Artifact store \n (directory)"];
     subgraph cluster_contractor {
-      label="Contractor VM \n (defence force)";
+      label="Manager VM \n (defence force)";
       manager;
       subgraph cluster_builder {
-        label="Build VM \n (here be dragons)";
+        label="Worker VM \n (here be dragons)";
         style=filled;
         fillcolor="#dd0000";
         guestos [label="Guest OS"];
@@ -173,18 +214,18 @@ This high-level design is chosen for the following reasons:
 * it allows the build to happen in any operating system
   (**AnyBuildOS**)
 * the Contractor is a VM and running it doesn't require root
-  privileges; it has root inside the VM if needed; all the complexity
-  of setting things up so the builder VM works correctly are contained
-  the outer VM, and the user need do only minimal configuration
-  (**NoRoot**)
+  privileges; it has root inside both VMs if needed; all the
+  complexity of setting things up so the worker VM works correctly is
+  contained in the manager VM, and the user need do only minimal
+  configuration (**NoRoot**)
 * the command line tool for using the Contractor can be made to be as
   easy as any build tool so that developer actually use it by default
   (**DefaultBuilder**)
-* the manager in the outer VM can monitor and control the build
-  (**HardCPULimit**, **HardBandwidthLimit**)
-* the manager can supply the inner VM with only a specified amount of
+* the manager VM can monitor and control the build (**HardCPULimit**,
+  **HardBandwidthLimit**)
+* the manager can supply the worker VM with only a specified amount of
   RAM and disk space (**HardRAMLimit**, **HardDiskLimit**)
-* the manager can set up routing and firewalls so that the inner VM
+* the manager can set up routing and firewalls so that the worker VM
   cannot access the network, except via proxies provided by the outer
   VM (**ConstrainNetworkAccess**)
 * the nested VMs provide a smaller attack surface than the Linux
@@ -194,40 +235,42 @@ This high-level design is chosen for the following reasons:
 
 ## Build process
 
-The architecture leads to a build process that would work like this:
+The architecture leads to a build process that would work roughly like
+this:
 
 * developer runs command line tool to do a build
-* command line tool boots the outer VM, which starts any services and
-  proxies running in the outer VM, and configures networking and
+* command line tool boots the manager VM, which starts any services
+  and proxies running in the manager VM, and configures networking and
   firewalls
 * command line tool copies the source code and build recipe into the
-  outer VM
-* outer VM retrieves the system image for the inner VM
-* outer VM boots inner VM
-* outer VM provides source code to inner VM
-* outer VM instructs inner VM to perform each build step in the build
-  recipe, while monitoring network access and CPU use; if the outer VM
+  manager VM
+* manager VM retrieves the system image for the worker VM
+* manager VM boots worker VM
+* manager VM provides source code to worker VM
+* manager VM instructs worker VM to perform each build step in the build
+  recipe, while monitoring network access and CPU use; if the manager VM
   notices any limits being exceeded, or attempts to access network
   resources other than ones allowed by developer, it will stop the
-  inner VM, and report failur to the developer
-* outer VM will retrieve build artifacts from the inner VM and put
+  worker VM, and report failure to the developer
+* manager VM will retrieve build artifacts from the worker VM and put
   them in an artifact directory so the developer can access them
 * command line tool reports to the developer build success or failure
   and where build log and build artifacts are
 
 ## Implementation sketch
 
-The outer VM runs Debian stable, and has libvirt to run guest VMs. The
-host and the outer VM are configured to support nested VMs, if the
-host hardware supports it. The outer VM has its networking configured
-so that it can connect to hosts outside itself, but the inner VM can
-only connect to services inside the outer VM. The services provided to
-the inner VM are an artifact store, and an HTTP proxy. (That's the
-only protocol I know of right now; more can be added if need be.)
+The manager VM runs Debian stable, and has libvirt to run guest VMs.
+The host and the manager VM are configured to support nested VMs, if
+the host hardware supports it. The manager VM has its networking
+configured so that it can connect to hosts outside itself, but the
+worker VM can only connect to services provided by the manager VM. The
+services provided to the worker VM are an artifact store, and an HTTP
+proxy. (That's the only protocol I know of right now; more can be
+added if need be.)
 
-The artifact store is mounted to the outer VM using 9p. A web service
-in the outer VM serves the files to the inner VM. The developer can
-access the artifact store via their local file system.
+The artifact store is mounted to the manager VM using 9p. A web
+service in the manager VM serves the files to the worker VM. The
+developer can access the artifact store via their local file system.
 
 # Acceptance criteria
 
@@ -355,3 +398,36 @@ language specific package management, or more.
 ### Can build an Ubuntu guest VM image (FIXME)
 
 ### Can build a FreeBSD guest VM image (FIXME)
+
+
+
+---
+title: "Contractor: build software securely"
+author: "Lars Wirzenius"
+bindings: contractor.yaml
+functions: contractor.py
+documentclass: report
+abstract: |
+  Building software typically requires running code downloaded from
+  the Internet. Even when you're building your own software, you
+  usually depend on libraries and tools, which in turn may depend on
+  further things. It is becoming infeasible to vet the whole set of
+  software running during a build. If a build includes running local
+  tests (unit tests, some integration tests), the problem gets worse
+  in magnitude, if not quality.
+
+  Some software ecosystems are especially vulnerable to this (nodejs,
+  Python, Ruby, Go, Rust), but it's true for anything that has
+  dependencies on any code from outside its own code base, and even if
+  all the dependencies come from a trusted source, such as the
+  operating system vendor.
+
+  The Contractor is an attempt to be able to build software securely,
+  by leveraging virtual machine technology. It attempts to be
+  secure, convenient, and reasonably efficient.
+
+  The Contractor is not a replacement for a Continuous Integration
+  engine, and its technology will hopefully one day become part of the
+  Ick CI engine.
+
+...
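
The "Build process" list in this patch describes the manager VM running each build step in the worker VM and stopping the build as soon as a limit is exceeded or a disallowed network resource is contacted. The following is a minimal sketch of that control loop only, not the Contractor's implementation. All names here (`StepResult`, `Limits`, `run_build`) are hypothetical, and each step is simulated by a plain Python callable that reports what it used; a real manager would monitor the worker VM while the step runs instead of checking afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class StepResult:
    """What a simulated build step reports back (hypothetical shape)."""
    cpu_seconds: float
    hosts_contacted: List[str] = field(default_factory=list)


@dataclass
class Limits:
    """User-specified limits; only CPU time and network hosts are modelled."""
    cpu_seconds: float
    allowed_hosts: FrozenSet[str]


def run_build(
    steps: List[Tuple[str, Callable[[], StepResult]]],
    limits: Limits,
) -> Tuple[bool, List[str]]:
    """Run steps in order; stop at the first violation and report failure."""
    used = 0.0
    log: List[str] = []
    for name, step in steps:
        result = step()
        used += result.cpu_seconds
        # Any host outside the allow list counts as a violation.
        blocked = [h for h in result.hosts_contacted
                   if h not in limits.allowed_hosts]
        if blocked:
            log.append(f"{name}: blocked network access to {blocked[0]}")
            return False, log
        # CPU time is accumulated over the whole build, not per step.
        if used > limits.cpu_seconds:
            log.append(f"{name}: CPU limit exceeded")
            return False, log
        log.append(f"{name}: ok")
    return True, log
```

A caller would pass a list of `(name, callable)` pairs and a `Limits` value, then report the returned log to the developer. Treating the first violation as fatal mirrors the "stop the worker VM, and report failure" step in the list.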