| Field | Value | Date |
|---|---|---|
| author | Lars Wirzenius <liw@liw.fi> | 2020-04-06 13:25:13 +0300 |
| committer | Lars Wirzenius <liw@liw.fi> | 2020-04-06 13:25:13 +0300 |
| commit | 7075f18cfa3d3c9db8dd437c8486e89114100d4f | |
| tree | 0bbdff602a6137ab701ab48ae4b222be27480157 /contractor.md | |
| parent | 969948eb422efccfdd78dbbea9d8e57632aaf3a5 | |
| download | ick-contractor-7075f18cfa3d3c9db8dd437c8486e89114100d4f.tar.gz | |
Change: rewrite wording to be scarier
Diffstat (limited to 'contractor.md')
-rw-r--r-- | contractor.md | 276 |
1 files changed, 176 insertions, 100 deletions
```diff
diff --git a/contractor.md b/contractor.md
index f331a7f..b72f764 100644
--- a/contractor.md
+++ b/contractor.md
@@ -1,56 +1,92 @@
----
-title: "Contractor: running CI builds securely"
-author: "Lars Wirzenius"
-bindings: contractor.yaml
-functions: contractor.py
-...
-
-
-# Status of this document
-
-This document is in its very early stages, as is the whole Contractor
-project.
-
-## Open questions
-
-* How should the command line tool communicate with the manager
-  processes in the outer VM?
-
-  For now, I'll assume the outer VM is running under libvirt and
-  communication with the manager is via ssh. This will probably need
-  to change later, but it gets me started.
+<!-- meta data block is at the end of the file, because Emacs gets -->
+<!-- less confused that way -->
 
 # Introduction
 
-A continuous integration engine (CI) takes the source code for a
-software project and ensures it works. In less abstract terms, it
-builds it, and runs any automated tests it may have. The exact steps
-for that depend heavily on the CI engine and the project, but can be
-thought of as follows (with concrete examples of possible commands):
-
-* retrieve the desired revision of the source code (git clone, git
-  checkout)
-* install build dependencies (dpkg-checkbuilddeps, apt install)
-* build (./configure, make)
-* test (make check)
-
-This is dangerous, risky stuff. In the specific case of an open,
-hosted CI service, it's especially dangerous: anyone can submit any
-build, and that build can do anything, including attack computers
-anywhere on the Internet. However, even in a CI engine that only
-builds projects for in-house developers, it's risky: most attacks on
-IT are done by insiders.
-
-Apart from actual attacks, building software is dangerous also due to
-accidents: a mistake in the way software is built, or automatically
-tested, can result in what looks and behaves like an attack. An
-infinite loop can use excessive amounts of CPU resources, or block
-other projects from getting built.
+Software development is a securit risk.
+
+Building software from source code and running it is a core activity
+of software development. Software developers do it on the machine they
+work on. Continuous integration engines do it on server. These are
+fundamentally the same. The process is roughly as follows:
+
+* install any dependencies
+* build the software
+* run the software, perhaps as part of unit testing
+
+When the software is run, even if only a small unit of it, it can do
+anything that the person running the build can do, in principle:
+
+* delete files
+* modify files
+* log into remote hosts using SSH
+* decrypt or sign files with PGP
+* send email
+* delete email
+* commit to version control repositories
+* do anything with the browser that a the person could do
+* run things as sudo
+* in general, cause mayhem and chaos
+
+Normally, a software developer can assume that the code they wrote
+themselves doesn't do any of that. They can even assume that people
+they work with don't do any of that. In both cases, they may be wrong:
+mistakes happen. It's a well-guarded secret among programmers that
+they, sometimes, even if rarely, make catastrophic mistakes.
+
+**FIXME**: reference the bug in Debian that removed the ld.so symlink
+
+Accidents aside, mayhem and chaos may be intentional. Your own project
+may not have malware, and you may have vetted all your dependencies,
+and you trust them. But your dependencies have dependencies, which
+have further dependencies. You'd need to vet the whole dependency
+tree. Even decades ago, in the 1990s, this could easily be hundreds of
+thousands of lines of code, and modern systems make it worse. Note
+that build tools are themselves dependencies, as is the whole
+operating system.
+
+How certain are you that you can spot malicious code that's
+intentionally hidden and obfuscated?
+
+Are you prepared to vet any changes to any transitive dependencies?
+
+Does this really matter? Maybe it doesn't. If you can't ever do
+anything on your computer that would affect you or anyone else in a
+negative way, it probably doesn't matter. Most software developers are
+not in that position.
+
+This risk affects every operating system and every programming
+language. The degree in which it exists varies, a lot. Some
+programming language ecosystems seem more vulnerable than others: the
+nodejs/npm one, for example, values tiny and highly focused packages,
+which leads to immense dependency trees. The direct or indirect
+dependencies there are, the higher the chance that one of them turns
+out to be bad.
+
+The risk also exists for more traditional languages, such as C. Few C
+programs have no dependencies. They all need a C compiler and an
+operating system, at least.
+
+The risk is there for both free software systems, and non-free ones.
+As an example, the Debian system is entirely free software, but it's
+huge: the Debian 10 (buster) release has over 50 thousand packages,
+maintained by about 2000 people. While it's probable that none of
+those packages contains actual malware, it's not certain. Even if
+everyone who helps maintain is completely trustworthy, the amount of
+software in Debian is much too large for all code to be
+comprehensively reviewed.
+
+This is true for all operating systems that are not mere toys.
+
+The conclusion here is that to build software securely, we can't
+assume all code involved in the build to be secure. We need something
+stronger.
 
 ## Threat model
 
 This section collects a list of specific threats to consider.
 
+* accessing or modifying files not part of the build
 * excessive use build host resources
   * e.g., CPU, GPU, RAM, disk, etc
   * this might happen to make unauthorized use of the resources, or to
@@ -69,35 +105,22 @@ This section collects a list of specific threats to consider.
 # Requirements
 
 This chapter discusses the requirements for the Contractor solution.
-The requirements are divided into two sections: one that presents a
+The requirements are divided into two parts: one that's based on the
 threat model, and another for requirements that aren't about security.
 
-## Non-security requirements
-
-* **AnyBuildOS**: Builds should be able to run in any operating system
-  that can be run as a virtual machine guest of the host operating
-  system. The host is likely to be Linux, using Qemu and KVM for
-  virtualization.
-
-* **NoRoot**: Running the Contractor should not require root
-  privileges. It's OK to require sufficient privileges to use
-  virtualisation.
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in [RFC 2119][].
 
-* **DefaultBuilder**: The Contractor should be easy to set up and to
-  use. It should not require extensive configuration. Running a build
-  should be as easy as running **make**(1) on the commadnd line. It
-  should be feasible to expect developers to use the Contractor for
-  their normal development work.
+[RFC 2119]: https://tools.ietf.org/html/rfc2119
 
 ## Security requirements
 
-[RFC 2119]: https://tools.ietf.org/html/rfc2119
-
 These requirements stem from the threat model above.
 
-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
-"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
-document are to be interpreted as described in [RFC 2119][].
+* **FilesystemIsolation**: The Contractor MUST prevent the build from
+  accessing or modifying any files outside the build. Build tools and
+  libraries outside the source tree MUST be usable.
 
 * The Contractor MUST prevent the build from using more than the
   user-specified amount of CPU time (**HardCPULimit**), disk space
@@ -121,6 +144,23 @@ document are to be interpreted as described in [RFC 2119][].
   build. This includes vulnerabilities of the host's operating system
   kernel, virtualisation solution, and hardware.
 
+## Non-security requirements
+
+* **AnyBuildOS**: Builds should be able to run in any operating system
+  that can be run as a virtual machine guest of the host operating
+  system. The host is likely to be Linux, using Qemu and KVM for
+  virtualization.
+
+* **NoRoot**: Running the Contractor should not require root
+  privileges. It's OK to require sufficient privileges to use
+  virtualisation.
+
+* **DefaultBuilder**: The Contractor should be easy to set up and to
+  use. It should not require extensive configuration. Running a build
+  should be as easy as running **make**(1) on the commadnd line. It
+  should be feasible to expect developers to use the Contractor for
+  their normal development work.
+
 
 # Architecture
 
@@ -128,10 +168,11 @@ This chapter discusses the architecture of the solution, with
 particular emphasis on threat mitigation.
 
 The overall solution is use of nested virtual machines, running on a
-developer's host. The outer VM runs the Contractor itself. The inner
-VM runs the build. The outer VM controls the inner VM, proxies its
+developer's host. The outer VM runs the Contractor itself, and is
+called "the manager VM". The inner VM runs the build, and is called
+the "worker VM". The manager VM controls the worker VM, proxies its
 external access, and prevents it from doing anything nefarious. The
-outer VM is managed by a command line tool. Developers only interact
+manager VM is managed by a command line tool. Developers only interact
 directly with the command line tool.
 
 ~~~dot
@@ -147,10 +188,10 @@ digraph "arch" {
   contractor [label="Contractor CLI"];
   artifacts [shape=tab label="Artifact store \n (directory)"];
   subgraph cluster_contractor {
-    label="Contractor VM \n (defence force)";
+    label="Manager VM \n (defence force)";
     manager;
     subgraph cluster_builder {
-      label="Build VM \n (here be dragons)";
+      label="Worker VM \n (here be dragons)";
       style=filled;
       fillcolor="#dd0000";
       guestos [label="Guest OS"];
@@ -173,18 +214,18 @@ This high-level design is chosen for the following reasons:
 * it allows the build to happen in any operating system
   (**AnyBuildOS**)
 * the Contractor is a VM and running it doesn't require root
-  privileges; it has root inside the VM if needed; all the complexity
-  of setting things up so the builder VM works correctly are contained
-  the outer VM, and the user need do only minimal configuration
-  (**NoRoot**)
+  privileges; it has root inside both VMs if needed; all the
+  complexity of setting things up so the worker VM works correctly are
+  contained the manager VM, and the user need do only minimal
+  configuration (**NoRoot**)
 * the command line tool for using the Contractor can be made to be as
   easy as any build tool so that developer actually use it by default
   (**DefaultBuilder**)
-* the manager in the outer VM can monitor and control the build
-  (**HardCPULimit**, **HardBandwidthLimit**)
-* the manager can supply the inner VM with only a specified amount of
+* the manager VM can monitor and control the build (**HardCPULimit**,
+  **HardBandwidthLimit**)
+* the manager can supply the worker VM with only a specified amount of
   RAM and disk space (**HardRAMLimit**, **HardDiskLimit**)
-* the manager can set up routing and firewalls so that the inner VM
+* the manager can set up routing and firewalls so that the worker VM
   cannot access the network, except via proxies provided by the outer
   VM (**ConstrainNetworkAccess**)
 * the nested VMs provide a smaller attack surface than the Linux
@@ -194,40 +235,42 @@ This high-level design is chosen for the following reasons:
 
 ## Build process
 
-The architecture leads to a build process that would work like this:
+The architecture leads to a build process that would work roughly like
+this:
 
 * developer runs command line tool to do a build
-* command line tool boots the outer VM, which starts any services and
-  proxies running in the outer VM, and configures networking and
+* command line tool boots the manager VM, which starts any services
+  and proxies running in the manager VM, and configures networking and
   firewalls
 * command line tool copies the source code and build recipe into the
-  outer VM
-* outer VM retrieves the system image for the inner VM
-* outer VM boots inner VM
-* outer VM provides source code to inner VM
-* outer VM instructs inner VM to perform each build step in the build
-  recipe, while monitoring network access and CPU use; if the outer VM
+  manager VM
+* manager VM retrieves the system image for the worker VM
+* manager VM boots worker VM
+* manager VM provides source code to worker VM
+* manager VM instructs worker VM to perform each build step in the build
+  recipe, while monitoring network access and CPU use; if the manager VM
   notices any limits being exceeded, or attempts to access network
   resources other than ones allowed by developer, it will stop the
-  inner VM, and report failur to the developer
-* outer VM will retrieve build artifacts from the inner VM and put
+  worker VM, and report failure to the developer
+* manager VM will retrieve build artifacts from the guest VM and put
   them in an artifact directory so the developer can access them
 * command line tool reports to the developer build success or failure
   and where build log and build artifacts are
 
 ## Implementation sketch
 
-The outer VM runs Debian stable, and has libvirt to run guest VMs. The
-host and the outer VM are configured to support nested VMs, if the
-host hardware supports it. The outer VM has its networking configured
-so that it can connect to hosts outside itself, but the inner VM can
-only connect to services inside the outer VM. The services provided to
-the inner VM are an artifact store, and an HTTP proxy. (That's the
-only protocol I know of right now; more can be added if need be.)
+The manager VM runs Debian stable, and has libvirt to run guest VMs.
+The host and the manager VM are configured to support nested VMs, if
+the host hardware supports it. The manager VM has its networking
+configured so that it can connect to hosts outside itself, but the
+worker VM can only connect to services provided by the manager VM. The
+services provided to the worker VM are an artifact store, and an HTTP
+proxy. (That's the only protocol I know of right now; more can be
+added if need be.)
 
-The artifact store is mounted to the outer VM using 9p. A web service
-in the outer VM serves the files to the inner VM. The developer can
-access the artifact store via their local file system.
+The artifact store is mounted to the manager VM using 9p. A web
+service in the manager VM serves the files to the worker VM. The
+developer can access the artifact store via their local file system.
 
 
 # Acceptance criteria
@@ -355,3 +398,36 @@ language specific package management, or more.
 ### Can build an Ubuntu guest VM image (FIXME)
 
 ### Can build a FreeBSD guest VM image (FIXME)
+
+
+
+---
+title: "Contractor: build software securely"
+author: "Lars Wirzenius"
+bindings: contractor.yaml
+functions: contractor.py
+documentclass: report
+abstract: |
+  Building software typically requires running code downloaded from
+  the Internet. Even when you're building your own software, you
+  usually depend on libraries and tools, which in turn may depend on
+  further things. It is becoming infeasible to vet the whole set of
+  software running during a build. If a build includes running local
+  tests (unit tests, some integration tests), the problem gets worse
+  in magintude, if not quality.
+
+  Some software ecosystems are especially vulnerable to this (nodejs,
+  Python, Ruby, Go, Rust), but it's true for anything that has
+  dependencies on any code from outside its own code base, and even if
+  all the dependencies come from a trusted source, such as the
+  operating system vendor.
+
+  The Contractor is an attempt to be able to build software securely,
+  by leveraging virtual machine technology. It attempts to be
+  secure, convenient, and reasonably efficient.
+
+  The Contractor is not a replacement for a Continuous Integration
+  engine, and its technology will hopefully one day become part of the
+  Ick CI engine.
+
+...
```