Diffstat (limited to 'contractor.md')
-rw-r--r--contractor.md401
1 files changed, 401 insertions, 0 deletions
diff --git a/contractor.md b/contractor.md
new file mode 100644
index 0000000..c76dc62
--- /dev/null
+++ b/contractor.md
@@ -0,0 +1,401 @@
+<!-- meta data block is at the end of the file, because Emacs gets -->
+<!-- less confused that way -->
+
+# Introduction
+
+Software development is a security risk.
+
+Building software from source code and running it is a core activity
+of software development. Software developers do it on the machine they
+work on. Continuous integration systems do it on servers. These are
+fundamentally the same. The process is roughly as follows:
+
+* install any dependencies
+* build the software
+* run the software to perform automated tests on it
+
+When the software is run, even if only a small unit of it, it can do
+anything that the person running the build can do. For example, it can
+do any and all of the following, unless constrained:
+
+* delete files
+* modify files
+* log into remote hosts using SSH
+* decrypt or sign files with PGP
+* send email
+* delete email
+* commit to version control repositories
+* do anything with the browser that the person could do
+* run things as root using sudo
+* in general, cause mayhem and chaos
+
+Normally, a software developer can assume that the code they wrote
+themselves won't ever do any of that. They can even assume that people
+they work with write code that won't do any of that. In both cases,
+they may be wrong: mistakes happen. It's a well-guarded secret among
+programmers that they sometimes, even if rarely, make catastrophic
+mistakes.
+
+Accidents aside, mayhem and chaos may be intentional. Your own project
+may not have malware, and you may have vetted all your dependencies,
+and you trust them. But your dependencies have dependencies, which
+have further dependencies, which have dependencies of their own. You'd
+need to vet the whole dependency tree. Even decades ago, in the 1990s,
+this could easily be hundreds of thousands of lines of code, and
+modern systems are much larger. Note that build tools are themselves
+dependencies, as is the whole operating system. Any code that is invoked
+in the build process is a dependency.
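+
+To make "the whole dependency tree" concrete, here is a minimal C
+sketch that counts every package reachable from a project through
+direct and indirect dependencies, using a depth-first search. The
+package names and the graph are hypothetical, made up for
+illustration. A real tool would read them from package metadata.
+
+~~~c
+#include <stdbool.h>
+
+/* Hypothetical dependency graph: deps[i][j] is true if package i
+ * depends directly on package j. Package 0 is your own project. */
+#define NPKG 6
+static const bool deps[NPKG][NPKG] = {
+    /* 0 project   */ {0, 1, 1, 0, 0, 0},
+    /* 1 libhttp   */ {0, 0, 0, 1, 1, 0},
+    /* 2 libjson   */ {0, 0, 0, 0, 1, 0},
+    /* 3 libtls    */ {0, 0, 0, 0, 0, 1},
+    /* 4 libutf8   */ {0, 0, 0, 0, 0, 0},
+    /* 5 libcrypto */ {0, 0, 0, 0, 0, 0},
+};
+
+/* Depth-first search: mark every package reachable from pkg. The seen
+ * array also stops the search from looping on dependency cycles. */
+static void visit(int pkg, bool seen[NPKG])
+{
+    for (int dep = 0; dep < NPKG; dep++) {
+        if (deps[pkg][dep] && !seen[dep]) {
+            seen[dep] = true;
+            visit(dep, seen);
+        }
+    }
+}
+
+/* Count the distinct packages that root depends on, directly or
+ * indirectly. */
+int count_transitive(int root)
+{
+    bool seen[NPKG] = {false};
+    int count = 0;
+
+    visit(root, seen);
+    for (int i = 0; i < NPKG; i++) {
+        if (seen[i] && i != root)
+            count++;
+    }
+    return count;
+}
+~~~
+
+In this toy graph the project names only two direct dependencies
+(libhttp and libjson), but `count_transitive(0)` returns 5. libutf8 is
+shared by both libraries and is counted once. All five libraries are
+code that runs during the build and would need vetting.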
+
+How certain are you that you can spot malicious code that's
+intentionally hidden and obfuscated?
+
+Are you prepared to vet any changes to any transitive dependencies?
+
+Does this really matter? Maybe it doesn't. If you can't ever do
+anything on your computer that would affect you or anyone else in a
+negative way, it probably doesn't matter. Most software developers are
+not in that position.
+
+This risk affects every operating system and every programming
+language. The degree to which it exists varies a lot. Some
+programming language ecosystems seem more vulnerable than others: the
+nodejs/npm one, for example, values tiny and highly focused packages,
+which leads to immense dependency trees. The more direct or indirect
+dependencies there are, the higher the chance that one of them turns
+out to be bad.
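+
+A simplified model shows why the size of the tree matters. Assume,
+purely for illustration, that each of $n$ dependencies is
+independently bad with the same small probability $p$. The chance
+that at least one of them is bad is then:
+
+$$P(\text{at least one bad}) = 1 - (1 - p)^{n}$$
+
+With $p = 0.001$, ten dependencies give a risk of about 1%, while a
+thousand give about 63%. Real dependencies are neither independent nor
+equally risky. These numbers therefore only illustrate the trend, and
+do not measure the actual risk of any ecosystem.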
+
+The risk also exists for more traditional languages, such as C. Few C
+programs have no dependencies. They all need a C compiler, which in
+turn requires an operating system, at least.
+
+The risk is there for both free and non-free software systems.
+As an example, the Debian system is entirely free software, but it's
+huge: the Debian 10 (buster) release has over 50 thousand packages,
+maintained by thousands of people. While it's probable that none of
+those packages contains actual malware, it's not certain. Even if
+everyone who helps maintain Debian is completely trustworthy, the
+amount of software in Debian is much too large for all code to be
+comprehensively reviewed.
+
+This is true for all operating systems that are not mere toys.
+
+The conclusion here is that to build software securely, we can't
+assume that all code involved in the build is secure. Instead, the
+build needs to run in a way that limits the harm insecure code can
+do. The Contractor aims to be a possible solution.
+
+## Threat model
+
+This section collects a list of specific threats to consider.
+
+* accessing or modifying files not part of the build
+* excessive use of build host resources
+    * e.g., CPU, GPU, RAM, disk
+    * this might be deliberate, to make unauthorized use of the
+      resources, or merely wasteful
+* excessive use of network bandwidth
+* denial of service attack on a networked target
+    * e.g., the build joins a DDoS swarm, or sends fabricated SYN
+      packets to prevent the target from working
+* attack on the build host, or another host, via network intrusion
+    * e.g., port scanning, probing for known vulnerabilities
+* attack on the build host directly, without the network
+    * e.g., by breaching security isolation using build host kernel or
+      hardware vulnerabilities, or CI engine vulnerabilities
+    * this includes eavesdropping on the host, and stealing secrets
+
+## Status of this document
+
+Everything about the Contractor is in its early stages of thinking,
+sketching, experimentation, and planning. Nothing is nailed down yet.
+
+Pre-ALPHA. Don't trust anything. Anything you trust may be used
+against you. Anything may change.
+
+# Requirements
+
+This chapter discusses the requirements for the Contractor solution.
+The requirements are divided into two parts: one that's based on the
+threat model, and another for requirements that aren't about security.
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in [RFC 2119][].
+
+[RFC 2119]: https://tools.ietf.org/html/rfc2119
+
+## Security requirements
+
+These requirements stem from the threat model above.
+
+* **FilesystemIsolation**: The Contractor MUST prevent the build from
+ accessing or modifying any files outside the build. Build tools and
+ libraries outside the source tree MUST be usable.
+
+* The Contractor MUST prevent the build from using more than the
+ user-specified amount of CPU time (**HardCPULimit**), disk space
+ (**HardDiskLimit**), or network bandwidth (**HardBandwidthLimit**).
+ Any attempt by the build to use more should fail. The Contractor
+ MUST fail the build if the limits are exceeded.
+
+* **HardRAMLimit**: The Contractor MUST prevent the build from using
+ more than the user-specified amount of RAM. The Contractor MAY fail
+ the build if the limit is exceeded, but is not required to do so.
+
+* **ConstrainNetworkAccess**: The Contractor MUST prevent the build
+ from accessing the network outside the build environment in ways
+ that haven't been specifically allowed by the user. The Contractor
+ SHOULD fail the build if it makes an attempt at such access. The
+ user MUST be able to specify which hosts to access, and using which
+ protocols.
+
+* **HostProtection**: The Contractor SHOULD attempt to protect the
+ host it's running on from non-networked attacks performed by the
+ build. This includes vulnerabilities of the host's operating system
+ kernel, virtualisation solution, and hardware.
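+
+The requirements treat the limits differently: exceeding the CPU,
+disk, or bandwidth limit MUST fail the build, while exceeding the RAM
+limit MAY. The following hypothetical C sketch expresses that as one
+decision function. It assumes the Contractor can measure what the
+build used or tried to use. As a simplification, bandwidth is
+represented as total bytes transferred rather than a rate.
+
+~~~c
+#include <stdbool.h>
+
+/* Hypothetical per-build resource figures. For a build, these are what
+ * it used or tried to use. For limits, they are the user-specified
+ * maximums. */
+struct resources {
+    double cpu_seconds;
+    double disk_bytes;
+    double network_bytes;
+    double ram_bytes;
+};
+
+enum verdict {
+    BUILD_OK,
+    FAIL_CPU,       /* HardCPULimit: the build MUST fail */
+    FAIL_DISK,      /* HardDiskLimit: the build MUST fail */
+    FAIL_BANDWIDTH, /* HardBandwidthLimit: the build MUST fail */
+    FAIL_RAM,       /* HardRAMLimit: the build MAY fail */
+};
+
+/* Decide whether the build fails because of a resource limit. Failing
+ * on the RAM limit is optional, so it is a policy flag here. RAM use
+ * itself is assumed to be capped separately either way. */
+enum verdict check_limits(const struct resources *used,
+                          const struct resources *limit,
+                          bool fail_on_ram)
+{
+    if (used->cpu_seconds > limit->cpu_seconds)
+        return FAIL_CPU;
+    if (used->disk_bytes > limit->disk_bytes)
+        return FAIL_DISK;
+    if (used->network_bytes > limit->network_bytes)
+        return FAIL_BANDWIDTH;
+    if (fail_on_ram && used->ram_bytes > limit->ram_bytes)
+        return FAIL_RAM;
+    return BUILD_OK;
+}
+~~~
+
+When `fail_on_ram` is false, this check does not fail a build that
+hits only the RAM limit, which the requirement permits.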
+
+## Non-security requirements
+
+* **AnyBuildOS**: Builds SHOULD be able to run in any operating system
+ that can be run as a virtual machine guest of the host operating
+ system.
+
+* **NoRoot**: Running the Contractor SHOULD NOT require root
+ privileges. It's OK to require sufficient privileges to use
+ virtualisation.
+
+* **DefaultBuilder**: The Contractor SHOULD be easy to set up and to
+ use. It should not require extensive configuration. Running a build
+ should be as easy as running **make**(1) on the command line. It
+ should be feasible to expect developers to use the Contractor for
+ their normal development work.
+
+
+# Architecture
+
+This chapter discusses the architecture of the solution, with
+particular emphasis on threat mitigation.
+
+The overall solution is to use nested virtual machines, running on a
+developer's host. The outer VM runs the Contractor itself, and is
+called the "manager VM". The inner VM runs the build, and is called
+the "worker VM". The manager VM controls the worker VM, proxies its
+external access, and prevents it from doing anything nefarious. The
+manager VM is managed by a command line tool. Developers only interact
+directly with the command line tool.
+
+~~~dot
+digraph "arch" {
+    labelloc=b;
+    labeljust=l;
+    dev [shape=octagon label="Developer"];
+    img [shape=tab label="VM image"];
+    src [shape=tab label="Source tree"];
+    ws [shape=tab label="Exported workspace"];
+    apt [shape=tab label="APT repository"];
+    subgraph cluster_host {
+        label="Host system \n (the vulnerable bit)";
+        contractor [label="Contractor CLI"];
+        subgraph cluster_contractor {
+            label="Manager VM \n (defence force)";
+            manager;
+            libvirt;
+            subgraph cluster_builder {
+                label="Worker VM \n (here be dragons)";
+                style=filled;
+                fillcolor="#dd0000";
+                guestos [label="Guest OS"];
+            }
+        }
+    }
+    dev -> contractor;
+    contractor -> manager;
+    contractor -> guestos;
+    img -> contractor;
+    ws -> contractor;
+    src -> contractor;
+    apt -> guestos;
+    manager -> libvirt;
+    libvirt -> guestos;
+    contractor -> ws;
+}
+~~~
+
+This high-level design is chosen for the following reasons:
+
+* it allows the build to happen in any operating system
+ (**AnyBuildOS**)
+* the Contractor is a VM and running it doesn't require root
+ privileges; it has root inside both VMs if needed; all the
+ complexity of setting things up so that the worker VM works
+ correctly is contained in the manager VM, and the user needs to do
+ only minimal configuration (**NoRoot**)
+* the command line tool for using the Contractor can be made as easy
+ to use as any build tool, so that developers actually use it by
+ default (**DefaultBuilder**)
+* the manager VM can monitor and control the build (**HardCPULimit**,
+ **HardBandwidthLimit**)
+* the manager can supply the worker VM with only a specified amount of
+ RAM and disk space (**HardRAMLimit**, **HardDiskLimit**)
+* the manager can set up routing and firewalls so that the worker VM
+ cannot access the network, except via proxies provided by the outer
+ VM (**ConstrainNetworkAccess**)
+* the nested VMs provide a smaller attack surface than the Linux
+ kernel API, and this protects the host better than Linux container
+ technologies, although it doesn't do much to protect against
+ virtualisation or hardware vulnerabilities (**HostProtection**)
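+
+The default-deny rule behind **ConstrainNetworkAccess** could look
+roughly like the following in the manager VM's proxy. This is a
+hypothetical C sketch. The rule structure and `net_allowed` are
+invented names. Matching is an exact, case-sensitive string
+comparison, a simplification that a real proxy would need to refine,
+for example for host name case and ports.
+
+~~~c
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+/* One hypothetical user-specified rule: the worker VM may contact this
+ * host using this protocol. */
+struct net_rule {
+    const char *host;
+    const char *protocol;
+};
+
+/* Default deny: allow a connection only if some rule matches both the
+ * host and the protocol exactly (case-sensitive string comparison).
+ * Everything else is refused. */
+bool net_allowed(const struct net_rule *rules, size_t nrules,
+                 const char *host, const char *protocol)
+{
+    for (size_t i = 0; i < nrules; i++) {
+        if (strcmp(rules[i].host, host) == 0 &&
+            strcmp(rules[i].protocol, protocol) == 0)
+            return true;
+    }
+    return false;
+}
+~~~
+
+For example, take a hypothetical rule set that allows only
+`deb.debian.org` over `http` and `https`. An attempt to reach that
+host over `ssh` is refused, and the Contractor SHOULD then fail the
+build. With an empty rule set, every connection is refused.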
+
+## Build process
+
+The architecture leads to a build process that would work roughly like
+this:
+
+* the manager VM is already running
+* developer runs command line tool to do a build:
+ `contractor build foo.yaml`
+* command line tool copies the worker VM image into the manager VM
+* command line tool boots the worker VM
+* command line tool installs any build dependencies into the worker VM
+* command line tool copies a previously saved dump of the workspace
+ into the worker VM
+* command line tool copies the source code and build recipe into the
+ worker VM's workspace
+* command line tool runs build commands in the worker VM, in the
+ source tree
+* command line tool copies out the workspace into a local directory
+* command line tool reports build success or failure to the
+ developer, and where the build log and build artifacts are
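+
+The ordering of these steps can be sketched as a small C driver. This
+is a hypothetical illustration: `struct step`, `run_build`, and the
+stub steps are invented names. The sketch also assumes something the
+list doesn't state: copying out the workspace and reporting still
+happen when an earlier step fails, so the developer gets the build log.
+
+~~~c
+#include <stddef.h>
+
+/* One step of the build process; run() returns 0 on success. */
+struct step {
+    const char *name;
+    int (*run)(void);
+};
+
+/* Run the build steps in order and stop at the first failure, so that,
+ * for example, build commands never run if installing dependencies
+ * failed. The "always" steps (copying out the workspace, reporting the
+ * result) run regardless. Returns the index of the failed step, or n
+ * if every step succeeded. */
+size_t run_build(const struct step *steps, size_t n,
+                 const struct step *always, size_t m)
+{
+    size_t failed = n;
+
+    for (size_t i = 0; i < n; i++) {
+        if (steps[i].run() != 0) {
+            failed = i;
+            break;
+        }
+    }
+    for (size_t j = 0; j < m; j++)
+        (void)always[j].run();
+    return failed;
+}
+
+/* Stub steps for illustration only: each one counts how often it ran. */
+static int steps_run;
+static int ok_step(void)   { steps_run++; return 0; }
+static int fail_step(void) { steps_run++; return 1; }
+~~~
+
+Suppose dependency installation fails, at index 1. Then `run_build`
+returns 1 and the build commands are skipped. The workspace is still
+copied out and the result reported.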
+
+## Implementation sketch (FIXME: update)
+
+FIXME: write this
+
+
+
+# Acceptance criteria
+
+This chapter specifies acceptance criteria for the Contractor, as
+*scenarios*, which also define how the criteria are automatically
+verified.
+
+## Local use of the Contractor
+
+These scenarios use the Contractor locally, to make sure it can do
+things that don't require the VM.
+
+### Parse build spec
+
+Make sure the Contractor can read a build spec and dump it back out,
+as JSON. This exercises the parsing code. JSON output is chosen,
+instead of YAML, to make sure the program doesn't just copy input to
+output.
+
+~~~scenario
+given file dump.yaml
+when I invoke contractor dump dump.yaml
+then the JSON output matches dump.yaml
+~~~
+
+~~~{.file #dump.yaml .yaml .numberLines}
+worker-image: worker.img
+ansible:
+  - hosts: worker
+    remote_user: worker
+    become: true
+    tasks:
+      - apt:
+          name: build-essential
+    vars:
+      ansible_python_interpreter: /usr/bin/python3
+source: .
+workspace: workspace
+build: |
+  ./check
+~~~
+
+## Smoke tests
+
+These scenarios build a simple "hello, world" C application on a
+variety of guest systems, and verify the resulting binaries output the
+desired greeting. The goal of these scenarios is to ensure the various
+Contractor components fit together at least in the very basic case.
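+
+Checking that a built binary prints the desired greeting comes down
+to running it and comparing its standard output. A minimal POSIX C
+sketch follows, using a hypothetical helper `output_matches`. It is
+an illustration only, not taken from the Contractor's own test code.
+
+~~~c
+/* popen() and pclose() are POSIX, not ISO C. */
+#define _POSIX_C_SOURCE 200809L
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+/* Run command through the shell and return true if it exits with
+ * status 0 and its standard output is exactly `expected`. Output is
+ * read into a small fixed buffer and anything longer is truncated.
+ * That is fine for a one-line greeting, but not in general. */
+bool output_matches(const char *command, const char *expected)
+{
+    char buf[256];
+    FILE *p = popen(command, "r");
+
+    if (p == NULL)
+        return false;
+    size_t len = fread(buf, 1, sizeof buf - 1, p);
+    buf[len] = '\0';
+    return pclose(p) == 0 && strcmp(buf, expected) == 0;
+}
+~~~
+
+For the smoke test, the check would be
+`output_matches("./hello", "hello, world\n")`, run in the directory
+that holds the built binary.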
+
+### Debian smoke test
+
+This scenario checks that the developer can build a simple C program
+in the Contractor.
+
+~~~disabled-scenario
+given a working contractor
+and file hello.c
+and file hello.yaml
+and file worker.img from source directory
+when I run contractor build hello.yaml
+then exit code is 0
+then file ws/src/hello exists
+~~~
+
+~~~{.file #hello.c .c .numberLines}
+#include <stdio.h>
+
+int main(void)
+{
+    printf("hello, world\n");
+    return 0;
+}
+~~~
+
+~~~{.file #hello.yaml .yaml .numberLines}
+worker-image: worker.img
+ansible:
+  - hosts: worker
+    remote_user: worker
+    become: true
+    tasks:
+      - apt:
+          name: build-essential
+    vars:
+      ansible_python_interpreter: /usr/bin/python3
+source: .
+workspace: ws
+build: |
+  gcc hello.c -o hello
+  ./hello
+~~~
+
+
+
+---
+title: "Contractor: build software securely"
+author: "Lars Wirzenius"
+bindings:
+- subplot/contractor.yaml
+- subplot/vendor/runcmd.yaml
+- subplot/files.yaml
+functions:
+ - subplot/contractor.py
+ - subplot/vendor/runcmd.py
+ - subplot/files.py
+documentclass: report
+classes:
+ - c
+ - disabled-scenario
+abstract: |
+ Building software typically requires running code downloaded from
+ the Internet. Even when you're building your own software, you
+ usually depend on libraries and tools, which in turn may depend on
+ further things. It is becoming infeasible to vet the whole set of
+ software running during a build. If a build includes running local
+ tests (unit tests, some integration tests), the problem gets worse
+ in magnitude, if not quality.
+
+ Some software ecosystems are especially vulnerable to this (nodejs,
+ Python, Ruby, Go, Rust), but it's true for anything that has
+ dependencies on any code from outside its own code base, even if
+ all the dependencies come from a trusted source, such as the
+ operating system vendor or a Linux distribution.
+
+ The Contractor is an attempt to be able to build software securely,
+ by leveraging virtual machine technology. It attempts to be
+ secure, convenient, and reasonably efficient.
+
+...