<!-- meta data block is at the end of the file, because Emacs gets -->
<!-- less confused that way -->

# Introduction

Software development is a security risk.

Building software from source code and running it is a core activity
of software development. Software developers do it on the machine they
work on. Continuous integration systems do it on a server. These are
fundamentally the same. The process is roughly as follows:

* install any dependencies
* build the software
* run the software to perform automated tests on it

When the software is run, even if only a small unit of it, it can do
anything that the person running the build can do. For example, it can
do any and all of the following, unless constrained:

* delete files
* modify files
* log into remote hosts using SSH
* decrypt or sign files with PGP
* send email
* delete email
* commit to version control repositories
* do anything with the browser that the person could do
* run things as root using sudo
* in general, cause mayhem and chaos

Normally, a software developer can assume that the code they wrote
themselves won't ever do any of that. They can even assume that people
they work with make code that won't do any of that. In both cases,
they may be wrong: mistakes happen. It's a well-guarded secret among
programmers that they sometimes, even if rarely, make catastrophic
mistakes.

Accidents aside, mayhem and chaos may be intentional. Your own project
may not have malware, and you may have vetted all your dependencies,
and you trust them. But your dependencies have dependencies, which
have further dependencies, which have dependencies of their own. You'd
need to vet the whole dependency tree. Even decades ago, in the 1990s,
this could easily be hundreds of thousands of lines of code, and
modern systems are much larger. Note that build tools are themselves
dependencies, as is the whole operating system. Any code that is invoked
in the build process is a dependency.

How certain are you that you can spot malicious code that's
intentionally hidden and obfuscated?

Are you prepared to vet any changes to any transitive dependencies?

Does this really matter? Maybe it doesn't. If you can't ever do
anything on your computer that would affect you or anyone else in a
negative way, it probably doesn't matter. Most software developers are
not in that position.

This risk affects every operating system and every programming
language. The degree to which it exists varies a lot. Some
programming language ecosystems seem more vulnerable than others: the
nodejs/npm one, for example, values tiny and highly focused packages,
which leads to immense dependency trees. The more direct or indirect
dependencies there are, the higher the chance that one of them turns
out to be bad.

The risk also exists for more traditional languages, such as C. Few C
programs have no dependencies. They all need a C compiler, which in
turn requires an operating system, at least.

The risk is there for both free software systems, and non-free ones.
As an example, the Debian system is entirely free software, but it's
huge: the Debian 10 (buster) release has over 50 thousand packages,
maintained by thousands of people. While it's probable that none of
those packages contains actual malware, it's not certain. Even if
everyone who helps maintain it is completely trustworthy, the amount of
software in Debian is much too large for all code to be
comprehensively reviewed.

This is true for all operating systems that are not mere toys.

The conclusion here is that to build software securely, we can't
assume all code involved in the build to be secure. We need something
more secure. The Contractor aims to be a possible solution.

## Links to attacks

* [Malicious npm package opens backdoors on programmers' computers](https://www.zdnet.com/article/malicious-npm-package-opens-backdoors-on-programmers-computers/)

## Threat model

This section collects a list of specific threats to consider.

* accessing or modifying files not part of the build
* excessive use of build host resources
  * e.g., CPU, GPU, RAM, disk, etc.
  * this might happen to make unauthorized use of the resources, or to
    just be wasteful
* excessive use of network bandwidth
* attack on a networked target via a denial of service attack
  * e.g., build joins a DDoS swarm, or sends fabricated SYN packets to
    prevent target from working
* attack on build host, or other host, via network intrusion
  * e.g., port scanning, probing for known vulnerabilities
* attack build host directly without network
  * e.g., by breaching security isolation using build host kernel or
    hardware vulnerabilities, or CI engine vulnerabilities
  * this includes eavesdropping on the host, and stealing secrets

## Status of this document

Everything about the Contractor is in its early stages of thinking,
sketching, experimentation, and planning. Nothing is nailed down yet.

Pre-ALPHA. Don't trust anything. Anything you trust may be used
against you. Anything may change.

# Requirements

This chapter discusses the requirements for the Contractor solution.
The requirements are divided into two parts: one that's based on the
threat model, and another for requirements that aren't about security.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119][].

[RFC 2119]: https://tools.ietf.org/html/rfc2119

## Security requirements

These requirements stem from the threat model above. 

* **FilesystemIsolation**: The Contractor MUST prevent the build from
  accessing or modifying any files outside the build. Build tools and
  libraries outside the source tree MUST be usable.

* The Contractor MUST prevent the build from using more than the
  user-specified amount of CPU time (**HardCPULimit**), disk space
  (**HardDiskLimit**), or network bandwidth (**HardBandwidthLimit**).
  Any attempt by the build to use more should fail. The Contractor
  MUST fail the build if the limits are exceeded.

* **HardRAMLimit**: The Contractor MUST prevent the build from using
  more than the user-specified amount of RAM. The Contractor MAY fail
  the build if the limit is exceeded, but is not required to do so.

* **ConstrainNetworkAccess**: The Contractor MUST prevent the build
  from accessing the network outside the build environment in ways
  that haven't been specifically allowed by the user. The Contractor
  SHOULD fail the build if it makes an attempt at such access. The
  user MUST be able to specify which hosts the build may access, and
  using which protocols.

* **HostProtection**: The Contractor SHOULD attempt to protect the
  host it's running on from non-networked attacks performed by the
  build. This includes vulnerabilities of the host's operating system
  kernel, virtualisation solution, and hardware.
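
The difference between the limits that MUST fail the build and the one
that MAY can be made concrete with a small Python sketch. This is not
the Contractor's implementation, only an illustration of the
requirements above. The names `Limits`, `Usage`, `violations`, and
`must_fail_build` are hypothetical, and the sketch assumes that usage
has already been measured somehow, with network bandwidth counted as
total bytes transferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """User-specified hard limits; None means no limit was given."""
    cpu_seconds: Optional[float] = None
    disk_bytes: Optional[int] = None
    network_bytes: Optional[int] = None
    ram_bytes: Optional[int] = None


@dataclass(frozen=True)
class Usage:
    """Resources a build has used so far (hypothetical measurements)."""
    cpu_seconds: float = 0.0
    disk_bytes: int = 0
    network_bytes: int = 0
    ram_bytes: int = 0


# Which requirement each measured field belongs to.
REQUIREMENTS = {
    "cpu_seconds": "HardCPULimit",
    "disk_bytes": "HardDiskLimit",
    "network_bytes": "HardBandwidthLimit",
    "ram_bytes": "HardRAMLimit",
}


def violations(limits, usage):
    """Return the names of the requirements whose limit is exceeded."""
    exceeded = []
    for field_name, requirement in REQUIREMENTS.items():
        limit = getattr(limits, field_name)
        if limit is not None and getattr(usage, field_name) > limit:
            exceeded.append(requirement)
    return exceeded


def must_fail_build(limits, usage):
    """True if failing the build is mandatory for this usage.

    Exceeding the RAM limit MAY fail the build; exceeding any of the
    other limits MUST fail it.
    """
    return any(name != "HardRAMLimit" for name in violations(limits, usage))
```

With `Limits(cpu_seconds=60, ram_bytes=1024)`, a build that has used 61
CPU seconds must be failed, while one that has only exceeded the RAM
limit may be allowed to continue, as long as it is prevented from
getting more RAM.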

## Non-security requirements

* **AnyBuildOS**: Builds SHOULD be able to run in any operating system
  that can be run as a virtual machine guest of the host operating
  system.

* **NoRoot**: Running the Contractor SHOULD NOT require root
  privileges. It's OK to require sufficient privileges to use
  virtualisation.

* **DefaultBuilder**: The Contractor SHOULD be easy to set up and to
  use. It should not require extensive configuration. Running a build
  should be as easy as running **make**(1) on the command line. It
  should be feasible to expect developers to use the Contractor for
  their normal development work.


# Architecture

This chapter discusses the architecture of the solution, with
particular emphasis on threat mitigation.

The overall solution is to use nested virtual machines, running on a
developer's host. The outer VM runs the Contractor itself, and is
called "the manager VM". The inner VM runs the build, and is called
the "worker VM". The manager VM controls the worker VM, proxies its
external access, and prevents it from doing anything nefarious. The
manager VM is managed by a command line tool. Developers only interact
directly with the command line tool.

~~~dot
digraph "arch" {
  labelloc=b;
  labeljust=l;
  dev [shape=octagon label="Developer"];
  img [shape=tab label="VM image"];
  src [shape=tab label="Source tree"];
  ws [shape=tab label="Exported workspace"];
  apt [shape=tab label="APT repository"];
  subgraph cluster_host {
    label="Host system \n (the vulnerable bit)";
    contractor [label="Contractor CLI"];
    subgraph cluster_contractor {
      label="Manager VM \n (defence force)";
      manager;
      libvirt;
      subgraph cluster_builder {
        label="Worker VM \n (here be dragons)";
        style=filled;
        fillcolor="#dd0000";
        guestos [label="Guest OS"];
      }
    }
  }
  dev -> contractor;
  contractor -> manager;
  contractor -> guestos;
  img -> contractor;
  ws -> contractor;
  src -> contractor;
  apt -> guestos;
  manager -> libvirt;
  libvirt -> guestos;
  contractor -> ws;
}
~~~

This high-level design is chosen for the following reasons:

* it allows the build to happen in any operating system
  (**AnyBuildOS**)
* the Contractor is a VM and running it doesn't require root
  privileges; it has root inside both VMs if needed; all the
  complexity of setting things up so the worker VM works correctly is
  contained in the manager VM, and the user need do only minimal
  configuration (**NoRoot**)
* the command line tool for using the Contractor can be made to be as
  easy as any build tool so that developers actually use it by default
  (**DefaultBuilder**)
* the manager VM can monitor and control the build (**HardCPULimit**,
  **HardBandwidthLimit**)
* the manager can supply the worker VM with only a specified amount of
  RAM and disk space (**HardRAMLimit**, **HardDiskLimit**)
* the manager can set up routing and firewalls so that the worker VM
  cannot access the network, except via proxies provided by the
  manager VM (**ConstrainNetworkAccess**)
* the nested VMs provide a smaller attack surface than the Linux
  kernel API, and this protects the host better than Linux container
  technologies, although it doesn't do much to protect against
  virtualisation or hardware vulnerabilities (**HostProtection**)
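
The network constraint amounts to a default-deny allowlist of hosts
and protocols. The following Python sketch shows only that decision
logic. It is an illustration, not the Contractor's code: the class
`NetworkPolicy` and its methods are hypothetical names, and the actual
enforcement by routing, firewalls, and proxies in the manager VM is
not shown.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkPolicy:
    """Default-deny policy: only listed (host, protocol) pairs pass."""
    allowed: set = field(default_factory=set)

    def allow(self, host, protocol):
        """Record that the user permits this host via this protocol."""
        self.allowed.add((host.lower(), protocol.lower()))

    def permits(self, host, protocol):
        """True only if the user has explicitly allowed the pair."""
        return (host.lower(), protocol.lower()) in self.allowed


# Example: the user allows fetching packages over HTTP from one host.
policy = NetworkPolicy()
policy.allow("deb.debian.org", "http")
```

Anything not explicitly allowed is refused, so a build that tries to
reach another host, or the same host with another protocol, gets no
access.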

## Build process

The architecture leads to a build process that would work roughly like
this:

* the manager VM is already running
* developer runs command line tool to do a build:  
  `contractor build foo.yaml`
* command line tool copies the worker VM image into the manager VM
* command line tool boots the worker VM
* command line tool installs any build dependencies into the worker VM
* command line tool copies a previously saved dump of the workspace
  into the worker VM
* command line tool copies the source code and build recipe into the
  worker VM's workspace
* command line tool runs build commands in the worker VM, in the
  source tree
* command line tool copies out the workspace into a local directory
* command line tool reports build success or failure to the
  developer, and where the build log and build artifacts are
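
The steps above can be sketched in Python as an ordered list of
actions that stops at the first failure. This is a rough
illustration, not the actual command line tool: `run_build` and
`STEP_NAMES` are hypothetical names, and the actions are placeholders
supplied by the caller. Stopping at the first failure is an assumption
of the sketch; the process above doesn't say what happens to the
workspace when a step fails.

```python
def run_build(steps, log=print):
    """Run (name, action) steps in order; stop at the first failure.

    Each action is a callable that returns True on success. The
    return value is True only if every step succeeded.
    """
    for name, action in steps:
        log(f"step: {name}")
        if not action():
            log(f"FAILED: {name}")
            return False
    log("build succeeded")
    return True


# Step names follow the build process described above.
STEP_NAMES = [
    "copy worker VM image into manager VM",
    "boot worker VM",
    "install build dependencies",
    "restore saved workspace dump",
    "copy source code and build recipe into workspace",
    "run build commands in source tree",
    "copy workspace out to local directory",
]
```

A caller would pair each name with a function that does the actual
work, for example `run_build([(name, do_step) for name in STEP_NAMES])`,
where `do_step` stands in for the real action.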

## Implementation sketch (FIXME: update)

FIXME: write this



# Acceptance criteria

This chapter specifies acceptance criteria for the Contractor, as
*scenarios*, which also define how the criteria are automatically
verified.

## Local use of the Contractor

These scenarios use the Contractor locally, to make sure it can do
things that don't require the VM.

### Parse build spec

Make sure the Contractor can read a build spec and dump it back out,
as JSON. This exercises the parsing code. JSON output is chosen,
instead of YAML, to make sure the program doesn't just copy input to
output.

~~~scenario
given file dump.yaml
when I invoke contractor dump dump.yaml
then the JSON output matches dump.yaml
~~~

~~~{.file #dump.yaml .yaml .numberLines}
worker-image: worker.img
ansible:
  - hosts: worker
    remote_user: worker
    become: true
    tasks:
    - apt:
        name: build-essential
    vars:
      ansible_python_interpreter: /usr/bin/python3
source: .
workspace: workspace
build: |
  ./check
~~~
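
What the scenario checks can be sketched in Python using only the
standard library. The sketch assumes the build spec has already been
parsed from YAML into a dictionary by some YAML parser, which is not
shown. The names `dump_spec`, `json_matches`, and `ASSUMED_KEYS` are
hypothetical, and treating every key of the example as required is an
assumption, not something the Contractor is known to do.

```python
import json

# Assumption: all keys in the example spec above are required. The
# actual Contractor may treat some of them as optional.
ASSUMED_KEYS = ("worker-image", "ansible", "source", "workspace", "build")


def dump_spec(spec):
    """Check a parsed build spec and return it as a JSON string."""
    missing = [key for key in ASSUMED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"build spec lacks keys: {', '.join(missing)}")
    return json.dumps(spec, indent=2, sort_keys=True)


def json_matches(json_text, spec):
    """True if the JSON text decodes to the same data as the spec."""
    return json.loads(json_text) == spec
```

Comparing decoded data rather than text means that differences in key
order or whitespace between the JSON output and the YAML input don't
matter.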

## Smoke tests

These scenarios build a simple "hello, world" C application on a
variety of guest systems, and verify the resulting binaries output the
desired greeting. The goal of these scenarios is to ensure the various
Contractor components fit together at least in the very basic case.

### Debian smoke test

This scenario checks that the developer can build a simple C program
in the Contractor.

~~~disabled-scenario
given a working contractor
and file hello.c
and file hello.yaml
and file worker.img from source directory
when I run contractor build hello.yaml
then exit code is 0
then file ws/src/hello exists
~~~

~~~{.file #hello.c .c .numberLines}
#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}
~~~

~~~{.file #hello.yaml .yaml .numberLines}
worker-image: worker.img
ansible:
  - hosts: worker
    remote_user: worker
    become: true
    tasks:
    - apt:
        name: build-essential
    vars:
      ansible_python_interpreter: /usr/bin/python3
source: .
workspace: ws
build: |
  gcc hello.c -o hello
  ./hello
~~~



---
title: "Contractor: build software securely"
author: "Lars Wirzenius"
bindings: 
- subplot/contractor.yaml
- subplot/vendor/runcmd.yaml
- subplot/files.yaml
functions: 
  - subplot/contractor.py
  - subplot/vendor/runcmd.py
  - subplot/files.py
template: python
documentclass: report
classes:
  - c
  - disabled-scenario
abstract: |
  Building software typically requires running code downloaded from
  the Internet. Even when you're building your own software, you
  usually depend on libraries and tools, which in turn may depend on
  further things. It is becoming infeasible to vet the whole set of
  software running during a build. If a build includes running local
  tests (unit tests, some integration tests), the problem gets worse
  in magnitude, if not quality.

  Some software ecosystems are especially vulnerable to this (nodejs,
  Python, Ruby, Go, Rust), but it's true for anything that has
  dependencies on any code from outside its own code base, and even if
  all the dependencies come from a trusted source, such as the
  operating system vendor or a Linux distribution.

  The Contractor is an attempt to be able to build software securely,
  by leveraging virtual machine technology. It attempts to be
  secure, convenient, and reasonably efficient.

...