# Introduction

Ambient CI will be a continuous integration system: an automated
system that builds, tests, delivers, and deploys software. It will be
simple, sane, safe, secure, speedy, and
supercalifragilisticexpialidocious. There are many existing such
systems, but we feel none of them are excellent.

For now, it is a command line tool that runs a build locally, in a VM,
without network access.

## Continuous integration concepts

The concepts and related terminology around continuous integration
systems are not entirely standardized. To be clear, here are
definitions for Ambient. The goal here is to be clear and unambiguous
in the Ambient context, not to define terminology for the wider
community of software developers.

Note: Ambient is currently aimed at building, testing, and publishing
software, but not deploying it. That may change, but it's possible
that deployment will need its own kind of system rather than forcing a
CI system to do it badly.

A CI system consists of several abstract components, or
responsibilities. These may end up as separate programs and services in
Ambient, or several may be combined into one. When thinking about and
discussing the system and its software architecture, we pretend that
they're all separate programs.

* **artifact** -- a file produced by a build

* **artifact store** -- where artifacts are stored after a build step
  is finished
  - we can't assume we can access files on the worker, but we do need
    some place to store artifacts
  - the artifact store can also store intermediate state between build
    steps
  - later build steps and build runs may fetch the artifacts from the
    store

* **build** or **build run** -- an execution of all the steps of a
  build graph
  - some CI systems call this a "job", but that terminology seems unclear

* **build graph** -- the full set of steps to build a project
  - in the simple case, the steps form a simple sequence: 
    - build program
    - run its tests
    - build Debian package
    - publish Debian package
  - in the general case, some steps may be performed concurrently
    (building Debian package can happen while tests are run):
    - build program
    - concurrently:
      - run its tests
      - build Debian package
    - publish Debian package
  - in the even more general case, some steps may need to be performed on
    several target systems
    - build Debian source package
    - concurrently on amd64, armhf, aarch64, riscv32, riscv64:
      - build program
      - run its tests
      - build Debian binary package from source package
    - publish Debian source package and all binary packages
      - but only if all binary packages succeeded
  - how users specify this is going to be crucial for Ambient; a sketch
    of the simple sequential case, expressed as a single script, follows
    this list

* **build step** -- a concrete step in a build graph
  - for example, "run this shell command to build executable binaries
    from the source code", "run this program to create a tar archive
    of the binaries"

* **controller** -- system that keeps track of projects, build runs,
  and their current state, and what needs to happen next
  - once a build run is triggered, the controller makes sure every
    step gets executed, and handles steps failing, or taking too long
  - the controller tells workers what to do
  - the controller checks the result of each step, and picks the next
    step to execute, and the worker to execute it

* **project** -- a thing that needs to be built and tested
  - each project has a build graph

* **trigger** -- what causes a build run to start
  - to trigger a build, _something_ tells the controller that a build
    is needed; the controller does not trigger a build itself
  - triggering can be done by a change in a git repository (for
    example, by the git server), or otherwise (e.g., a cron job to
    trigger a nightly build)

* **worker** -- executes build steps
  - there can be many workers, and many workers may be needed for a
    complete build
  - consecutive steps for the same build run may or may not be
    executed by the same worker, at the discretion of the controller;
    if necessary, state is communicated between workers via the
    artifact store
  - conceptually, a worker executes one build step at a time; to make
    better use of hardware resources, run multiple workers
    concurrently
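
To make these concepts concrete, here is a minimal sketch of the simple
sequential build graph from above, expressed as a single shell script in
the way the current single-script model (see the Architecture section)
would run it. The commands (`make`, `make check`, `dpkg-buildpackage`)
and the use of the first argument as the artifact archive path are
illustrative assumptions, not Ambient's defined interface.

~~~sh
#!/bin/sh
# Hypothetical script for the simple sequential build graph.
# All commands and paths here are placeholders.
set -eu

make                      # build program
make check                # run its tests
dpkg-buildpackage -us -uc # build Debian package (written to ..)

# Export the packages as artifacts; publishing them is a separate step
# that happens outside the build VM.
tar -cf "$1" ../*.deb
~~~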

# Motivation

## Problems we see

These are not in any kind of order.

* Debugging: when (not if) there is a build failure, it can be tedious
  and frustrating to figure out what the cause is and to fix it. Often
  the failure is difficult to reproduce locally, or to inspect in any
  way other than via the build log.

* Capacity: individuals and small organizations often don't have much
  capacity to spare for CI purposes, which hampers and slows down
  development and collaboration.

* Generality: many CI systems run jobs in containers, which has a low
  overhead, but limits what a job can do. For example, containers
  don't allow building on a different operating system from the host
  or on a different computer architecture.

* Security: typically a CI system doesn't put many limits on what the
  job can do, which means that building and testing software can be a
  security risk.

## Needs and wants we have

These are not in any kind of order.

* We want to build software for different operating systems and
  computer architectures, and versions thereof, as one project, with
  the minimum of fuss. One project should be able to build binaries
  and installation packages for any number of targets as part of one
  build.

* It must be possible to construct and update the build environments
  within the CI system itself. For example, by building the virtual
  machine base image for build workers.

* We want builds to be quick. The CI system should add only a little
  overhead to a build. When it's possible to break a build into
  smaller, independent parts, they should be run concurrently, as much
  as hardware capacity allows.

* We want it to be easy to provide build workers, without having to
  worry about the security of the worker host, or the security of the
  build artifacts.

* If a build is going to fail for a reason that can be predicted
  before it even starts, the job should not start. For example, if a
  build step runs a shell command, the syntax should be checked before
  the job starts. Obviously this is not possible in every case, but in
  the common case it is.

* Build failures should be easy to debug. Exactly what this means is
  unclear at the time of writing, but it should be a goal for all
  design and development work.

* It's easy to host both client and server components.
 
* It's possible, straightforward, and safe for workers to require
  payment to run a build step. This needs to be done in a way that
  makes it unlikely for anyone to be scammed.

* It's integrated into major git hosting platforms (GitHub, GitLab,
  etc.), but is not tied to any git platform, or to git at all.

* Build logs can be post-processed by other programs.


# Visions of CI

## Megalomaniac vision: CI is everywhere, all the time

Ambient links together build workers and development projects to
provide a CI system that just is there, all the time, everywhere.
Anyone can provide a worker that anyone can use. The system constrains
build jobs so that they can only do safe things, and can only use an
acceptable amount of resources, and guarantees that the output of the
build can be trusted.

(This may be impossible to achieve, but we can dream. If you don't aim
for the stars you are at risk of shooting yourself in the foot.)

## Plausible vision: CI is easy to provide and use

The various components of Ambient are easy to set up, and to keep
running, and to use. Within an organization it's easy to access. It's
so easy to provide a worker on one's own machine, without worry, that
everyone in the organization can be expected to do so. Builds can
easily be reproduced locally.

## Realistic vision

Ambient provides a command line tool to run a job in a safe, secure
manner that is easily reproduced by different people, to allow
collaborating on a software development project in a controlled way.

# Threat modeling

This model concerns itself with running a build locally. Some
terminology:

* project -- what is being built
* host -- the computer where Ambient is run

Many software projects require running code from the project itself in
order to be built, and certainly when its automated tests are run.
not be directly part of the project, but might come from a dependency
specified by the project. This code can do anything. It might be
malicious and attack the build host. It probably doesn't, but Ambient
must be able to safely and securely build and test projects that
aren't fully trusted and trustworthy.

The attacks we are concerned with are:

* reading, modifying, or storing data on the host, in unexpected ways
* using too much CPU on the host
* using too much memory on the host
* using too much disk space on the host
* accessing the network from the host, in unexpected ways

## Prevention

We build and test in a local virtual machine.

The VM has no network access at all. We provide the VM with the project
source code via a read-only virtual disk. We provide the VM with
another virtual disk where it can store any artifacts that should
persist. Neither virtual disk contains a file system; each holds a tar
archive.

We provide the VM with a pre-determined amount of virtual disk. The
project won't be able to use more.
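
As a concrete illustration, here is a hedged sketch of how the host side
might prepare the source and artifact devices before starting the VM.
The file names and the fixed size are examples, not Ambient defaults.

~~~sh
# Sketch: prepare raw images for the source and artifact devices.
# Neither image has a file system; file names and sizes are examples.

# Source device: a tar archive of the project tree, attached read-only.
tar -C /path/to/project -cf src.img .

# Artifact device: a fixed-size, zero-filled image. The build writes a
# tar archive into it and cannot use more space than this.
truncate -s 1G artifacts.img

# After the build, the host extracts whatever archive the build wrote:
#   tar -xf artifacts.img -C output/
~~~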

We provide the VM with an operating system image with all the
dependencies the project needs pre-installed. Anything that needs to
be downloaded from online repositories is specified by URL and
cryptographic checksum, and downloaded before the VM starts, and
provided to the build via a virtual disk to the VM.
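
A minimal sketch of the host-side download-and-verify step this implies
might look as follows; the URL, checksum, and exact mechanism are
placeholders, not part of Ambient.

~~~sh
# Sketch: fetch a declared dependency and verify its checksum on the
# host before the VM starts. URL and checksum are placeholders.
url="https://example.com/dep-1.0.tar.gz"
sha256="0000000000000000000000000000000000000000000000000000000000000000"

curl -L -o dep-1.0.tar.gz "$url"
echo "$sha256  dep-1.0.tar.gz" | sha256sum -c -

# The verified file is then packed into the tar archive that becomes
# the dependencies device.
~~~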

We interact with the VM via a serial console only.

We run the VM with a pre-determined amount of disk space, number of
CPUs, and amount of memory. We fail the build if it exceeds a
pre-determined time limit. We fail the build if the amount of output
via the serial console exceeds a pre-determined limit.

# Architecture

At a very abstract level, the Ambient architecture is as follows:

* Ambient creates a virtual machine with four block devices (using
  `virtio_blk`) in addition to the system disk (`/dev/vda` on Linux):
  - `/dev/vdb`: the read-only source device: a tar archive of the
    project's source tree
  - `/dev/vdc`: the read/write artifact device: for the project to
    write a tar archive of any build artifacts it wants to export
    - this would be write-only if that were possible
    - when the build starts, this contains only zeroes
    - after the build a tar archive is extracted from this
  - `/dev/vdd`: the read-only dependencies device: a tar archive of
    additional dependencies in a form that the project can use
  - `/dev/vde`: the read/write cache device: a tar archive of any
    files the project wants to persist across runs; for example, for a
    Rust project, this would contain the cargo target directory
    contents
    - when a build starts, this can be empty; the build must deal with
      an empty cache
* The VM additionally has a serial port where it will write the
  build log. On Linux this is `/dev/ttyS0`.
* The VM automatically, on boot, creates
  `/workspace/{src,cache,deps}`, and extracts the source, cache, and
  dependencies tar archives to those directories.
* The VM then changes current working directory to `/workspace/src`
  and runs `./.ambient-script` (if the script isn't executable, the VM
  first makes it so). The script's stdout and stderr are redirected to
  the serial port.

The `ambient-build.service` and `ambient-run-script` files in the
Ambient source tree implement this for Linux with systemd, and have
been tested with Debian.
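
To make the device and resource setup concrete, here is a hedged sketch
of the kind of VM invocation the above implies, using QEMU. Ambient may
not use QEMU or these exact options; the image names, resource limits,
and the way limits are enforced are all assumptions.

~~~sh
# Sketch only: one way to start a build VM with the device layout above.
# Everything here (QEMU, image names, limits) is an assumption, not
# Ambient's actual implementation. A real implementation would fail the
# build, rather than merely truncate the log, when a limit is hit.
timeout 1h \
    qemu-system-x86_64 \
        -m 2G -smp 2 \
        -nic none \
        -drive file=system.img,if=virtio,format=raw \
        -drive file=src.img,if=virtio,format=raw,readonly=on \
        -drive file=artifacts.img,if=virtio,format=raw \
        -drive file=deps.img,if=virtio,format=raw,readonly=on \
        -drive file=cache.img,if=virtio,format=raw \
        -display none \
        -serial stdio \
    | head -c 10M > build.log
~~~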

# Acceptance criteria

[Subplot]: https://subplot.tech

These acceptance criteria are written for the [Subplot][] tool to
process. They are verified using scenarios expressed in a
given/when/then language. For details, please see the Subplot
documentation.

## `ambient-run-script`

This section concerns itself with `ambient-run-script`, which is part
of the VM in which the build is run.

### Accepts the various devices

_Requirement: `ambient-run-script` accepts the various input and output
devices correctly._

We verify this by running the script with an option, which exists only
for this purpose, that dumps its configuration as text.

~~~scenario
given an installed ambient-run-script
given file tars.txt
when I run ./ambient-run-script --dump-config=dump.txt -t /dev/ttyS0 -s input.tar -a output.tar -c cache.tar -d deps.tar
then files dump.txt and tars.txt match
~~~

~~~{#tars.txt .file .text}
ambient-run-script:
- dump: dump.txt
- tty: /dev/ttyS0
- src: input.tar
- artifact: output.tar
- cache: cache.tar
- dependencies: deps.tar
- root: /
- dry_run: None
~~~


### Lists steps in happy path

This scenario verifies two requirements, for the sake of simplicity of
test implementation.

* _Requirement: `ambient-run-script` must perform the same steps every
  time, unless something goes wrong._

  We verify this by having `ambient-run-script` list the steps it would
  do, without actually doing them, using the `--dry-run` option.

~~~scenario
given an installed ambient-run-script
given file expected-steps.txt
when I run ./ambient-run-script --dry-run=steps.txt -t /dev/ttyS0 -s input.tar -a output.tar -c cache.tar -d deps.tar
then files steps.txt and expected-steps.txt match
~~~

~~~{#expected-steps.txt .file .text}
create /workspace
extract input.tar to /workspace/src
extract cache.tar to /workspace/cache
extract deps.tar to /workspace/deps
build in /workspace/src
save /workspace/cache to cache.tar
~~~

### Performs expected steps in happy path

_Requirement: `ambient-run-script` must prepare to run a build in a VM._

`ambient-run-script` runs inside the VM, so we verify this requirement
by having it run a build that we prepare so it's safe to run in our
test environment, without a VM. We make use of a special option so
that paths used by the program are relative to the current working
directory, instead of absolute. We also verify that the output device
gets output written, and that the cache device gets updated.

This is a bit of a long scenario, so it's divided into chunks. First
we set things up.

~~~scenario
given an installed ambient-run-script

given file project/.ambient-script from simple-ambient-script
given tar archive input.tar with contents of project

given file cached/data from cache.data
given tar archive cache.tar with contents of cached

given file deps/deps.data from deps.data
given tar archive deps.tar with contents of deps
~~~

We now have the source files, cached data, and dependencies and we can
run `ambient-run-script` with them.

~~~scenario
when I run ./ambient-run-script --root=. -t log -s input.tar -a output.tar -c cache.tar -d deps.tar
~~~

The workspace must now exist and have the expected contents.

~~~scenario
then file workspace/src/.ambient-script exists
then file workspace/cache/data exists
then file workspace/cache/cached contains "hello, cache"
then file workspace/deps/deps.data exists
then file log contains "hello, ambient script"
~~~

The artifact tar archive must contain the expected contents.

~~~scenario
when I create directory untar-output
when I run tar -C untar-output -xvf output.tar
then file untar-output/greeting contains "hello, there"
~~~

The cache tar archive must contain the expected contents.

~~~scenario
when I create directory untar-cache
when I run tar -C untar-cache -xvf cache.tar
then file untar-cache/cached contains "hello, cache"
then file untar-cache/data exists
~~~

That's all, folks.

~~~{#simple-ambient-script .file .sh}
#!/bin/bash

set -xeuo pipefail

# Write something to the build log (via stdout).
echo hello, ambient script

# Update a file on the cache device so the scenario can check that the
# cache gets updated.
echo hello, cache > ../cache/cached

# Produce an artifact and write it, as a tar archive, to the path in
# the script's first argument.
echo hello, there > greeting
tar -cf "$1" greeting
~~~

~~~{#cache.data .file}
This is cached data.
~~~

~~~{#deps.data .file}
This is dependency data.
~~~