author	Lars Wirzenius <lwirzenius@wikimedia.org>	2019-05-06 16:50:02 +0300
committer	Lars Wirzenius <lwirzenius@wikimedia.org>	2019-05-06 16:50:02 +0300
commit	319db51ad1d0feee86753abf2a520074e81d73bc (patch)
tree	d9bba97ff7c547093a12a19b128d778e3595ac82
parent	8a9fb40537e53fed90b6beb9e97cf53b08a87e78 (diff)
download	wmf-ci-arch-319db51ad1d0feee86753abf2a520074e81d73bc.tar.gz
Add: generated files so they are less easily lost
-rw-r--r--	ci-arch.html	271
-rw-r--r--	ci-arch.pdf	bin 0 -> 181515 bytes
2 files changed, 271 insertions, 0 deletions
diff --git a/ci-arch.html b/ci-arch.html
new file mode 100644
index 0000000..40d5844
--- /dev/null
+++ b/ci-arch.html
@@ -0,0 +1,271 @@
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
+<head>
+ <meta charset="utf-8" />
+ <meta name="generator" content="pandoc" />
+ <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
+ <meta name="author" content="Lars Wirzenius for WMF / Release Enginering" />
+ <title>Thoughts on the architecture of a future WMF CI system</title>
+ <style type="text/css">
+ code{white-space: pre-wrap;}
+ span.smallcaps{font-variant: small-caps;}
+ span.underline{text-decoration: underline;}
+ div.column{display: inline-block; vertical-align: top; width: 50%;}
+ </style>
+</head>
+<body>
+<header>
+<h1 class="title">Thoughts on the architecture of a future WMF CI system</h1>
+<p class="author">Lars Wirzenius for WMF / Release Enginering</p>
+<p class="date">work in progress, first draft being written</p>
+</header>
+<nav id="TOC">
+<ul>
+<li><a href="#introduction">Introduction</a></li>
+<li><a href="#requirements">Requirements</a><ul>
+<li><a href="#very-hard-requirements">Very hard requirements</a></li>
+<li><a href="#hard-requirements">Hard requirements</a></li>
+<li><a href="#softer-requirements">Softer requirements</a></li>
+<li><a href="#would-be-nice">Would be nice</a></li>
+</ul></li>
+<li><a href="#important-use-cases">Important use cases</a><ul>
+<li><a href="#normal-change-to-an-individual-component">Normal change to an individual component</a></li>
+<li><a href="#interdependent-changes">Interdependent changes</a></li>
+<li><a href="#security-embargoed-change">Security embargoed change</a></li>
+</ul></li>
+<li><a href="#design-of-specific-aspects">Design of specific aspects</a><ul>
+<li><a href="#log-storage">Log storage</a></li>
+<li><a href="#artifact-storage">Artifact storage</a></li>
+<li><a href="#credentials-management-and-access-control">Credentials management and access control</a></li>
+<li><a href="#interdependent-changes-to-multiple-components">Interdependent changes to multiple components</a></li>
+</ul></li>
+<li><a href="#the-default-pipeline">The (default?) pipeline</a></li>
+<li><a href="#architecture-ci-in-an-ecosystem">Architecture: CI in an ecosystem</a></li>
+<li><a href="#architecture-internals">Architecture: internals</a></li>
+<li><a href="#acceptance-criteria">Acceptance criteria</a></li>
+</ul>
+</nav>
+<h1 id="introduction">Introduction</h1>
+<ul>
+<li><p>The CI WG plans to replace the current WMF CI system with one of Argo, GitLab CI, or Zuul v3.</p></li>
+<li><p>We aim to do “continuous deployment”, not only “continuous integration” or “continuous delivery”. The goal is to deploy changes to production as often and as quickly as possible, without compromising on the safety and security of the production environment.</p></li>
+<li><p>This document goes into more detail of how the new CI system should work, without (yet) discussing which replacement is chosen. A meta-level architecture if you wish.</p></li>
+<li><p>It is assumed as of the writing of this document that future CI will build on and deploy to containers orchestrated by Kubernetes.</p></li>
+</ul>
+<h1 id="requirements">Requirements</h1>
+<ul>
+<li><p>This chapter lists the requirements we have for the CI system, which we design the system to fulfil.</p></li>
+<li><p>Each requirement is given a semi-mnemonic unique identifier, so it can be referred to easily.</p></li>
+<li><p>The goal is to make requirements be as clear and atomic as possible, so that the implementation can be more easily evaluated against the requirement: it’s better to split a big, complicated requirement into smaller ones so they can be considered separately. The original requirement can be a parent to all its parts.</p></li>
+<li><p>FIXME: We may want to have a way to track which requirements are being fulfilled, or tested by automated acceptance tests. Need to add something for this, maybe a spreadsheet.</p></li>
+<li><p>These requirements were originally written up at <a href="https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Requirements" class="uri">https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Requirements</a> and have been changed a little compared to that (as of the 21 March 2019 version).</p></li>
+</ul>
+<h2 id="very-hard-requirements">Very hard requirements</h2>
+<ul>
+<li><p>These are non-negotiable requirements that must all be fulfilled by our future CI system.</p></li>
+<li><p><strong>(SELFHOSTABLE)</strong> Must be hostable by the Foundation. It’s not acceptable for WMF to rely on an outside service for this.</p>
+<ul>
+<li><p><strong>(FREESOFTWARE)</strong> Must be free software / open source. “Open core” like GitLab is good enough, as long as we only need the parts that provide software freedom.</p>
+<p>This is partly due to the <strong>SELFHOSTABLE</strong> requirement, but also because a WMF value is to prefer open source.</p></li>
+</ul></li>
+<li><p><strong>(GITSUPPORT)</strong> Must support git. We’re not switching version control systems for CI.</p></li>
+<li><p><strong>(UNDERSTANDABLE)</strong> Must be understandable without too much effort to our developers so that they can use CI/CD productively.</p></li>
+<li><p><strong>(SELFSERVE)</strong> Must support self-serve CI, meaning we don’t block people if they want CI for a new repo. Due to <strong>PROTECTPRODUCTION</strong>, there will probably be some human approval requirement for new projects, but as much as possible, people should be allowed to do their work without having to ask permission.</p>
+<ul>
+<li><strong>(SELFSERVE2)</strong> Should allow the developers to define or declare at least parts of the pipeline jobs in the repository: what commands to run for building, testing, etc.</li>
+</ul></li>
+</ul>
+<h2 id="hard-requirements">Hard requirements</h2>
+<ul>
+<li><p>These are not absolute requirements, and can be negotiated, but only to a minor degree.</p></li>
+<li><p><strong>(FAST)</strong> Must be fast enough that it isn’t perceived as a bottleneck by developers. We will need a metric for this.</p>
+<ul>
+<li><strong>(SHORTCYCLETIME)</strong> Must enable us to have a short cycle time (from idea to running in production). CI is not the only thing that affects this, but it is an important factor. We probably need a metric for this.</li>
+</ul></li>
+<li><p><strong>(TRANSPARENT)</strong> Must make its status and what-is-going-on visible so that its operation can be monitored and so that our developers can check the status of their builds themselves. The overall status of CI should also be visible, so that developers can see, for example, if their build is blocked waiting on others.</p></li>
+<li><p><strong>(FEEDBACK)</strong> Must provide feedback to the developers as early as possible for the various stages of a build, especially the early stages (“can get source from git”, “can build”, “can run unit tests”, etc.).</p>
+<p>The goal is to give feedback as soon as possible, especially in the case of the build failing.</p></li>
+<li><p><strong>(FEEDBACK2)</strong> Must support providing feedback via Gerrit, IRC, and Phabricator, at the very least. These are our current main feedback channels.</p></li>
+<li><p><strong>(SECURE)</strong> Must be secure enough that we can open it to community developers to use without too much supervision.</p></li>
+<li><p><strong>(MAINTAINED)</strong> Must be maintained and supported upstream. The CI system should not require substantial development from the Foundation. Some customization is expected to be necessary.</p></li>
+<li><p><strong>(MANYREPOS)</strong> Must be able to handle the number of repositories, projects, builds, and deployments that we have, and will have in the foreseeable future.</p></li>
+<li><p><strong>(METRICS)</strong> Must enable us to instrument it to get metrics for CI use and effectiveness as we need. Things like cycle times, build times, build failures, etc.</p></li>
+<li><p><strong>(GERRIT)</strong> Must work with Gerrit as well as other self-hostable code-review systems (e.g., GitLab), if we decide to move to that later. This means that code review happens on Gerrit after the build and automated tests pass, and a positive code review triggers deployment to production.</p></li>
+<li><p><strong>(NOREBUILDING)</strong> Must promote (copy) Docker images and other build artifacts from “testing” to “staging” to “production”, rather than rebuilding them, since rebuilding takes time and can fail. Once a binary, Docker image, or other build artifact has been built, exactly that artifact should be tested, and eventually deployed to production (see the sketch at the end of this section).</p></li>
+<li><p><strong>(LOCALTESTS)</strong> Must allow developers to replicate locally the tests that CI runs. This is necessary to allow lower friction in development, as well as to aid debugging. For example, if CI builds and tests using Docker containers, a developer should be able to download the same images and run the tests locally.</p></li>
+<li><p><strong>(AUTOMATEDEPLOYMENT)</strong> Must allow deployment to be fully automated.</p>
+<ul>
+<li><strong>(AUTOMATEDSELFDEPLOYMENT)</strong> Must be automatically deployable by us or SRE, onto a fresh server.</li>
+</ul></li>
+<li><p><strong>(HSCALABLE)</strong> Must be horizontally scalable: we need to be able to add more hardware easily to get more capacity. This is particularly important for build workers, which are the most likely bottleneck. The same probably applies to environments used for testing.</p></li>
+<li><p><strong>(PROGLANGS)</strong> Must be able to support all programming languages we currently support or are likely to support in the future. These include, at least, shell, Python, Ruby, Java, PHP, and Go. Some languages may be needed in several versions.</p></li>
+<li><p><strong>(OUTPUTLINKS)</strong> Must support HTTP linking to build results for easier reference and discussion. This way a build log, or a build artifact, can be referenced using a simple HTTP (or HTTPS) link.</p></li>
+<li><p><strong>(ARTIFACTARCHIVE)</strong> Should allow archiving build logs, executables, Docker images, and other build artifacts for a long period.</p>
+<ul>
+<li><strong>(RETENTION)</strong> The retention period should be configurable based on artifact type, and whether the build ended up being deployed to production.</li>
+</ul></li>
+<li><p><strong>(CONFIGVC)</strong> Must keep configuration in version control. This is needed so that we can track changes over time.</p></li>
+<li><p><strong>(GATING)</strong> Must support gating / pre-merge testing. FIXME: This needs to be explained.</p></li>
+<li><p><strong>(PERIODICBUILDS)</strong> Must support periodic / scheduled testing. This is needed so that we can test that changes to the environment haven’t broken anything. An example would be changes to Debian, upon which we base our container images.</p></li>
+<li><p><strong>(POSTMERGETESTS)</strong> Must support post-merge testing. FIXME: This needs to be explained.</p></li>
+<li><p><strong>(CIMERGES)</strong> Must support tooling to do the merging, instead of developers. We don’t want developers merging by hand and pushing the merges. CI should test changes and merge only if tests pass, so that the branches for main lines of development are always releasable.</p></li>
+<li><p><strong>(TESTVC)</strong> Must support storing tests in version control. This is probably best achieved by having tests be stored in the same git repository where the code is.</p></li>
+<li><p><strong>(BUILDDEPS)</strong> Must have some way to declare dependent repositories / software needed for testing. FIXME: This needs to be explained.</p></li>
+<li><p><strong>(TESTSERVICES)</strong> Must support services for tests — for example, some PHPUnit tests require MySQL. These are most important for integration tests. Proper unit tests do not depend on any external services. However, integration tests may well need MediaWiki, some specific extensions, and backing services, such as databases, “oid” services, and possibly more. CI needs to be able to provide such environments for testing.</p></li>
+<li><p><strong>(OTHERGITORTICKETING)</strong> Must allow changing git repository, code review, and ticketing systems from Gerrit and Phabricator. We are not currently looking at switching away from Gerrit and Phabricator, but the future CI solution should not lock us into specific code review or ticketing solutions.</p></li>
+<li><p><strong>(PROTECTPRODUCTION)</strong> Must protect production by detecting problems before they’re deployed, and must in general support a sensible CI/CD pipeline. This is necessary for the safety and security of our production systems, for a higher speed of development, and for higher productivity. The protection brings developer confidence, which tends to bring speed and productivity.</p>
+<ul>
+<li><strong>(ENFORCETESTS)</strong> Must allow the Release Engineering team to enforce tests on top of what a developer using self-serve CI specifies, to allow us to set minimal technical standards.</li>
+</ul></li>
+<li><p><strong>(CACHEDEPS)</strong> Must support dependency caching – we have castor, but maybe we could do better, or maybe some CI systems have this figured out already? This means, for example, caching npm and PyPI packages so that every build doesn’t need to download them directly from the centralised package repositories. This is needed for speed.</p></li>
+</ul>
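+<p>The following is a minimal, hypothetical sketch of the <strong>NOREBUILDING</strong> idea: an artifact built once for “testing” is promoted onward by digest, never rebuilt. The <code>Registry</code> and <code>Artifact</code> names are illustrative, not an existing WMF API.</p>
+<pre><code># Hypothetical sketch: promote the already-built artifact instead of rebuilding it.
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Artifact:
+    name: str           # e.g. "mediawiki-core"
+    digest: str         # content digest of the built image, e.g. "sha256:..."
+    source_commit: str  # git commit the artifact was built from
+
+
+class Registry:
+    """Stand-in for one artifact/image registry ("testing", "staging", "production")."""
+
+    def __init__(self, env: str):
+        self.env = env
+        self.artifacts = {}
+
+    def store(self, artifact: Artifact):
+        self.artifacts[artifact.digest] = artifact
+
+    def get(self, digest: str) -> Artifact:
+        return self.artifacts[digest]
+
+
+def promote(digest: str, source: Registry, target: Registry):
+    """Copy exactly the tested artifact onward; no rebuild happens here."""
+    target.store(source.get(digest))
+</code></pre>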
+<h2 id="softer-requirements">Softer requirements</h2>
+<ul>
+<li><p>These requirements are even more easily negotiated.</p></li>
+<li><p><strong>(HA)</strong> Should be highly available: any component can be restarted without disrupting service.</p></li>
+<li><p><strong>(LIVELOG)</strong> Should have live console output of builds.</p></li>
+<li><p><strong>(MAXBUILDTIME)</strong> Should have build timeouts so that a build may fail if it takes too long. Among other reasons, this is useful to automatically work around builds that get “stuck” indefinitely.</p></li>
+<li><p><strong>(CLEANWORKSPACE)</strong> Should provide a clean workspace for each test run - either a clean VM or container.</p></li>
+<li><p><strong>(RATELIMIT)</strong> Should have rate limiting: one user/project cannot take over most/all resources.</p></li>
+<li><p><strong>(CHECKSIG)</strong> Should support validation and creation of GPG/PGP-signed git commits.</p></li>
+<li><p><strong>(SECRETS)</strong> Should support secure storage of credentials / secrets.</p></li>
+</ul>
+<h2 id="would-be-nice">Would be nice</h2>
+<ul>
+<li><p>These are so soft they aren’t even requirements; they are more like wish list items.</p></li>
+<li><p><strong>(LIMITBOILERPLATE)</strong> Would be nice for test abstractions to limit boiler-plate, i.e., all of our services are tested roughly the same way without having to copy instructions to every repository.</p></li>
+<li><p><strong>(PRIORITIZEJOBS)</strong> Would be nice to prioritize jobs.</p>
+<ul>
+<li><p>Use case: if there is a queue of jobs, there should be some mechanism of jumping that queue for jobs that have a higher priority.</p></li>
+<li><p>We currently have a Gating queue that has a higher priority than the periodic jobs that calculate Code Coverage.</p></li>
+</ul></li>
+<li><p><strong>(ISOLATION)</strong> Would be nice to support isolation / sandboxing.</p>
+<ul>
+<li><p>Jobs should be isolated from one another.</p></li>
+<li><p>Jobs should be able to install apt packages without affecting the dependencies of other jobs.</p></li>
+</ul></li>
+<li><p><strong>(CONTROLAFFINITY)</strong> Would be nice to have configurable job requirements/affinity.</p>
+<ul>
+<li>Be able to schedule a job only on nodes that have at least X available disk space/ram/cpu/whatever OR try to schedule on nodes where a current build of this job isn’t already running.</li>
+</ul></li>
+<li><p><strong>(POSTMERGEBISECT)</strong> Would be nice to be able to run a post-merge git-bisect to find the patch that caused a particular problem with a Selenium test.</p></li>
+<li><p><strong>(DEPLOYWHEREVER)</strong> Would be nice to have a mechanism for deployment to staging, production, pypi, packagist, toollabs. We could do with a way to deploy to any of several possible environments, for various use cases, such as bug reproduction, manual exploratory testing, capacity testing, and production. FIXME: what do pypi and packagist do in the list?</p></li>
+<li><p><strong>(MATRIXBUILDS)</strong> Would be nice to have efficient matrix builds.</p>
+<ul>
+<li>E.g., we currently run PHPUnit tests and browser tests for the Cartesian product of {PHP 7, PHP 7.1, PHP 7.2, HHVM} x {MySQL, SQLite, PostgreSQL} x {Composer, MediaWiki vendor}, but we perform setup/git clone for every one of those tests. Doing that in a space- and time-efficient way would be good (see the sketch after this list).</li>
+</ul></li>
+<li><p><strong>(MOBILE)</strong> Would be nice to support building and testing mobile applications (at minimum for iOS and Android).</p></li>
+<li><p><strong>(EMBARGO)</strong> Would be nice to be able to run CI for secret/security patches. This means CI should be able to build and deploy changes that can’t be made public yet, for security embargo reasons.</p></li>
+</ul>
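+<p>A small sketch of the <strong>MATRIXBUILDS</strong> combinations mentioned above, simply enumerating the Cartesian product; a matrix-aware scheduler could then reuse one checkout and setup for all of these jobs. The axis values mirror the example in the text.</p>
+<pre><code># Sketch only: enumerate the test matrix instead of hand-writing every job.
+from itertools import product
+
+runtimes = ["PHP 7", "PHP 7.1", "PHP 7.2", "HHVM"]
+databases = ["MySQL", "SQLite", "PostgreSQL"]
+dependency_styles = ["Composer", "MediaWiki vendor"]
+
+for runtime, database, deps in product(runtimes, databases, dependency_styles):
+    # In an efficient matrix build each combination becomes one job that
+    # shares the already-prepared workspace rather than cloning again.
+    print(f"test job: {runtime} / {database} / {deps}")
+</code></pre>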
+<h1 id="important-use-cases">Important use cases</h1>
+<p>These are some of the important use cases for the CI system, and how we plan CI to implement them.</p>
+<h2 id="normal-change-to-an-individual-component">Normal change to an individual component</h2>
+<ul>
+<li><p>a developer pushes a change to one program that runs in production</p></li>
+<li><p>the change is independent of other changes and no other component depends on the change</p></li>
+<li><p>e.g., bug fix, not a feature change</p></li>
+<li><p>the governing principle is that with the commit stage and acceptance stage passing, plus a positive code review, the change can be deployed to production in most cases (see the sketch after this list)</p></li>
+<li><p>the developer pushes a change, which triggers the commit and acceptance stages; when those pass, code review requests are sent to reviewers</p></li>
+<li><p>reviewers vote +2, which triggers a deployment to production</p></li>
+<li><p>this is the simplest possible use case for CI</p></li>
+</ul>
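+<p>A minimal sketch (not the chosen tool’s API) of the gating rule described above: only a change whose commit and acceptance stages have passed, and which then receives a positive review, becomes a production deployment.</p>
+<pre><code>from dataclasses import dataclass
+
+
+@dataclass
+class Change:
+    commit_stage_passed: bool = False
+    acceptance_stage_passed: bool = False
+    review_vote: int = 0  # Gerrit-style vote; +2 means approved
+
+
+def ready_for_review(change: Change) -> bool:
+    """Human review is only requested once both early stages pass."""
+    return change.commit_stage_passed and change.acceptance_stage_passed
+
+
+def deploy_to_production(change: Change) -> bool:
+    """A positive review on a passing change triggers deployment."""
+    return ready_for_review(change) and change.review_vote >= 2
+</code></pre>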
+<h2 id="interdependent-changes">Interdependent changes</h2>
+<ul>
+<li><p>changes to two or more components that must all be applied at once or not at all, e.g., to mediawiki core and an extension</p></li>
+<li><p>in this scenario the change to MediaWiki core and the change to an extension may depend on each other, so that if either is deployed without the other, the system as a whole breaks; thus, either both changes get deployed, or neither</p></li>
+<li><p>Lars’s opinion: this seems like a bad way of managing development. It seems better to be careful with such changes so that they can be disabled behind a feature flag in the configuration, or by autodetection of the other component, so that if only one component has been changed, it can still be deployed. Only when both components have been changed in production is the feature flag enabled, and the new feature works.</p></li>
+</ul>
+<h2 id="security-embargoed-change">Security embargoed change</h2>
+<ul>
+<li><p>change can’t be public until it’s deployed or manually made public</p></li>
+<li><p>this is typically part of “responsible disclosure”</p></li>
+<li><p>the change will be made public, but CI should be able to use it even before it’s public, so that when it’s time, there’s no need to wait for CI to build/test the change and it can just be merged and deployed</p></li>
+<li><p>this means some builds and build artifacts need to be locked away from the public</p></li>
+</ul>
+<h1 id="design-of-specific-aspects">Design of specific aspects</h1>
+<h2 id="log-storage">Log storage</h2>
+<ul>
+<li><p>We want to capture the build log or “console output” (stdout, stderr) of the build and store it. This is an invaluable tool for developers to understand what happens in a build, and especially why it failed.</p></li>
+<li><p>Ideally, the build log is formatted in a way that’s easy for humans to read.</p></li>
+<li><p>It’d also be nice if the build log can be easily processed programmatically, to extract information from it automatically.</p></li>
+<li><p>We may want to store build logs for extended periods of time so that we can analyze them later. By storing them in a de-duplicating and compressing manner, the way backup software like Borg does, the storage requirements can be kept reasonable (see the sketch after this list).</p></li>
+</ul>
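+<p>A rough sketch of the Borg-style idea in the last bullet: split each build log into chunks, store every chunk once under its content hash, and keep only a small list of hashes per log. Purely illustrative; a real implementation would persist the stores rather than keep them in memory.</p>
+<pre><code>import hashlib
+import zlib
+
+CHUNK_SIZE = 64 * 1024
+
+chunk_store = {}  # content hash -> compressed chunk
+log_index = {}    # build id -> list of chunk hashes
+
+
+def store_log(build_id: str, log_text: str):
+    data = log_text.encode("utf-8")
+    hashes = []
+    for offset in range(0, len(data), CHUNK_SIZE):
+        chunk = data[offset:offset + CHUNK_SIZE]
+        digest = hashlib.sha256(chunk).hexdigest()
+        # Identical chunks (common boilerplate across build logs) are stored once.
+        if digest not in chunk_store:
+            chunk_store[digest] = zlib.compress(chunk)
+        hashes.append(digest)
+    log_index[build_id] = hashes
+
+
+def load_log(build_id: str) -> str:
+    parts = (zlib.decompress(chunk_store[h]) for h in log_index[build_id])
+    return b"".join(parts).decode("utf-8")
+</code></pre>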
+<h2 id="artifact-storage">Artifact storage</h2>
+<ul>
+<li><p>Artifacts are all the files created during the build process that may be needed for automated testing or deployment to production or any other environment: executable binaries, minified JavaScript, automatically generated documentation from source code (e.g., Javadoc).</p></li>
+<li><p>We basically need to store arbitrary blobs for some time. We need to retrieve the blobs for deployment, and possibly other reasons.</p></li>
+<li><p>We may want to store artifacts that get deployed to production for a longer time than other artifacts so that we can keep a history of what was in production at any recent-ish point in time (see the sketch after this list).</p></li>
+<li><p>We will want to be able to trace each artifact back to the git repository and commit it came from.</p></li>
+<li><p>We can de-duplicate artifacts (a la backup programs) to save on space. Even so, we will want to automatically expire artifacts on some flexible schedule to keep storage needs in control.</p></li>
+<li><p>We need to decide when we can make these artifacts publicly accessible.</p></li>
+<li><p>Artifact storage must be secure, as everything that gets deployed to production goes via it.</p></li>
+<li><p>There are existing artifact storage systems we could use rather than building our own.</p></li>
+</ul>
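+<p>A sketch of the retention and traceability points above: each stored artifact records the repository and commit it was built from, and artifacts that reached production are kept longer. The retention periods are made-up placeholders, not a decided policy.</p>
+<pre><code>from dataclasses import dataclass
+from datetime import datetime, timedelta
+
+
+@dataclass
+class StoredArtifact:
+    blob_digest: str     # points into de-duplicated blob storage
+    source_repo: str     # git repository the artifact was built from
+    source_commit: str   # git commit the artifact was built from
+    created: datetime
+    deployed_to_production: bool = False
+
+
+def is_expired(artifact: StoredArtifact, now: datetime) -> bool:
+    # Placeholder policy: keep production artifacts a year, others a month.
+    retention = timedelta(days=365) if artifact.deployed_to_production else timedelta(days=30)
+    return now - artifact.created > retention
+</code></pre>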
+<h2 id="credentials-management-and-access-control">Credentials management and access control</h2>
+<ul>
+<li><p>Credentials and other secrets are used to allow access to servers, services, and files. They are often highly security sensitive data. The CI system needs to protect them, but allow controlled use of them.</p></li>
+<li><p>Example: a CI job needs to deploy a Docker image with a tested and reviewed change as a container orchestrated by Kubernetes. For this, it needs to authenticate itself to the Kubernetes API. This is typically done by a username/password combination. How will the future CI system handle this?</p></li>
+<li><p>Example: for tests, and in production, a MediaWiki container needs access to a MariaDB database, and MW needs to authenticate itself to the database. MW gets the necessary credentials for this from its configuration, which CI will install during deployment. The configuration will be specific to what the container is being used for: if it’s for testing a change, the configuration only allows access to a test database, but for production it provides access to the production database.</p></li>
+<li><p>FIXME: This is unclear as yet, the text below is some incoherent preliminary rambling by Lars which needs review and fixing.</p></li>
+<li><p>Builds are done in isolated containers. These containers have no credentials. Build artifacts are extracted from the containers and stored in an artifact storage system by the CI system, and this is done in a controlled environment, where only vetted code is run, not code from the repository being tested.</p></li>
+<li><p>Deployments happen in controlled environments, with access to the credentials needed for deployment. The deployment retrieves artifacts from the artifact storage system. The deployments are to containers, and the deployed containers don’t have any credentials, unless CI has been configured to install them, in which case CI installs the credentials for the intended use of the container (see the sketch after this list).</p></li>
+<li><p>Tests run against software deployed to containers, and those containers only have access to the backing services needed for the test.</p></li>
+<li><p>The CI system needs a way to store the credentials that can only be accessed by CI itself, when it’s deploying a container (Kubernetes API access) or configuring the container (installing credentials for the intended use of container).</p>
+<p>This might be, for example, a set of files deployed to the CI host where container deployment or configuration runs, with access control provided by Unix permissions. Not sure if this is sufficiently secure.</p></li>
+</ul>
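+<p>A preliminary sketch of the idea above: credentials live only in a store that the CI deployment step can read, and a container being deployed is handed only the credentials intended for its environment. All names and the in-memory store are hypothetical.</p>
+<pre><code># Hypothetical: build containers never see this store; only the deployment step does.
+secret_store = {
+    ("mediawiki", "test"): {"db_user": "mw_test", "db_password": "..."},
+    ("mediawiki", "production"): {"db_user": "mw_prod", "db_password": "..."},
+}
+
+
+def credentials_for(component: str, environment: str) -> dict:
+    """Return only the credentials intended for this component and environment."""
+    return secret_store[(component, environment)]
+
+
+def start_container(image_digest: str, config: dict):
+    """Placeholder for the actual Kubernetes deployment call."""
+    print(f"starting {image_digest} with config keys {sorted(config)}")
+
+
+def deploy(component: str, environment: str, image_digest: str):
+    # A test deployment therefore only ever receives test-database credentials.
+    start_container(image_digest, credentials_for(component, environment))
+</code></pre>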
+<h2 id="interdependent-changes-to-multiple-components">Interdependent changes to multiple components</h2>
+<p>FIXME</p>
+<h1 id="the-default-pipeline">The (default?) pipeline</h1>
+<ul>
+<li><p>FIXME: this could really do with a graph</p></li>
+<li><p>CI will provide a default pipeline for all projects (sketched after this list)</p>
+<ul>
+<li><p>divided into several stages</p></li>
+<li><p>mandatory stages: commit, acceptance; other stages may be added to individual projects as needed</p></li>
+<li><p>the goal is that if the commit and acceptance stages pass, the project has a candidate that can be deployed to production, unless the project is such that it needs (say) manual testing or some other human decision before deployment to production</p></li>
+<li><p>if the commit or acceptance stage fails, there is no production candidate</p></li>
+</ul></li>
+<li><p>commit stage</p>
+<ul>
+<li><p>builds all the artifacts that will be used by later stages</p></li>
+<li><p>runs unit tests</p></li>
+<li><p>other tests, possibly integration tests</p></li>
+<li><p>code health checks</p></li>
+<li><p>the commit stage is expected to be fast, aiming at less than five minutes, so that we can expect developers to wait for it to pass successfully</p></li>
+<li><p>the commands to build (compile) or run automated tests are stored in the repository, either explicitly, or by indicating the type of build needed; for example, the repository may specify “make” as the command to run, or it may specify that it’s a Go project, and CI would know how to build a Go project; in the latter case we can change the commands to build a Go project by changing CI only, without having to change each git repository with a Go program</p></li>
+<li><p>only the declarative style is possible for building Docker images, as we want control over how that is done</p></li>
+<li><p>CI may enforce specific additional commands to run, to build or test further things; this can be used by RelEng to enforce specific code health checks, for example, or to enable (or disable) debug symbols in all builds</p></li>
+<li><p>all tests run in an isolated build tree, and may not use anything outside the tree, including databases or other backing services</p></li>
+<li><p>any build dependencies must be specified explicitly; for example, which version of Go should be installed in the build environment, or if a project build-depends on another project, which artifacts it needs installed from the other project; explicit is more work, but results in fewer problems due to broken heuristics</p></li>
+</ul></li>
+<li><p>acceptance tests</p>
+<ul>
+<li><p>deploys artifacts from commit stage to containers in special test environments, runs tests against deployed artifacts</p></li>
+<li><p>possibly run slow tests from the build tree as well, if they don’t fit into the commit stage’s time budget</p></li>
+<li><p>this stage can be slower than the commit stage, but should still pass in, say, an hour, instead of taking days</p></li>
+</ul></li>
+<li><p>capacity tests</p>
+<ul>
+<li>these are tests that benchmark the system as deployed into an environment that’s sufficiently production-like also as far as hardware resources are concerned</li>
+</ul></li>
+<li><p>manual (exploratory) tests</p>
+<ul>
+<li><p>testers will have dedicated environments to which they can trigger deployment of specific builds, and in which they can, for example, test that specific bugs are fixed</p></li>
+<li><p>this can also be used to demonstrate upcoming features that are not yet enabled in production</p></li>
+</ul></li>
+</ul>
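+<p>A sketch of what a declarative description of the default pipeline, plus a repository’s own declaration, might look like. The structure and field names are invented for illustration; the real format depends on the tool chosen.</p>
+<pre><code># Invented structure: CI owns the pipeline, repositories only declare how to build.
+default_pipeline = {
+    "commit": {
+        "mandatory": True,
+        "time_budget_minutes": 5,
+        "steps": ["build artifacts", "run unit tests", "run code health checks"],
+    },
+    "acceptance": {
+        "mandatory": True,
+        "time_budget_minutes": 60,
+        "steps": ["deploy artifacts to test containers", "run acceptance tests"],
+    },
+    "capacity": {
+        "mandatory": False,
+        "steps": ["benchmark in a production-like environment"],
+    },
+    "manual": {
+        "mandatory": False,
+        "steps": ["deploy to a tester-controlled environment"],
+    },
+}
+
+# A repository might declare either an explicit command or a project type,
+# e.g. {"build_command": "make"} or {"project_type": "go"}; CI supplies the rest.
+</code></pre>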
+<h1 id="architecture-ci-in-an-ecosystem">Architecture: CI in an ecosystem</h1>
+<ul>
+<li><p>code review will be done in Gerrit or otherwise outside the CI pipeline</p></li>
+<li><p>the commit and acceptance stages are triggered as soon as a developer pushes changes to be reviewed; human reviews won’t be requested until the two stages pass, as there’s no point in spending human attention on things that are not going to be candidates for deployment to production</p></li>
+<li><p>other stages may run in parallel with code review, but if they fail they may nullify candidacy?</p></li>
+<li><p>deployments go to K8s, everything will run in containers</p></li>
+</ul>
+<h1 id="architecture-internals">Architecture: internals</h1>
+<p>FIXME</p>
+<h1 id="acceptance-criteria">Acceptance criteria</h1>
+<ul>
+<li>This chapter sketches some automated acceptance tests using a Gherkin/Cucumber-like pseudo code language.</li>
+</ul>
+</body>
+</html>
diff --git a/ci-arch.pdf b/ci-arch.pdf
new file mode 100644
index 0000000..4a09f8e
--- /dev/null
+++ b/ci-arch.pdf
Binary files differ