---
title: Ick2, a CI system (architecture)
author: Lars Wirzenius
date: work-in-progress for ALPHA-1
...

Introduction
=============================================================================

Ick2 is a continuous integration (CI) system. It is being developed by
Lars Wirzenius and other people, for their own needs. It is very early
days. You don't want to use Ick2 yet, but if you have opinions on what
a CI system should be like, feedback is welcome.

This document describes the architecture of Ick2. Specifically, the
architecture for the upcoming ALPHA-1 release, and no further than
that. It is a capital mistake to design software before you have all
the requirements: it biases the judgment. You can rarely have all the
requirements a priori; you have to iterate to gather them. Designing
beyond one iteration is a mistake.

Background and justification
-----------------------------------------------------------------------------

This section should be written some day. In short, Lars got tired of
Jenkins, and all competitors seem insufficient or somehow unpleasant.
Then Daniel suggested a name, and Lars is incapable of not starting a
project if given a name for it.

Overview
-----------------------------------------------------------------------------

A continuous integration (CI) or continuous deployment (CD) system is,
at its simplest core, an automated system that reacts to changes in a
program's source code by doing a build of the program, running its
automated tests, and then publishing the results somewhere. A CD
system continues from there by also installing the new version of the
program on all relevant computers. If a build or an automated test
fails, the system notifies the relevant parties.

Ick2 aims to be a CI/CD system. It deals with a small number of
concepts:

* **projects**, which consist of **source code** in a version control
  system (mainly git right now)
* **pipelines**, which are sequences of steps that convert source code
  into something executable, or test the program
* **worker build hosts**, which do all the heavy lifting

The long-term goal for Ick2 is to provide a CI/CD system that can be
used to build and deploy any reasonable software project, including
building packages of any reasonable type. In our wildest dreams it'll
be scalable enough to build a full, large Linux distribution such as
Debian. We'll see.

Example
-----------------------------------------------------------------------------

We will return to this example throughout this document. Imagine a
static website that is built using the ikiwiki software. The source of
the web pages is stored in a git repo, and the generated HTML pages
are published on a web server.

This might be expressed as an Ick2 configuration like this:

    projects:
      website:
        workspace:
        - git: ssh://git@git.example.com/website.git
        pipelines:
        - name: getsource
          steps:
          - shell: git clone ssh://git@git.example.com/website.git src
        - name: ikiwiki
          steps:
          - shell: mkdir html
          - shell: ikiwiki src html
        - name: publish
          steps:
          - shell: rsync -a --delete html/. www-user@www.example.com:/srv/http/.

Note that pipelines are defined in the configuration. Eventually, Ick2
may come with pre-defined libraries of pipelines that can easily be
reused, but it will always be possible for users to define their own.

Pipeline steps will not be able to use variables in ALPHA-1. That will
probably be added later.
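To make the pipeline model concrete, here is a sketch of how a worker
might run the shell steps of one pipeline, stopping at the first
failure. This is an illustration only, not the actual Ick2 worker
code; the function and the workspace path are made up.

    import subprocess

    def run_pipeline(pipeline, workspace):
        # Run each shell step in the workspace directory; stop at the
        # first failing step and return its exit code.
        for step in pipeline['steps']:
            result = subprocess.run(step['shell'], shell=True, cwd=workspace)
            if result.returncode != 0:
                return result.returncode
        return 0

    getsource = {
        'name': 'getsource',
        'steps': [
            {'shell': 'git clone ssh://git@git.example.com/website.git src'},
        ],
    }
    exit_code = run_pipeline(getsource, '/var/tmp/ick2-workspace')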
Ick2 ALPHA-1
=============================================================================

We are currently working on what will be called the ALPHA-1 version of
Ick2. This chapter outlines its intended functionality and the shape
of its architecture.

Ick2 ALPHA-1 definition
-----------------------------------------------------------------------------

This is the current working definition of the aim for the ALPHA-1
version of Ick2:

> ALPHA-1 of Ick2 can be deployed and configured easily, and can
> concurrently build multiple projects using multiple workers. Builds
> may be traditional builds from source code, may involve running unit
> tests or other build-time tests, may involve building Debian
> packages, and build artifacts are published in a suitable location.
> Builds may also be builds of static web sites or documentation, and
> those build artifacts may be published on suitable web servers.
> Builds happen on workers in reasonably well isolated, automatically
> maintained environments akin to pbuilder or schroot (meaning the
> sysadmin is not expected to set up the pbuilder base tarball; ick2
> will do that).

Ick2 acceptance criteria
-----------------------------------------------------------------------------

Acceptance criteria for ALPHA-1:

* All Ick2 components and the workers are deployable using Ansible or
  similar configuration management tooling.

* At least two people (not only Lars) have set up a CI cluster to
  build at least two different projects on at least two workers. One
  of the projects should build documentation for ick2 itself, the
  other should build .deb packages of ick2. Bonus points for building
  other projects than ick2 as well.

* Builds get triggered automatically by a git server on any commit to
  the master branch.

* Build logs can be viewed while builds are running, or afterwards,
  via an HTTP API (perhaps wrapped in a command line tool). Bonus
  points if someone builds a web app on top of the API.

* A modicum of thought has been spent on security, and the major
  contributors agree the security design is not idiotic. The goal is
  to be confident that a future version of Ick2 can be made reasonably
  secure, even if that doesn't happen for ALPHA-1.

* The workspace is constructed from several git repositories, e.g., so
  that the debian subdir comes from a different repo than the main
  source tree.

* The pipeline steps are not merely snippets of shell script to run.
  Instead, steps may name operations that get executed by the workers,
  without specifying the implementation in the Ick2 project
  configuration.

Ick2 ALPHA-1 architecture
-----------------------------------------------------------------------------

The future architecture of Ick2 is a collection of mutually recursive,
self-modifying microservices.

* A project consists of a description of the workspace, and one or
  more pipelines to be executed when triggered to do so. Each pipeline
  needs to be triggered individually. Each pipeline acts on the same
  workspace, and the entire pipeline is executed on the same worker.

* The workspace description is, initially, a set of git repos and
  corresponding refs to clone (or update from) into a tree. Later
  (after ALPHA-1), the workspace may also be built from artifacts of
  other builds, or other things that turn out to be useful. Accessing
  git repositories may require credentials that only specific workers
  will have.

* The workspace is, essentially, a directory tree populated by the
  files needed for doing a build. The "source tree", if you wish.

* The project's pipelines do things like: prepare the workspace, run
  the actual build, publish build artifacts from the worker to a
  suitable server. The controller keeps track of where in each
  pipeline a build is; a minimal sketch of that bookkeeping follows
  this list.

* Workers are represented by worker-managers, which request work from
  the controller and perform the work by running commands locally, or
  over ssh on the actual worker host. Worker-managers may be on the
  worker hosts or elsewhere, depending on what suits each CI cluster
  best.

* Worker-managers register their workers with the controller. For
  ALPHA-1, all workers are assumed to be equivalent.

* A pipeline is a sequence of steps (such as shell commands to run),
  plus some requirements on what attributes the worker that runs the
  pipeline should have. All the steps of a pipeline get executed by
  the same worker.

* If a pipeline step fails, the controller marks the pipeline
  execution as having failed and won't schedule more steps to execute.
  Likewise, later pipelines in the same project won't be executed. If
  the failure was transient (e.g., a DNS lookup error), the user may
  trigger a rebuild manually (via the trigger service).
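Here is the promised sketch of that controller-side bookkeeping. The
field and function names are made up for illustration; the real
controller will certainly track more than this.

    from dataclasses import dataclass

    @dataclass
    class PipelineRun:
        # Controller-side record of one pipeline execution.
        project: str    # e.g. "website"
        pipeline: str   # e.g. "getsource"
        worker: str     # the worker the whole pipeline runs on
        next_step: int  # index of the next step to hand out
        status: str     # "running", "success", or "failed"

    def step_finished(run, exit_code, n_steps):
        # A failed step fails the pipeline; the controller won't
        # schedule more of its steps, nor later pipelines of the
        # same project.
        if exit_code != 0:
            run.status = 'failed'
        elif run.next_step + 1 >= n_steps:
            run.status = 'success'
        else:
            run.next_step += 1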
Ick2 ALPHA-1 components
-----------------------------------------------------------------------------

Ick2 consists of several independent services. This document describes
how they are used individually and together.

* The **controller** keeps track of projects, build pipelines, workers,
  and the current state of each. It decides which build step is next,
  and who should execute it. The controller provides a simple,
  unconditional "build this pipeline" API call, to be used by the
  trigger service (see below).

* A **worker-manager** represents a **build host**. It queries the
  controller for work, makes the build host (the actual worker)
  execute it, and then reports the results back to the controller.

* The **trigger service** decides when a build should start. It polls
  the state of the universe, or gets notifications of changes to the
  same.

* The controller and trigger services provide an API. The **identity
  provider** (IDP) takes care of authenticating each API client, and
  of what privileges each should have. The API client authenticates
  itself to the IDP and receives an access token. The API provider
  gets the token in each request, validates it, and inspects it to see
  what the client is allowed to do. A major point of the IDP is to
  have just a single place where authentication and authorisation are
  configured.

On an implementation level, the various services of Ick2 may be
implemented using any language and framework that works. However, to
keep things simple, initially we'll be using Python 3, Bottle, and
Green Unicorn. Also, the actual API implementation ("backend") will be
running behind haproxy, such that haproxy decrypts TLS and sends the
actual HTTP request over unencrypted localhost connections to the
backend.

@startuml
title Ick2 services
[git server] --> [trigger service] : notify of change
[trigger service] --> [controller] : start pipeline
[controller] <-- [worker manager] : get work, report result
[worker manager] --> [host] : execute command
[git server] --> [IDP] : get access token
[trigger service] .. [IDP] : get access token
[worker manager] .. [IDP] : get access token
@enduml
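Since the plan is Python 3 and Bottle, the controller's "build this
pipeline" call might look roughly like the sketch below. Everything
here is illustrative: the route shape matches the sequence diagrams
later in this document, the token check is a placeholder, and
`schedule_pipeline` is a made-up stub, not a real Ick2 function.

    import bottle

    app = bottle.Bottle()

    def schedule_pipeline(project, pipeline):
        # Stub: the real controller would record the pipeline as
        # triggered, so an idle worker-manager picks up its first step.
        pass

    @app.post('/projects/<project>/pipelines/<pipeline>/+start')
    def start_pipeline(project, pipeline):
        # The IDP-issued token arrives in the Authorization header;
        # real validation is elided here (see the security sections).
        if bottle.request.get_header('Authorization') is None:
            bottle.abort(401, 'missing access token')
        schedule_pipeline(project, pipeline)
        return {'status': 'triggered'}

    # Behind haproxy, the backend listens only on localhost, over
    # plain HTTP; haproxy terminates TLS.
    app.run(host='127.0.0.1', port=8080)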
The API-providing services will be running in a configuration like
this:

@startuml
title API arch
node service {
    component haproxy
    component backend
}
[API client] --> [haproxy] : HTTPS (TLS)
[haproxy] --> [backend] : HTTP over localhost
@enduml

Individual APIs
=============================================================================

This chapter covers interactions with the individual APIs.

On security
-----------------------------------------------------------------------------

All APIs are provided over TLS only. Access tokens are signed using
public key encryption, and the public part of the signing keys is
provided "somehow" to all API clients.

Getting an access token
-----------------------------------------------------------------------------

The API client (the user's command line tool, a putative web app, the
git server, a worker-manager, etc.) authenticates itself to the IDP
and, if successful, gets back a signed JSON Web Token. It will include
the token in all requests to all APIs, so that the API provider will
know what the client is allowed to do.

The privileges for each API client are set by the sysadmin who
installs the CI system, or by a user who's been given IDP admin
privileges by the sysadmin.

@startuml
hide footbox
title Get an access token
client -> IDP : GET /auth, with Basic Auth, over https
IDP --> client : signed JWT token
@enduml

All API calls need a token. Getting a token happens the same way for
every API client.

Worker (worker-manager) registration
-----------------------------------------------------------------------------

The sysadmin arranges to start a worker-manager for every build host.
They may run on the same host, or not; the Ick2 architecture doesn't
really care. If they run on the same host, the worker-manager will
start a subprocess locally. If on different hosts, the subprocess will
be started using ssh.

The CI admin may define tags for each worker. Tags may include things
like whether the worker can be trusted with credentials for logging
into other workers, or for retrieving source code from the git server.
Workers may not override such tags. Workers may, however, provide
other tags, to e.g. report their CPU architecture or Debian release.
The controller will eventually be able to use the tags to choose which
worker should execute which pipeline steps.

@startuml
hide footbox
title Register worker
worker_manager -> IDP : GET /auth, with Basic Auth, over https
IDP --> worker_manager : token A
worker_manager -> controller : POST /workers (token A)
controller --> worker_manager : success
@enduml

The worker-manager runs a very simple state machine.

@startuml
title Worker-manager state machine
Querying : ask controller for work
Running : run subprocess
[*] -down-> Idle : start
Idle -down-> Querying : short timeout has expired
Querying -up-> Idle : nothing to do
Querying --> Running : something to do
Running --> Running : get output, report to controller
Running --> Idle : subprocess finished, report to controller
@enduml
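The state machine translates into a small polling loop. Here is a
hedged sketch of one; the controller URL is made up, and the work and
work-output formats are the ones sketched in the resources chapter
below, so treat this as an illustration rather than a frozen API.

    import subprocess
    import time

    import requests

    CONTROLLER = 'https://controller.example.com'  # made-up URL
    WORKER = 'bartholomew'

    def worker_manager_loop(token):
        # Idle -> Querying -> Running, as in the state machine above.
        headers = {'Authorization': 'Bearer %s' % token}
        while True:
            time.sleep(5)  # Idle: wait for the short timeout to expire
            url = '%s/work/%s' % (CONTROLLER, WORKER)
            work = requests.get(url, headers=headers).json()
            if not work:
                continue  # Querying -> Idle: nothing to do
            # Running: execute the step, then report output and exit code.
            p = subprocess.run(work['step']['shell'], shell=True,
                               capture_output=True)
            requests.put(url, headers=headers,
                         json={'exit_code': p.returncode,
                               'stdout': p.stdout.decode(),
                               'stderr': p.stderr.decode()})

A real worker-manager would report output incrementally while the
subprocess runs (the Running to Running transition), rather than once
at the end as this sketch does.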
Add project to controller
-----------------------------------------------------------------------------

The CI admin (or a user authorised by the CI admin) adds projects to
the controller to allow them to be built. This is done using a "CI
administration application", which initially will be a command line
tool, but may later become a web application as well. Either way, the
controller provides API endpoints for this.

@startuml
hide footbox
title Add project to controller
adminapp -> IDP : GET /auth, with Basic Auth, over https
IDP --> adminapp : token B
adminapp -> controller : POST /projects (token B)
controller --> adminapp : success or failure indication
@enduml

A full build
=============================================================================

Next we look at how the various components interact during a complete
build, using a single worker, which is trusted with credentials. We
assume the worker has been registered and the projects added.

The sequence diagrams in this chapter have been split into stages, to
make them easier to view and read. Each diagram after the first one
continues where the previous one left off. Although not shown in the
diagrams, the same sequence is meant to work when multiple projects
run concurrently on multiple workers.

Trigger build by pushing changes to git server
-----------------------------------------------------------------------------

@startuml
hide footbox
title Build triggered by git change
developer -> gitano : git push
gitano -> IDP : GET /auth, with Basic Auth, over https
IDP --> gitano : token C
gitano -> trigger : POST /git/website.git (token C)
note right
    Git server notifies trigger service
    that a git repo has changed
end note
|||
trigger -> IDP : GET /auth, with Basic Auth, over https
IDP --> trigger : token D
trigger -> controller : GET /projects (token D)
note right
    trigger service queries controller to
    get list of all projects, so it knows
    which builds to start
end note
controller --> trigger : list of projects
|||
trigger -> controller : GET /projects/website (token D)
note right
    trigger service gets project config so
    it knows what pipelines the project has
end note
controller --> trigger : project description, incl. pipelines
|||
trigger -> controller : POST /projects/website/pipelines/getsource/+start (token D)
@enduml

The first pipeline has now been started by the trigger service.
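In prose, the trigger service's reaction to the git notification is:
list all projects, fetch each project's config, and start the first
pipeline of every project that uses the changed repository. A sketch
under those assumptions, with a made-up controller URL and the
workspace format from the resources chapter below:

    import requests

    CONTROLLER = 'https://controller.example.com'  # made-up URL
    TRIGGER_TOKEN = '...'  # token D, issued by the IDP

    def api_get(path):
        headers = {'Authorization': 'Bearer %s' % TRIGGER_TOKEN}
        return requests.get(CONTROLLER + path, headers=headers).json()

    def on_git_change(repo_url):
        # A repo changed: find every project whose workspace uses it,
        # and start each such project's first pipeline.
        for name in api_get('/projects'):
            project = api_get('/projects/%s' % name)
            repos = [g['git'] for g in project['workspace']['gits']]
            if repo_url in repos:
                first = project['pipelines'][0]['name']
                requests.post(
                    '%s/projects/%s/pipelines/%s/+start'
                    % (CONTROLLER, name, first),
                    headers={'Authorization': 'Bearer %s' % TRIGGER_TOKEN})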
Pipeline 1: get sources
-----------------------------------------------------------------------------

The first pipeline uses the trusted worker to fetch the source code
from the git server (we assume that requires credentials) into the
workspace.

@startuml
hide footbox
title Build pipeline: get source
trusty -> IDP : GET /auth, with Basic Auth, over https
IDP --> trusty : token E
|||
trusty -> controller : GET /worker/trusty (token E)
controller --> trusty : "clone website source into workspace"
trusty -> gitano : git clone
gitano --> trusty : website source code
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "notify trigger service pipeline is finished **successfully**"
trusty -> trigger : GET /pipelines/website/getsource, exit=0 (token E)
note right
    No need to have the trigger service
    query the controller, since it has been
    told the status of the pipeline by the
    worker.
end note
trusty -> controller : POST /worker/trusty, exit=0 (token E)
note right
    If the notification to the trigger
    service failed, this can be reported
    to the controller for logging.
end note
trigger -> controller : POST /projects/website/pipelines/ikiwiki/+start (token D)
@enduml

The first pipeline has finished, and the website build can start.
That's the second pipeline, which has just been started.

Pipeline 2: Build static web site
-----------------------------------------------------------------------------

The second pipeline runs on the same worker. The source is already
there, and it just needs to perform the build.

@startuml
hide footbox
title Build static website
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "build static website"
trusty -> trusty : run ikiwiki to build site
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "notify trigger service pipeline is finished"
trusty -> controller : POST /worker/trusty, exit=0 (token E)
trusty -> trigger : GET /pipelines/website/ikiwiki (token E)
trigger -> controller : GET /projects/website/pipelines/ikiwiki (token D)
trigger -> controller : POST /projects/website/pipelines/publish/+start (token D)
@enduml

At the end of the second pipeline, we start the third one.

Pipeline 3: Publish web site to web server
-----------------------------------------------------------------------------

The third pipeline copies the built static website from the trusty
worker to the actual web server.

@startuml
hide footbox
title Copy built site from worker to web server
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "rsync static website to web server"
trusty -> webserver : rsync
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller --> trusty : "notify trigger service pipeline is finished"
trusty -> controller : POST /worker/trusty, exit=0 (token E)
trusty -> trigger : GET /pipelines/website/publish (token E)
trigger -> controller : GET /projects/website/pipelines/publish (token D)
note right
    There are no further pipelines.
end note
@enduml

The website is now built and published.

Ick APIs
=============================================================================

APIs follow the RESTful style
-----------------------------------------------------------------------------

All the Ick APIs are [RESTful][]. Server-side state is represented by
a set of "resources": data objects that can be addressed using URLs,
and that are manipulated using the HTTP methods (verbs) GET, POST,
PUT, and DELETE.

There can be many instances of a type of resource. These are handled
as a collection. Example: given a resource type for the projects Ick
should build, the API would have the following calls:

    POST /projects -- create a new project, giving it an ID
    GET /projects -- get list of all project ids
    GET /projects/ID -- get info on project ID
    PUT /projects/ID -- update project ID
    DELETE /projects/ID -- remove a project

[RESTful]: https://en.wikipedia.org/wiki/Representational_state_transfer

Resources are all handled the same way, regardless of the type of the
resource. This gives a consistency that makes it easier to use the
APIs.

Note that the server doesn't store any client-side state at all. There
are no sessions, no logins, etc. Authentication is handled by
attaching a token (in the `Authorization` header) to each request. An
identity provider gives out the tokens to API clients, on request.

Note also that the API doesn't have RPC-style calls. The server end
may decide to do some action as a side effect of a resource being
created or updated, but the API client can't invoke the action
directly. Thus, there's no way to say "run this pipeline"; instead,
there's a resource showing the state of a pipeline, and changing that
resource to say the state is "triggered" instead of "idle" is how an
API client tells the server to run a pipeline.
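For example, triggering a pipeline is an ordinary resource update, not
a special call. A sketch using Python and the requests library, with a
made-up controller host; the pipeline status resource it manipulates
is described in detail in the next chapter:

    import requests

    API = 'https://controller.example.com'     # made-up host
    token = open('token.jwt').read().strip()   # obtained from the IDP
    headers = {'Authorization': 'Bearer %s' % token}

    # Read the pipeline status resource...
    url = '%s/projects/website/pipelines/getsource' % API
    status = requests.get(url, headers=headers).json()

    # ...and update it to ask the server to run the pipeline. This is
    # only valid while the current status is "idle".
    if status['status'] == 'idle':
        requests.put(url, headers=headers, json={'status': 'triggered'})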
Ick controller resources and API
-----------------------------------------------------------------------------

A project consists of a workspace specification and an ordered list of
pipelines. Additionally, the project has a list of builds, and for
each build a build log and metadata (time and duration of the build,
what triggered it, whether it was successful or not). Also, a current
state of the workspace.

A project resource:

    {
        "project": "liw.fi",
        "parameters": {
            "rsync_target": "www-data@www.example.com:/srv/http/liw.fi"
        },
        "workspace": {
            "gits": [
                {
                    "git": "ssh://git@git.liw.fi/liw.fi",
                    "branch": "master",
                    "dir": "src"
                }
            ]
        },
        "pipelines": [
            {
                "name": "workspace-setup",
                "actions": [
                    { "name": "clone-gits" }
                ]
            },
            {
                "name": "ikiwiki-config",
                "actions": [
                    { "shell": "cat src/ikiwiki.setup.template > ikiwiki.setup" },
                    { "shell": "echo \"destdir: {{ workspace }}/html\" >> ikiwiki.setup" },
                    { "name": "mkdir", "dirname": "html" }
                ]
            },
            {
                "name": "ikiwiki-run",
                "actions": [
                    { "shell": "ikiwiki --setup ikiwiki.setup" }
                ]
            },
            {
                "name": "rsync",
                "actions": [
                    { "shell": "rsync -a --delete html/. \"{{ rsync_target }}/.\"" }
                ]
            }
        ]
    }

Here:

- each pipeline consists of a sequence of steps
- each step is a shell snippet (expanded with jinja2) or a built-in
  operation implemented by the worker-manager directly
- project parameters are used by the steps

A pipeline status resource at
`/projects/PROJECTNAME/pipelines/PIPELINENAME`, created automatically
when a project resource is updated to include the pipeline:

    {
        "status": "idle/triggered/running/paused"
    }

To trigger a pipeline, PUT a pipeline resource with a status field of
"triggered". It is an error to do that when the current status is not
idle.

A build resource is created automatically, at
`/projects/PROJECTNAME/builds`, when a pipeline actually starts (not
when it's triggered). It can't be changed via the API.

    {
        "build": "12765",
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "worker": "bartholomew",
        "status": "running/success/failure",
        "started": "TIMESTAMP",
        "ended": "TIMESTAMP",
        "triggerer": "WHO/WHAT",
        "trigger": "WHY"
    }

A build log is stored at `/projects/liw.fi/builds/12765/log` as a
blob. The build log is appended to by the worker-manager, by reporting
output.

Workers are registered with the controller by creating a worker
resource. Later on, we can add useful metadata to the resource, but
for now we'll have just the name.

    {
        "worker": "bartholomew"
    }

A work resource tells a worker what to do next:

    {
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "step": {
            "shell": "ikiwiki --setup ikiwiki.setup"
        },
        "parameters": {
            "rsync_target": "..."
        }
    }

The controller provides a simple API to give work to each worker:

    GET /work/bartholomew
    PUT /work/bartholomew

The controller keeps track of which worker is currently running which
pipeline.

A work output resource:

    {
        "worker": "bartholomew",
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "exit_code": null,
        "stdout": "...",
        "stderr": "...",
        "timestamp": "..."
    }

When `exit_code` is non-null, the step has finished, and the
controller knows it should schedule the next step in the pipeline.
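The jinja2 expansion of shell steps mentioned above might work roughly
as in this sketch. The helper function is made up, but the step and
parameter come from the project resource above:

    import jinja2

    def expand_step(step, parameters, workspace):
        # Render a shell step's jinja2 template against the project
        # parameters, plus the workspace path. Illustrative only.
        variables = dict(parameters, workspace=workspace)
        return jinja2.Template(step['shell']).render(**variables)

    step = {'shell': 'rsync -a --delete html/. "{{ rsync_target }}/."'}
    params = {'rsync_target': 'www-data@www.example.com:/srv/http/liw.fi'}
    print(expand_step(step, params, '/workspace'))
    # rsync -a --delete html/. "www-data@www.example.com:/srv/http/liw.fi/."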
Known problems
=============================================================================

The architecture shown in this document for ALPHA-1 is not perfect. At
least the following things will probably need to be addressed in the
future. We've made compromises to gain simplicity and to get something
working sooner, to allow faster iteration.

* It's not OK for all workers to be trusted with credentials to access
  all git repositories and all web servers.