---
title: Ick2, a CI system (architecture)
author: Lars Wirzenius
date: work-in-progress for ALPHA-1
...

Introduction
=============================================================================

Ick2 is a continuous integration (CI) system. It is being developed by
Lars Wirzenius and other people, for their own needs. It is very early
days. You don't want to use Ick2 yet, but if you have opinions on what
a CI system should be like, feedback is welcome.

This document describes the architecture of Ick2. Specifically, the
architecture for the upcoming ALPHA-1 release, and no further than
that. It is a capital mistake to design software before you have all
the requirements: it biases the judgment. You can rarely have all the
requirements a priori; you have to iterate to gather them. Designing
beyond one iteration is a mistake.

Background and justification
-----------------------------------------------------------------------------

This section should be written some day. In short, Lars got tired of
Jenkins, and all competitors seem insufficient or somehow unpleasant.
Then Daniel suggested a name, and Lars is incapable of not starting a
project if given a name for it.

Overview
-----------------------------------------------------------------------------

A continuous integration (CI) or continuous deployment (CD) system is,
at its simplest core, an automated system that reacts to changes in a
program's source code by doing a build of the program, running its
automated tests, and then publishing the results somewhere. A CD
system continues from there by also installing the new version of the
program on all relevant computers. If a build or an automated test
fails, the system notifies the relevant parties.

Ick2 aims to be a CI/CD system. It deals with a small number of
concepts:

* **projects**, which consist of **source code** in a version control
  system (mainly git right now)
* **pipelines**, which are sequences of steps that convert source code
  into something executable, or test the program
* **worker build hosts**, which do all the heavy lifting

The long-term goal for Ick2 is to provide a CI/CD system that can be
used to build and deploy any reasonable software project, including
building packages of any reasonable type. In our wildest dreams it'll
be scalable enough to build a full, large Linux distribution such as
Debian. We'll see.

Example
-----------------------------------------------------------------------------

We will return to this example throughout this document. Imagine a
static website that is built using the ikiwiki software. The source of
the web pages is stored in a git repo, and the generated HTML pages
are published on a web server.

This might be expressed as an Ick2 configuration like this:

    projects:
      website:
        workspace:
        - git: ssh://git@git.example.com/website.git
        pipelines:
        - name: getsource
          steps:
          - shell: git clone ssh://git@git.example.com/website.git src
        - name: ikiwiki
          steps:
          - shell: mkdir html
          - shell: ikiwiki src html
        - name: publish
          steps:
          - shell: rsync -a --delete html/. www-user@www.example.com:/srv/http/.

Note that pipelines are defined in the configuration. Eventually, Ick2
may come with pre-defined libraries of pipelines that can easily be
reused, but it will always be possible for users to define their own.

Pipeline steps will not be able to use variables in ALPHA-1. That will
probably be added later.
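To make the pipeline model concrete, here is a sketch of how a worker
might run the shell steps of one pipeline, stopping at the first
failure. This is an illustration only, not the actual Ick2 worker
code; the function and the workspace path are made up.

    import subprocess

    def run_pipeline(pipeline, workspace):
        # Run each shell step in the workspace directory; stop at the
        # first failing step and return its exit code.
        for step in pipeline['steps']:
            result = subprocess.run(step['shell'], shell=True, cwd=workspace)
            if result.returncode != 0:
                return result.returncode
        return 0

    getsource = {
        'name': 'getsource',
        'steps': [
            {'shell': 'git clone ssh://git@git.example.com/website.git src'},
        ],
    }
    exit_code = run_pipeline(getsource, '/var/tmp/ick2-workspace')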
Ick2 ALPHA-1
=============================================================================

We are currently working on what will be called the ALPHA-1 version of
Ick2. This chapter outlines its intended functionality and the shape
of its architecture.

Ick2 ALPHA-1 definition
-----------------------------------------------------------------------------

This is the current working definition of the aim for the ALPHA-1
version of Ick2:

> ALPHA-1 of Ick2 can be deployed and configured easily, and can
> concurrently build multiple projects using multiple workers. Builds
> may be traditional builds from source code, may involve running unit
> tests or other build-time tests, may involve building Debian
> packages, and build artifacts are published in a suitable location.
> Builds may also be builds of static web sites or documentation, and
> those build artifacts may be published on suitable web servers.
> Builds happen on workers in reasonably well isolated, automatically
> maintained environments akin to pbuilder or schroot (meaning the
> sysadmin is not expected to set up the pbuilder base tarball; ick2
> will do that).

Ick2 acceptance criteria
-----------------------------------------------------------------------------

Acceptance criteria for ALPHA-1:

* All Ick2 components and the workers are deployable using Ansible or
  similar configuration management tooling.

* At least two people (not only Lars) have set up a CI cluster to
  build at least two different projects on at least two workers. One
  of the projects should build documentation for ick2 itself, the
  other should build .deb packages of ick2. Bonus points for building
  other projects than ick2 as well.

* Builds get triggered automatically by a git server on any commit to
  the master branch.

* Build logs can be viewed while builds are running, or afterwards,
  via an HTTP API (perhaps wrapped in a command line tool). Bonus
  points if someone builds a web app on top of the API.

* A modicum of thought has been spent on security, and the major
  contributors agree the security design is not idiotic. The goal is
  to be confident that a future version of Ick2 can be made reasonably
  secure, even if that doesn't happen for ALPHA-1.

* The workspace is constructed from several git repositories, e.g., so
  that the debian subdir comes from a different repo than the main
  source tree.

* The pipeline steps are not merely snippets of shell script to run.
  Instead, steps may name operations that get executed by the workers,
  without specifying the implementation in the Ick2 project
  configuration.

Ick2 ALPHA-1 architecture
-----------------------------------------------------------------------------

The future architecture of Ick2 is a collection of mutually recursive,
self-modifying microservices.

* A project consists of a description of the workspace, and one or
  more pipelines to be executed when triggered to do so. Each pipeline
  needs to be triggered individually. Each pipeline acts on the same
  workspace, and the entire pipeline is executed on the same worker.

* The workspace description is, initially, a set of git repos and
  corresponding refs to clone (or update from) into a tree. Later
  (after ALPHA-1), the workspace may also be built from artifacts of
  other builds, or other things that turn out to be useful. Accessing
  git repositories may require credentials that only specific workers
  will have.

* The workspace is, essentially, a directory tree populated by the
  files needed for doing a build. The "source tree", if you wish.

* The project's pipelines do things like: prepare the workspace, run
  the actual build, publish build artifacts from the worker to a
  suitable server. The controller keeps track of where in each
  pipeline a build is; a minimal sketch of that bookkeeping follows
  this list.

* Workers are represented by worker-managers, which request work from
  the controller and perform the work by running commands locally, or
  over ssh on the actual worker host. Worker-managers may be on the
  worker hosts or elsewhere, depending on what suits each CI cluster
  best.

* Worker-managers register their workers with the controller. For
  ALPHA-1, all workers are assumed to be equivalent.

* A pipeline is a sequence of steps (such as shell commands to run),
  plus some requirements on what attributes the worker that runs the
  pipeline should have. All the steps of a pipeline get executed by
  the same worker.

* If a pipeline step fails, the controller marks the pipeline
  execution as having failed and won't schedule more steps to execute.
  Likewise, later pipelines in the same project won't be executed. If
  the failure was transient (e.g., a DNS lookup error), the user may
  trigger a rebuild manually (via the trigger service).
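Here is the promised sketch of that controller-side bookkeeping. The
field and function names are made up for illustration; the real
controller will certainly track more than this.

    from dataclasses import dataclass

    @dataclass
    class PipelineRun:
        # Controller-side record of one pipeline execution.
        project: str    # e.g. "website"
        pipeline: str   # e.g. "getsource"
        worker: str     # the worker the whole pipeline runs on
        next_step: int  # index of the next step to hand out
        status: str     # "running", "success", or "failed"

    def step_finished(run, exit_code, n_steps):
        # A failed step fails the pipeline; the controller won't
        # schedule more of its steps, nor later pipelines of the
        # same project.
        if exit_code != 0:
            run.status = 'failed'
        elif run.next_step + 1 >= n_steps:
            run.status = 'success'
        else:
            run.next_step += 1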
Ick2 ALPHA-1 components
-----------------------------------------------------------------------------

Ick2 consists of several independent services. This document describes
how they are used individually and together.

* The **controller** keeps track of projects, build pipelines, workers,
  and the current state of each. It decides which build step is next,
  and who should execute it. The controller provides a simple,
  unconditional "build this pipeline" API call, to be used by the
  trigger service (see below).

* A **worker-manager** represents a **build host**. It queries the
  controller for work, makes the build host (the actual worker)
  execute it, and then reports the results back to the controller.

* The **trigger service** decides when a build should start. It polls
  the state of the universe, or gets notifications of changes to the
  same.

* The controller and trigger services provide an API. The **identity
  provider** (IDP) takes care of authenticating each API client, and
  of what privileges each should have. The API client authenticates
  itself to the IDP and receives an access token. The API provider
  gets the token in each request, validates it, and inspects it to see
  what the client is allowed to do. A major point of the IDP is to
  have just a single place where authentication and authorisation are
  configured.

On an implementation level, the various services of Ick2 may be
implemented using any language and framework that works. However, to
keep things simple, initially we'll be using Python 3, Bottle, and
Green Unicorn. Also, the actual API implementation ("backend") will be
running behind haproxy, such that haproxy decrypts TLS and sends the
actual HTTP request over unencrypted localhost connections to the
backend.

@startuml
title Ick2 services
[git server] --> [trigger service] : notify of change
[trigger service] --> [controller] : start pipeline
[controller] <-- [worker manager] : get work, report result
[worker manager] --> [host] : execute command
[git server] --> [IDP] : get access token
[trigger service] .. [IDP] : get access token
[worker manager] .. [IDP] : get access token
@enduml
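Since the plan is Python 3 and Bottle, the controller's "build this
pipeline" call might look roughly like the sketch below. Everything
here is illustrative: the route shape matches the sequence diagrams
later in this document, the token check is a placeholder, and
`schedule_pipeline` is a made-up stub, not a real Ick2 function.

    import bottle

    app = bottle.Bottle()

    def schedule_pipeline(project, pipeline):
        # Stub: the real controller would record the pipeline as
        # triggered, so an idle worker-manager picks up its first step.
        pass

    @app.post('/projects/<project>/pipelines/<pipeline>/+start')
    def start_pipeline(project, pipeline):
        # The IDP-issued token arrives in the Authorization header;
        # real validation is elided here (see the security sections).
        if bottle.request.get_header('Authorization') is None:
            bottle.abort(401, 'missing access token')
        schedule_pipeline(project, pipeline)
        return {'status': 'triggered'}

    # Behind haproxy, the backend listens only on localhost, over
    # plain HTTP; haproxy terminates TLS.
    app.run(host='127.0.0.1', port=8080)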
The API-providing services will be running in a configuration like
this:

@startuml
title API arch
node service {
    component haproxy
    component backend
}
[API client] --> [haproxy] : HTTPS (TLS)
[haproxy] --> [backend] : HTTP over localhost
@enduml

Individual APIs
=============================================================================

This chapter covers interactions with the individual APIs.

On security
-----------------------------------------------------------------------------

All APIs are provided over TLS only. Access tokens are signed using
public key encryption, and the public part of the signing keys is
provided "somehow" to all API clients.

Getting an access token
-----------------------------------------------------------------------------

The API client (the user's command line tool, a putative web app, the
git server, a worker-manager, etc.) authenticates itself to the IDP
and, if successful, gets back a signed JSON Web Token. It will include
the token in all requests to all APIs, so that the API provider will
know what the client is allowed to do.

The privileges for each API client are set by the sysadmin who
installs the CI system, or by a user who's been given IDP admin
privileges by the sysadmin.

@startuml
hide footbox
title Get an access token
client -> IDP : GET /auth, with Basic Auth, over https
IDP --> client : signed JWT token
@enduml

All API calls need a token. Getting a token happens the same way for
every API client.

Worker (worker-manager) registration
-----------------------------------------------------------------------------

The sysadmin arranges to start a worker-manager for every build host.
They may run on the same host, or not; the Ick2 architecture doesn't
really care. If they run on the same host, the worker-manager will
start a subprocess locally. If on different hosts, the subprocess will
be started using ssh.

The CI admin may define tags for each worker. Tags may include things
like whether the worker can be trusted with credentials for logging
into other workers, or for retrieving source code from the git server.
Workers may not override such tags. Workers may, however, provide
other tags, to e.g. report their CPU architecture or Debian release.
The controller will eventually be able to use the tags to choose which
worker should execute which pipeline steps.

@startuml
hide footbox
title Register worker
worker_manager -> IDP : GET /auth, with Basic Auth, over https
IDP --> worker_manager : token A
worker_manager -> controller : POST /workers (token A)
controller --> worker_manager : success
@enduml

The worker-manager runs a very simple state machine.

@startuml
title Worker-manager state machine
Querying : ask controller for work
Running : run subprocess
[*] -down-> Idle : start
Idle -down-> Querying : short timeout has expired
Querying -up-> Idle : nothing to do
Querying --> Running : something to do
Running --> Running : get output, report to controller
Running --> Idle : subprocess finished, report to controller
@enduml
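The state machine translates into a small polling loop. Here is a
hedged sketch of one; the controller URL is made up, and the work and
work-output formats are the ones sketched in the resources chapter
below, so treat this as an illustration rather than a frozen API.

    import subprocess
    import time

    import requests

    CONTROLLER = 'https://controller.example.com'  # made-up URL
    WORKER = 'bartholomew'

    def worker_manager_loop(token):
        # Idle -> Querying -> Running, as in the state machine above.
        headers = {'Authorization': 'Bearer %s' % token}
        while True:
            time.sleep(5)  # Idle: wait for the short timeout to expire
            url = '%s/work/%s' % (CONTROLLER, WORKER)
            work = requests.get(url, headers=headers).json()
            if not work:
                continue  # Querying -> Idle: nothing to do
            # Running: execute the step, then report output and exit code.
            p = subprocess.run(work['step']['shell'], shell=True,
                               capture_output=True)
            requests.put(url, headers=headers,
                         json={'exit_code': p.returncode,
                               'stdout': p.stdout.decode(),
                               'stderr': p.stderr.decode()})

A real worker-manager would report output incrementally while the
subprocess runs (the Running to Running transition), rather than once
at the end as this sketch does.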
Add project to controller
-----------------------------------------------------------------------------

The CI admin (or a user authorised by the CI admin) adds projects to
the controller to allow them to be built. This is done using a "CI
administration application", which initially will be a command line
tool, but may later become a web application as well. Either way, the
controller provides API endpoints for this.

@startuml
hide footbox
title Add project to controller
adminapp -> IDP : GET /auth, with Basic Auth, over https
IDP --> adminapp : token B
adminapp -> controller : POST /projects (token B)
controller --> adminapp : success or failure indication
@enduml

A full build
=============================================================================

Next we look at how the various components interact during a complete
build, using a single worker, which is trusted with credentials. We
assume the worker has been registered and the projects added.

The sequence diagrams in this chapter have been split into stages, to
make them easier to view and read. Each diagram after the first one
continues where the previous one left off. Although not shown in the
diagrams, the same sequence is meant to work when multiple projects
run concurrently on multiple workers.

Trigger build by pushing changes to git server
-----------------------------------------------------------------------------

@startuml
hide footbox
title Build triggered by git change
developer -> gitano : git push
gitano -> IDP : GET /auth, with Basic Auth, over https
IDP --> gitano : token C
gitano -> trigger : POST /git/website.git (token C)
note right
    Git server notifies trigger service
    that a git repo has changed
end note
|||
trigger -> IDP : GET /auth, with Basic Auth, over https
IDP --> trigger : token D
trigger -> controller : GET /projects (token D)
note right
    trigger service queries controller to
    get list of all projects, so it knows
    which builds to start
end note
controller --> trigger : list of projects
|||
trigger -> controller : GET /projects/website (token D)
note right
    trigger service gets project config so
    it knows what pipelines the project has
end note
controller --> trigger : project description, incl. pipelines
|||
trigger -> controller : POST /projects/website/pipelines/getsource/+start (token D)
@enduml

The first pipeline has now been started by the trigger service.
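In prose, the trigger service's reaction to the git notification is:
list all projects, fetch each project's config, and start the first
pipeline of every project that uses the changed repository. A sketch
under those assumptions, with a made-up controller URL and the
workspace format from the resources chapter below:

    import requests

    CONTROLLER = 'https://controller.example.com'  # made-up URL
    TRIGGER_TOKEN = '...'  # token D, issued by the IDP

    def api_get(path):
        headers = {'Authorization': 'Bearer %s' % TRIGGER_TOKEN}
        return requests.get(CONTROLLER + path, headers=headers).json()

    def on_git_change(repo_url):
        # A repo changed: find every project whose workspace uses it,
        # and start each such project's first pipeline.
        for name in api_get('/projects'):
            project = api_get('/projects/%s' % name)
            repos = [g['git'] for g in project['workspace']['gits']]
            if repo_url in repos:
                first = project['pipelines'][0]['name']
                requests.post(
                    '%s/projects/%s/pipelines/%s/+start'
                    % (CONTROLLER, name, first),
                    headers={'Authorization': 'Bearer %s' % TRIGGER_TOKEN})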
Pipeline 1: get sources
-----------------------------------------------------------------------------

The first pipeline uses the trusted worker to fetch the source code
from the git server (we assume that requires credentials) into the
workspace.

@startuml
hide footbox
title Build pipeline: get source
trusty -> IDP : GET /auth, with Basic Auth, over https
IDP --> trusty : token E
|||
trusty -> controller : GET /worker/trusty (token E)
controller --> trusty : "clone website source into workspace"
trusty -> gitano : git clone
gitano --> trusty : website source code
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "notify trigger service pipeline is finished **successfully**"
trusty -> trigger : GET /pipelines/website/getsource, exit=0 (token E)
note right
    No need to have the trigger service
    query the controller, since it has been
    told the status of the pipeline by the
    worker.
end note
trusty -> controller : POST /worker/trusty, exit=0 (token E)
note right
    If the notification to the trigger
    service failed, this can be reported
    to the controller for logging.
end note
trigger -> controller : POST /projects/website/pipelines/ikiwiki/+start (token D)
@enduml

The first pipeline has finished, and the website build can start.
That's the second pipeline, which has just been started.

Pipeline 2: Build static web site
-----------------------------------------------------------------------------

The second pipeline runs on the same worker. The source is already
there, and it just needs to perform the build.

@startuml
hide footbox
title Build static website
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "build static website"
trusty -> trusty : run ikiwiki to build site
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "notify trigger service pipeline is finished"
trusty -> controller : POST /worker/trusty, exit=0 (token E)
trusty -> trigger : GET /pipelines/website/ikiwiki (token E)
trigger -> controller : GET /projects/website/pipelines/ikiwiki (token D)
trigger -> controller : POST /projects/website/pipelines/publish/+start (token D)
@enduml

At the end of the second pipeline, we start the third one.

Pipeline 3: Publish web site to web server
-----------------------------------------------------------------------------

The third pipeline copies the built static website from the trusty
worker to the actual web server.

@startuml
hide footbox
title Copy built site from worker to web server
trusty -> controller : GET /worker/trusty (token E)
controller -> trusty : "rsync static website to web server"
trusty -> webserver : rsync
trusty -> controller : POST /worker/trusty, exit=0 (token E)
|||
trusty -> controller : GET /worker/trusty (token E)
controller --> trusty : "notify trigger service pipeline is finished"
trusty -> controller : POST /worker/trusty, exit=0 (token E)
trusty -> trigger : GET /pipelines/website/publish (token E)
trigger -> controller : GET /projects/website/pipelines/publish (token D)
note right
    There are no further pipelines.
end note
@enduml

The website is now built and published.

Ick APIs
=============================================================================

APIs follow the RESTful style
-----------------------------------------------------------------------------

All the Ick APIs are [RESTful][]. Server-side state is represented by
a set of "resources": data objects that can be addressed using URLs,
and that are manipulated using the HTTP methods (verbs) GET, POST,
PUT, and DELETE.

There can be many instances of a type of resource. These are handled
as a collection. Example: given a resource type for the projects Ick
should build, the API would have the following calls:

    POST /projects -- create a new project, giving it an ID
    GET /projects -- get list of all project ids
    GET /projects/ID -- get info on project ID
    PUT /projects/ID -- update project ID
    DELETE /projects/ID -- remove a project

[RESTful]: https://en.wikipedia.org/wiki/Representational_state_transfer

Resources are all handled the same way, regardless of the type of the
resource. This gives a consistency that makes it easier to use the
APIs.

Note that the server doesn't store any client-side state at all. There
are no sessions, no logins, etc. Authentication is handled by
attaching a token (in the `Authorization` header) to each request. An
identity provider gives out the tokens to API clients, on request.

Note also that the API doesn't have RPC-style calls. The server end
may decide to do some action as a side effect of a resource being
created or updated, but the API client can't invoke the action
directly. Thus, there's no way to say "run this pipeline"; instead,
there's a resource showing the state of a pipeline, and changing that
resource to say the state is "triggered" instead of "idle" is how an
API client tells the server to run a pipeline.
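For example, triggering a pipeline is an ordinary resource update, not
a special call. A sketch using Python and the requests library, with a
made-up controller host; the pipeline status resource it manipulates
is described in detail in the next chapter:

    import requests

    API = 'https://controller.example.com'     # made-up host
    token = open('token.jwt').read().strip()   # obtained from the IDP
    headers = {'Authorization': 'Bearer %s' % token}

    # Read the pipeline status resource...
    url = '%s/projects/website/pipelines/getsource' % API
    status = requests.get(url, headers=headers).json()

    # ...and update it to ask the server to run the pipeline. This is
    # only valid while the current status is "idle".
    if status['status'] == 'idle':
        requests.put(url, headers=headers, json={'status': 'triggered'})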
Ick controller resources and API
-----------------------------------------------------------------------------

A project consists of a workspace specification and an ordered list of
pipelines. Additionally, the project has a list of builds, and for
each build a build log and metadata (time and duration of the build,
what triggered it, whether it was successful or not). Also, a current
state of the workspace.

A project resource:

    {
        "project": "liw.fi",
        "parameters": {
            "rsync_target": "www-data@www.example.com:/srv/http/liw.fi"
        },
        "workspace": {
            "gits": [
                {
                    "git": "ssh://git@git.liw.fi/liw.fi",
                    "branch": "master",
                    "dir": "src"
                }
            ]
        },
        "pipelines": [
            {
                "name": "workspace-setup",
                "actions": [
                    { "name": "clone-gits" }
                ]
            },
            {
                "name": "ikiwiki-config",
                "actions": [
                    { "shell": "cat src/ikiwiki.setup.template > ikiwiki.setup" },
                    { "shell": "echo \"destdir: {{ workspace }}/html\" >> ikiwiki.setup" },
                    { "name": "mkdir", "dirname": "html" }
                ]
            },
            {
                "name": "ikiwiki-run",
                "actions": [
                    { "shell": "ikiwiki --setup ikiwiki.setup" }
                ]
            },
            {
                "name": "rsync",
                "actions": [
                    { "shell": "rsync -a --delete html/. \"{{ rsync_target }}/.\"" }
                ]
            }
        ]
    }

Here:

- each pipeline consists of a sequence of steps
- each step is a shell snippet (expanded with jinja2) or a built-in
  operation implemented by the worker-manager directly
- project parameters are used by the steps

A pipeline status resource at
`/projects/PROJECTNAME/pipelines/PIPELINENAME`, created automatically
when a project resource is updated to include the pipeline:

    {
        "status": "idle/triggered/running/paused"
    }

To trigger a pipeline, PUT a pipeline resource with a status field of
"triggered". It is an error to do that when the current status is not
idle.

A build resource is created automatically, at
`/projects/PROJECTNAME/builds`, when a pipeline actually starts (not
when it's triggered). It can't be changed via the API.

    {
        "build": "12765",
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "worker": "bartholomew",
        "status": "running/success/failure",
        "started": "TIMESTAMP",
        "ended": "TIMESTAMP",
        "triggerer": "WHO/WHAT",
        "trigger": "WHY"
    }

A build log is stored at `/projects/liw.fi/builds/12765/log` as a
blob. The build log is appended to by the worker-manager, by reporting
output.

Workers are registered with the controller by creating a worker
resource. Later on, we can add useful metadata to the resource, but
for now we'll have just the name.

    {
        "worker": "bartholomew"
    }

A work resource tells a worker what to do next:

    {
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "step": {
            "shell": "ikiwiki --setup ikiwiki.setup"
        },
        "parameters": {
            "rsync_target": "..."
        }
    }

The controller provides a simple API to give work to each worker:

    GET /work/bartholomew
    PUT /work/bartholomew

The controller keeps track of which worker is currently running which
pipeline.

A work output resource:

    {
        "worker": "bartholomew",
        "project": "liw.fi",
        "pipeline": "ikiwiki-run",
        "exit_code": null,
        "stdout": "...",
        "stderr": "...",
        "timestamp": "..."
    }

When `exit_code` is non-null, the step has finished, and the
controller knows it should schedule the next step in the pipeline.
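The jinja2 expansion of shell steps mentioned above might work roughly
as in this sketch. The helper function is made up, but the step and
parameter come from the project resource above:

    import jinja2

    def expand_step(step, parameters, workspace):
        # Render a shell step's jinja2 template against the project
        # parameters, plus the workspace path. Illustrative only.
        variables = dict(parameters, workspace=workspace)
        return jinja2.Template(step['shell']).render(**variables)

    step = {'shell': 'rsync -a --delete html/. "{{ rsync_target }}/."'}
    params = {'rsync_target': 'www-data@www.example.com:/srv/http/liw.fi'}
    print(expand_step(step, params, '/workspace'))
    # rsync -a --delete html/. "www-data@www.example.com:/srv/http/liw.fi/."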
Known problems
=============================================================================

The architecture shown in this document for ALPHA-1 is not perfect. At
least the following things will probably need to be addressed in the
future. We've made compromises to gain simplicity and to get something
working sooner, to allow faster iteration.

* It's not OK for all workers to be trusted with credentials to access
  all git repositories and all web servers.