author	Lars Wirzenius <liw@liw.fi>	2017-12-18 18:06:11 +0200
committer	Lars Wirzenius <liw@liw.fi>	2017-12-18 18:06:11 +0200
commit	f212f59ef5112cdafe31fbb4ee47b0c362129fc5 (patch)
tree	1bc1b4f7e7ac3fc1ea9fe7e13a208b78472b8fbb
parent	06fc725d69bef53e78b7be88df3be1cdff8b7930 (diff)
download	ick.liw.fi-f212f59ef5112cdafe31fbb4ee47b0c362129fc5.tar.gz
Add: architecture page
-rw-r--r--	architecture.mdwn	719
-rw-r--r--	index.mdwn	1
2 files changed, 720 insertions, 0 deletions
diff --git a/architecture.mdwn b/architecture.mdwn
new file mode 100644
index 0000000..b024c30
--- /dev/null
+++ b/architecture.mdwn
@@ -0,0 +1,719 @@
+[[!meta title="Ick&mdash;architecture"]]
+
+Introduction
+=============================================================================
+
+Ick2 is a continuous integration (CI) system. It is being developed by
+Lars Wirzenius and other people, for their own needs. It is very early
+days: you don't want to use Ick2 yet, but if you have opinions on what
+a CI system should be like, feedback is welcome.
+
+This document describes the architecture of Ick2. Specifically, it
+describes the architecture for the upcoming ALPHA-1 release, and no
+further than that. It is a capital mistake to design software before
+you have all the requirements, as it biases the judgment. You can
+rarely have all the requirements a priori; you have to iterate to
+gather them. Designing beyond one iteration is a mistake.
+
+Background and justification
+-----------------------------------------------------------------------------
+
+This section should be written some day. In short, Lars got tired of
+Jenkins, and all its competitors seemed insufficient or somehow
+unpleasant. Then Daniel suggested a name, and Lars is incapable of not
+starting a project if given a name for it.
+
+
+Overview
+-----------------------------------------------------------------------------
+
+A continuous integration (CI) or continuous deployment (CD) system is,
+at its simplest, an automated system that reacts to changes in a
+program's source code by building the program, running its automated
+tests, and publishing the results somewhere. A CD system continues
+from there by also installing the new version of the program on all
+relevant computers. If a build or an automated test fails, the system
+notifies the relevant parties.
+
+Ick2 aims to be a CI/CD system. It deals with a small number of
+concepts:
+
+* **projects**, which consist of **source code** in a version control
+ system (mainly git right now)
+* **pipelines**, which are sequences of steps aiming to convert source
+ code into something executable, or test the program
+* **worker build hosts**, which do all the heavy lifting
+
+The long-term goal for Ick2 is to provide a CI/CD system that can be
+used to build and deploy any reasonable software project, including
+building packages of any reasonable type. In our wildest dreams it'll
+be scalable enough to build a full, large Linux distribution such as
+Debian. We'll see.
+
+Example
+-----------------------------------------------------------------------------
+
+We will be returning to this example throughout this document. Imagine
+a static website that is built using the ikiwiki software. The source
+of the web pages is stored in a git repo, and the generated HTML pages
+are published on a web server.
+
+This might be expressed as an Ick2 configuration like this:
+
+ projects:
+ website:
+ workspace:
+ - git: ssh://git@git.example.com/website.git
+ pipelines:
+ - name: getsource
+ steps:
+ - shell: git clone ssh://git@git.example.com/website.git src
+ - name: ikiwiki
+ steps:
+ - shell: mkdir html
+ - shell: ikiwiki src html
+ - name: publish
+ steps:
+          - shell: rsync -a --delete html/. www-user@www.example.com:/srv/http/.
+
+Note that pipelines are defined in the configuration. Eventually, Ick2
+may come with pre-defined libraries of pipelines that can easily be
+reused, but it will always be possible for users to define their own.
+
+Pipeline steps will not be able to use variables in ALPHA-1. That
+will probably be added later.
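+As an illustrative sketch (not the actual Ick2 code), a worker might
+execute the shell steps of one pipeline in the workspace like this;
+all names here are made up:

```python
import subprocess

def run_pipeline(steps, workspace):
    """Run each 'shell' step of a pipeline inside the workspace directory.

    Stops at the first failing step. Sketch only: the real
    worker-manager also reports each step's output to the controller.
    """
    for step in steps:
        result = subprocess.run(
            step["shell"], shell=True, cwd=workspace,
            capture_output=True, text=True)
        if result.returncode != 0:
            return False, step["shell"], result.stderr
    return True, None, None

# Two trivial steps standing in for 'mkdir html' and 'ikiwiki src html'.
ok, failed_step, err = run_pipeline(
    [{"shell": "true"}, {"shell": "echo done"}], workspace=".")
```

+A failing step makes the function return early, which mirrors how the
+controller stops scheduling steps after a failure.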
+
+
+Ick2 ALPHA-1
+=============================================================================
+
+We are currently working on what will be called the ALPHA-1 version of
+Ick2. This chapter outlines its intended functionality and the shape
+of its architecture.
+
+
+Ick2 ALPHA-1 definition
+-----------------------------------------------------------------------------
+
+This is the current working definition of the aim for the ALPHA-1
+version of Ick2:
+
+> ALPHA-1 of Ick2 can be deployed and configured easily, and can
+> concurrently build multiple projects using multiple workers. Builds may be
+> traditional builds from source code, may involve running unit tests
+> or other build-time tests, may involve building Debian packages, and
+> build artifacts are published in a suitable location. Builds may
+> also be builds of static web sites or documentation, and those build
+> artifacts may be published on suitable web servers. Builds happen on
+> workers in reasonably well isolated, automatically maintained
+> environments akin to pbuilder or schroot (meaning the sysadmin is
+> not expected to set up the pbuilder base tarball, ick2 will do
+> that).
+
+Ick2 acceptance criteria
+-----------------------------------------------------------------------------
+
+Acceptance criteria for ALPHA-1:
+
+* All Ick2 components and the workers are deployable using Ansible or
+ similar configuration management tooling.
+
+* At least two people (not only Lars) have set up a CI cluster to
+  build at least two different projects on at least two workers. One
+  of the projects should build documentation for ick2 itself, the
+  other should build a .deb package of ick2. Bonus points for
+  building projects other than ick2 as well.
+
+* Builds get triggered automatically by a git server on any commit to
+ the master branch.
+
+* Build logs can be viewed while builds are running or afterwards via
+ an HTTP API (perhaps wrapped in a command line tool). Bonus points
+ if someone builds a web app on top of the API.
+
+* A modicum of thought has been spent on security and the major
+ contributors agree the security design is not idiotic. The goal is
+ to be confident that a future version of Ick2 can be made reasonably
+ secure, even if that doesn't happen for ALPHA-1.
+
+* The workspace is constructed from several git repositories, e.g., so
+ that the debian subdir comes from a different repo than the main
+ source tree.
+
+* The pipeline steps are not merely snippets of shell scripts to run.
+ Instead, steps may name operations that get executed by the workers
+ without specifying the implementation in the Ick2 project
+ configuration.
+
+
+Ick2 ALPHA-1 architecture
+-----------------------------------------------------------------------------
+
+The future architecture of Ick2 is a collection of mutually recursive
+self-modifying microservices.
+
+* A project consists of a description of the workspace, and one or
+ more pipelines to be executed when triggered to do so. Each
+ pipeline needs to be triggered individually. Each pipeline acts in
+ the same workspace. The entire pipeline is executed on the same
+ worker.
+
+* The workspace description is, initially, a set of git repos and
+ corresponding refs to clone (or update from) into a tree. Later
+ (after ALPHA-1) the workspace may be built from multiple git repos,
+ or artifacts of other builds, or other things that turn out to be
+ useful.
+
+  Accessing git repositories may require credentials, which specific
+  workers will have.
+
+* The workspace is, essentially, a directory tree, populated by files
+ needed for doing a build. The "source tree" if you wish.
+
+* The project's pipelines do things like: prepare workspace, run
+ actual build, publish build artifacts from worker to a suitable
+ server. The controller keeps track of where in each pipeline a
+ build is.
+
+* Workers are represented by worker-managers, which request work
+ from the controller and perform the work by running commands locally
+ or over ssh on the actual worker host. Worker-managers may be on the
+ worker hosts or elsewhere, depending on what suits best for each CI
+ cluster.
+
+* Worker-managers register their workers with the controller. For
+  ALPHA-1 all workers are assumed to be equivalent.
+
+* A pipeline is a sequence of steps (such as shell commands to run),
+ plus some requirements for what attributes the worker that runs the
+ pipeline should have. All the steps of a pipeline get executed by
+ the same worker.
+
+* If a pipeline step fails, the controller will mark the pipeline
+ execution as having failed and won't schedule more steps to execute.
+ Likewise, later pipelines in the same project won't be executed. If
+ the failure was transient (e.g., DNS lookup error), the user may
+ trigger a rebuild manually (via the trigger service).
+
+Ick2 ALPHA-1 components
+-----------------------------------------------------------------------------
+
+Ick2 consists of several independent services. This document describes
+how they are used individually and together.
+
+* The **controller** keeps track of projects, build pipelines, workers,
+ and the current state of each. It decides which build step is next,
+ and who should execute it. The controller provides a simple,
+ unconditional "build this pipeline" API call, to be used by the
+ trigger service (see below).
+
+* A **worker-manager** represents a **build host**. It queries the
+ controller for work, and makes the build host (the actual worker)
+ execute it, and then reports results back to the controller.
+
+* The **trigger service** decides when a build should start. It polls
+ the state of the universe, or gets notifications of changes of the
+ same.
+
+* The controller and trigger services provide an API. The **identity
+ provider** (IDP) takes care of the authentication of each API
+ client, and what privileges each should have. The API client
+ authenticates itself to the IDP, and receives an access token. The
+ API provider gets the token in each request, validates it, and
+ inspects it to see what the client is allowed to do.
+
+  A major point of the IDP is to have just a single place where
+  authentication and authorisation are configured.
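+As an illustrative sketch of the token flow, the following stands in
+for an IDP issuing a token and an API provider validating it. For
+brevity it uses an HMAC shared secret from the Python standard
+library, where the actual design calls for public-key signatures and
+proper JWTs; all names are made up:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"idp-signing-key"  # stands in for the IDP's private signing key

def issue_token(client, scopes):
    """IDP side: sign a claims payload and hand it to the API client."""
    payload = base64.urlsafe_b64encode(
        json.dumps({"sub": client, "scopes": scopes}).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig

def check_token(token):
    """API provider side: validate the signature, return claims or None."""
    payload, _, sig = token.rpartition(".")
    want = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, want):
        return None  # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(payload))

token = issue_token("worker-manager", ["get-work"])
claims = check_token(token)
```

+The point is that the API provider only needs the verification key and
+the token itself; it never talks to the IDP during a request.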
+
+On an implementation level, the various services of Ick2 may be
+implemented using any language and framework that works. However, to
+keep things simple, we'll initially be using Python 3, Bottle, and
+Green Unicorn. Also, the actual API implementation ("backend") will be
+running behind haproxy, such that haproxy decrypts TLS and sends the
+actual HTTP requests over unencrypted localhost connections to the
+backend.
+
+@startuml
+title Ick2 services
+
+
+[git server] --> [trigger service] : notify of change
+[trigger service] --> [controller] : start pipeline
+[controller] <-- [worker manager] : get work, report result
+[worker manager] --> [host] : execute command
+[git server] --> [IDP] : get access token
+[trigger service] .. [IDP] : get access token
+[worker manager] .. [IDP] : get access token
+@enduml
+
+The API providing services will be running in a configuration like
+this:
+
+@startuml
+title API arch
+node service {
+ component haproxy
+ component backend
+}
+[API client] --> [haproxy] : HTTPS (TLS)
+[haproxy] --> [backend] : HTTP over localhost
+@enduml
+
+
+Individual APIs
+=============================================================================
+
+This chapter covers interactions with individual APIs.
+
+
+On security
+-----------------------------------------------------------------------------
+
+All APIs are provided over TLS only. Access tokens are signed using
+public-key cryptography, and the public part of the signing keys is
+provided "somehow" to all API clients.
+
+
+Getting an access token
+-----------------------------------------------------------------------------
+
+The API client (user's command line tool, a putative web app, git
+server, worker-manager, etc) authenticates itself to the IDP, and if
+successful, gets back a signed JSON Web Token. It will include the
+token in all requests to all APIs so that the API provider will know
+what the client is allowed to do.
+
+The privileges for each API client are set by the sysadmin who
+installs the CI system, or a user who's been given IDP admin
+privileges by the sysadmin.
+
+@startuml
+hide footbox
+title Get an access token
+client -> IDP : GET /auth, with Basic Auth, over https
+IDP --> client : signed JWT token
+@enduml
+
+All API calls need a token. Getting a token happens the same way for
+every API client.
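+The client's side of this exchange might look roughly like this in
+Python (standard library only; the IDP URL and credentials are
+hypothetical, and the request is only built here, not sent):

```python
import base64
import urllib.request

def auth_request(idp_url, username, password):
    """Build the HTTPS request an API client would send to the IDP.

    Basic Auth credentials go in the Authorization header; the IDP
    would answer with a signed JWT.
    """
    creds = base64.b64encode(f"{username}:{password}".encode()).decode()
    req = urllib.request.Request(idp_url + "/auth")
    req.add_header("Authorization", "Basic " + creds)
    return req

req = auth_request("https://idp.example.com", "worker-manager", "s3cret")
```

+The returned token would then be attached to every subsequent API
+request in the same `Authorization` header, with a `Bearer` prefix.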
+
+
+Worker (worker-manager) registration
+-----------------------------------------------------------------------------
+
+The sysadmin arranges to start a worker-manager for every build host.
+They may run on the same host, or not: the Ick2 architecture doesn't
+really care. If they run on the same host, the worker manager will
+start a subprocess. If on different hosts, the subprocess will be
+started using ssh.
+
+The CI admin may define tags for each worker. Tags may include
+things like whether the worker can be trusted with credentials for
+logging into other workers, or for retrieving source code from the git
+server. Workers may not override such tags. Workers may, however,
+provide other tags, e.g., to report their CPU architecture or Debian
+release. The controller will eventually be able to use the tags to
+choose which worker should execute which pipeline steps.
+
+@startuml
+hide footbox
+title Register worker
+worker_manager -> IDP : GET /auth, with Basic Auth, over https
+IDP --> worker_manager : token A
+worker_manager -> controller : POST /workers (token A)
+controller --> worker_manager : success
+@enduml
+
+The worker manager runs a very simple state machine.
+
+@startuml
+title Worker-manager state machine
+
+Querying : ask controller for work
+Running : run subprocess
+
+
+[*] -down-> Idle : start
+Idle -down-> Querying : short timeout has expired
+Querying -up-> Idle : nothing to do
+Querying --> Running : something to do
+
+Running --> Running : get output, report to controller
+Running --> Idle : subprocess finished, report to controller
+@enduml
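+The loop behind this state machine might be sketched in Python like
+this, with the controller faked by an in-memory object; all names are
+illustrative, not the actual worker-manager code:

```python
import subprocess

class WorkerManager:
    """Idle/Querying/Running loop from the state machine above."""

    def __init__(self, controller):
        self.controller = controller
        self.state = "idle"

    def tick(self):
        # Idle -> Querying: the short timeout is assumed to have expired.
        work = self.controller.get_work()
        if work is None:
            self.state = "idle"      # Querying -> Idle: nothing to do
            return
        self.state = "running"       # Querying -> Running: something to do
        proc = subprocess.run(work["shell"], shell=True,
                              capture_output=True, text=True)
        self.controller.report({"exit_code": proc.returncode,
                                "stdout": proc.stdout})
        self.state = "idle"          # Running -> Idle: subprocess finished

class FakeController:
    """Stands in for the controller's get-work/report API."""
    def __init__(self, queue):
        self.queue = queue
        self.reports = []
    def get_work(self):
        return self.queue.pop(0) if self.queue else None
    def report(self, result):
        self.reports.append(result)

ctrl = FakeController([{"shell": "echo hello"}])
wm = WorkerManager(ctrl)
wm.tick()  # runs the one queued step
wm.tick()  # nothing to do, back to idle
```

+The real worker-manager would also report output incrementally while
+the subprocess runs, rather than only at the end.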
+
+
+Add project to controller
+-----------------------------------------------------------------------------
+
+The CI admin (or a user authorised by the CI admin) adds projects to
+the controller to allow them to be built. This is done using a "CI
+administration application", which initially will be a command line
+tool, but may later become a web application as well. Either way, the
+controller provides API endpoints for this.
+
+@startuml
+hide footbox
+title Add project to controller
+
+adminapp -> IDP : GET /auth, with Basic Auth, over https
+IDP --> adminapp : token B
+adminapp -> controller : POST /projects (token B)
+controller --> adminapp : success or failure indication
+@enduml
+
+
+A full build
+=============================================================================
+
+Next we look at how the various components interact during a complete
+build, using a single worker, which is trusted with credentials. We
+assume the worker has been registered and projects added.
+
+The sequence diagrams in this chapter have been split into stages, to
+make them easier to view and read. Each diagram after the first one
+continues where the previous one left off.
+
+Although not shown in the diagrams, the same sequence is meant to work
+when multiple projects run concurrently on multiple workers.
+
+Trigger build by pushing changes to git server
+-----------------------------------------------------------------------------
+
+@startuml
+hide footbox
+title Build triggered by git change
+
+developer -> gitano : git push
+
+gitano -> IDP : GET /auth, with Basic Auth, over https
+IDP --> gitano : token C
+gitano -> trigger : POST /git/website.git (token C)
+note right
+ Git server notifies
+ trigger service that
+ a git repo has changed
+end note
+
+|||
+
+trigger -> IDP : GET /auth, with Basic Auth, over https
+IDP --> trigger : token D
+trigger -> controller : GET /projects (token D)
+note right
+ trigger service queries
+ controller to get list
+ of all projects, so it
+ knows which builds to
+ start
+end note
+controller --> trigger : list of projects
+
+|||
+
+trigger -> controller : GET /projects/website (token D)
+note right
+ trigger service
+ gets project config
+ so it knows what
+ pipelines project has
+end note
+controller --> trigger : project description, incl. pipelines
+
+|||
+
+trigger -> controller : POST /projects/website/pipelines/getsource/+start (token D)
+@enduml
+
+The first pipeline has now been started by the trigger service.
+
+
+Pipeline 1: get sources
+-----------------------------------------------------------------------------
+
+The first pipeline uses the trusted worker to fetch source code from
+the git server (we assume that requires credentials) into the
+workspace.
+
+@startuml
+hide footbox
+title Build pipeline: get source
+
+trusty -> IDP : GET /auth, with Basic Auth, over https
+IDP --> trusty : token E
+
+|||
+
+trusty -> controller : GET /worker/trusty (token E)
+controller --> trusty : "clone website source into workspace"
+trusty -> gitano : git clone
+gitano --> trusty : website source code
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+
+|||
+
+trusty -> controller : GET /worker/trusty (token E)
+controller -> trusty : "notify trigger service pipeline is finished **successfully**"
+trusty -> trigger : GET /pipelines/website/getsource, exit=0 (token E)
+note right
+ No need to have the trigger service query the controller since
+ it has been told the status of pipeline by the worker.
+end note
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+note right
+ If the notification to the trigger service failed,
+ this can be reported to the controller for logging.
+end note
+trigger -> controller : POST /projects/website/pipelines/ikiwiki/+start (token D)
+@enduml
+
+The first pipeline finished, and the website building can start.
+That's the second pipeline, which has just been started.
+
+
+Pipeline 2: Build static web site
+-----------------------------------------------------------------------------
+
+The second pipeline runs on the same worker. The source is already
+there and it just needs to perform the build.
+
+@startuml
+hide footbox
+title Build static website
+
+trusty -> controller : GET /worker/trusty (token E)
+controller -> trusty : "build static website"
+trusty -> trusty : run ikiwiki to build site
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+
+|||
+
+trusty -> controller : GET /worker/trusty (token E)
+controller -> trusty : "notify trigger service pipeline is finished"
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+trusty -> trigger : GET /pipelines/website/ikiwiki (token E)
+trigger -> controller : GET /projects/website/pipelines/ikiwiki (token D)
+trigger -> controller : POST /projects/website/pipelines/publish/+start (token D)
+
+@enduml
+
+At the end of the second pipeline, we start the third one.
+
+Pipeline 3: Publish web site to web server
+-----------------------------------------------------------------------------
+
+The third pipeline copies the built static website from the trusty
+worker to the actual web server.
+
+@startuml
+hide footbox
+title Copy built site from the worker to the web server
+
+trusty -> controller : GET /worker/trusty (token E)
+controller -> trusty : "rsync static website to web server"
+trusty -> webserver : rsync
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+
+|||
+
+trusty -> controller : GET /worker/trusty (token E)
+controller --> trusty : "notify trigger service pipeline is finished"
+trusty -> controller : POST /worker/trusty, exit=0 (token E)
+trusty -> trigger : GET /pipelines/website/publish (token E)
+trigger -> controller : GET /projects/website/pipelines/publish (token D)
+note right
+ There are no further pipelines.
+end note
+
+@enduml
+
+The website is now built and published.
+
+Ick APIs
+=============================================================================
+
+APIs follow the RESTful style
+-----------------------------------------------------------------------------
+
+All the Ick APIs are [RESTful][]. Server-side state is represented by
+a set of "resources". These are data objects that can be addressed
+using URLs, and they are manipulated using HTTP methods (verbs): GET,
+POST, PUT, DELETE. There can be many instances of a type of resource;
+these are handled as a collection. Example: given a resource type for
+projects Ick should build, the API would have the following calls:
+
+ POST /projects -- create a new project, giving it an ID
+ GET /projects -- get list of all project ids
+ GET /projects/ID -- get info on project ID
+ PUT /projects/ID -- update project ID
+ DELETE /projects/ID -- remove a project
+
+[RESTful]: https://en.wikipedia.org/wiki/Representational_state_transfer
+
+Resources are all handled the same way, regardless of the type of the
+resource. This gives a consistency that makes it easier to use the
+APIs.
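+The uniformity can be sketched with a small in-memory collection class
+(plain Python rather than the Bottle routes the real backend would
+use; names are illustrative):

```python
import itertools

class Collection:
    """Uniform CRUD handling for one resource type, e.g. projects."""

    def __init__(self):
        self.items = {}
        self.ids = itertools.count(1)

    def post(self, body):          # POST /projects
        rid = str(next(self.ids))
        self.items[rid] = body
        return rid

    def list_ids(self):            # GET /projects
        return sorted(self.items)

    def get(self, rid):            # GET /projects/ID
        return self.items[rid]

    def put(self, rid, body):      # PUT /projects/ID
        self.items[rid] = body

    def delete(self, rid):         # DELETE /projects/ID
        del self.items[rid]

projects = Collection()
pid = projects.post({"project": "liw.fi"})
```

+Because every resource type is handled by the same five operations,
+adding a new resource type doesn't add new API concepts.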
+
+Note that the server doesn't store any client-side state at all. There
+are no sessions, no logins, etc. Authentication is handled by attaching
+a token to each request (in the `Authorization` header). An Identity
+Provider gives out the tokens to API clients, on request.
+
+Note also that the API doesn't have RPC-style calls. The server end
+may decide to do some action as a side effect of a resource being
+created or updated, but the API client can't invoke the action
+directly. Thus, there's no way to say "run this pipeline"; instead,
+there's a resource showing the state of a pipeline, and changing that
+resource's state from "idle" to "triggered" is how an API client tells
+the server to run a pipeline.
+
+
+Ick controller resources and API
+-----------------------------------------------------------------------------
+
+A project consists of a workspace specification and an ordered list
+of pipelines. Additionally, the project has a list of builds, and for
+each build a build log and metadata (time and duration of the build,
+what triggered it, whether it was successful or not), as well as the
+current state of the workspace.
+
+A project resource:
+
+    {
+        "project": "liw.fi",
+        "parameters": {
+            "rsync_target": "www-data@www.example.com:/srv/http/liw.fi"
+        },
+        "workspace": {
+            "gits": [
+                {
+                    "git": "ssh://git@git.liw.fi/liw.fi",
+                    "branch": "master",
+                    "dir": "src"
+                }
+            ]
+        },
+        "pipelines": [
+            {
+                "name": "workspace-setup",
+                "actions": [
+                    { "name": "clone-gits" }
+                ]
+            },
+            {
+                "name": "ikiwiki-config",
+                "actions": [
+                    { "shell": "cat src/ikiwiki.setup.template > ikiwiki.setup" },
+                    { "shell": "echo \"destdir: {{ workspace }}/html\" >> ikiwiki.setup" },
+                    { "name": "mkdir", "dirname": "html" }
+                ]
+            },
+            {
+                "name": "ikiwiki-run",
+                "actions": [
+                    { "shell": "ikiwiki --setup ikiwiki.setup" }
+                ]
+            },
+            {
+                "name": "rsync",
+                "actions": [
+                    { "shell": "rsync -a --delete html/. \"{{ rsync_target }}/.\"" }
+                ]
+            }
+        ]
+    }
+
+Here:
+
+- the pipeline consists of a sequence of steps
+- each step is a shell snippet (expanded with jinja2) or a built-in
+ operation implemented by the worker-manager directly
+- project parameters are used by steps
+
+A pipeline status resource at
+`/projects/PROJECTNAME/pipelines/PIPELINENAME`, created automatically
+when a project resource is updated to include the pipeline:
+
+ {
+ "status": "idle/triggered/running/paused"
+ }
+
+To trigger a pipeline, PUT a pipeline resource with a status field of
+"triggered". It is an error to do that when the current status is not
+"idle".
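+A sketch of that rule, as the controller's PUT handler might enforce
+it (illustrative Python, not the actual controller):

```python
class PipelineStatus:
    """Pipeline status resource with the trigger-by-update rule above.

    The real controller would apply this in the PUT handler for
    /projects/PROJECTNAME/pipelines/PIPELINENAME.
    """

    def __init__(self):
        self.status = "idle"

    def put(self, new_status):
        # Only an idle pipeline may be triggered; other updates
        # (running, paused, back to idle) pass through.
        if new_status == "triggered" and self.status != "idle":
            raise ValueError("can only trigger an idle pipeline")
        self.status = new_status

p = PipelineStatus()
p.put("triggered")   # ok: idle -> triggered
```

+Triggering an already-triggered or running pipeline is refused, which
+is what makes the PUT safe to retry.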
+
+A build resource is created automatically, at
+`/projects/PROJECTNAME/builds`, when a pipeline actually starts (not
+when it's triggered). It can't be changed via the API.
+
+ {
+ "build": "12765",
+ "project": "liw.fi",
+ "pipeline": "ikiwiki-run",
+ "worker": "bartholomew",
+ "status": "running/success/failure",
+ "started": "TIMESTAMP",
+ "ended": "TIMESTAMP",
+ "triggerer": "WHO/WHAT",
+ "trigger": "WHY"
+ }
+
+A build log is stored at `/projects/liw.fi/builds/12765/log` as a
+blob. The build log is appended to by the worker-manager by reporting
+output.
+
+Workers are registered to the controller by creating a worker
+resource. Later on, we can add useful metadata to the resource, but
+for now we'll have just the name.
+
+ {
+ "worker": "bartholomew"
+ }
+
+A work resource tells a worker what to do next:
+
+    {
+        "project": "liw.fi",
+        "pipeline": "ikiwiki-run",
+        "step": {
+            "shell": "ikiwiki --setup ikiwiki.setup"
+        },
+        "parameters": {
+            "rsync_target": "..."
+        }
+    }
+
+The controller provides a simple API to give work to each worker:
+
+ GET /work/bartholomew
+ PUT /work/bartholomew
+
+The controller keeps track of which worker is currently running which
+pipeline.
+
+Work output resource:
+
+    {
+        "worker": "bartholomew",
+        "project": "liw.fi",
+        "pipeline": "ikiwiki-run",
+        "exit_code": null,
+        "stdout": "...",
+        "stderr": "...",
+        "timestamp": "..."
+    }
+
+When `exit_code` is non-null, the step has finished, and the
+controller knows it should schedule the next step in the pipeline.
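+The controller's reaction to a work output report might be sketched
+like this (illustrative Python; the real controller also updates build
+resources and the build log):

```python
def next_step(pipeline_steps, current_index, exit_code):
    """Decide what the controller schedules after a work output report.

    Returns the index of the step to hand out next, or None when the
    pipeline is finished or a step failed.
    """
    if exit_code is None:            # step still running: keep handing it out
        return current_index
    if exit_code != 0:               # step failed: stop the pipeline
        return None
    if current_index + 1 < len(pipeline_steps):
        return current_index + 1     # schedule the next step
    return None                      # all steps finished

steps = [{"shell": "mkdir html"}, {"shell": "ikiwiki src html"}]
```

+On failure the controller would also mark later pipelines of the
+project as not to be executed, as described earlier.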
+
+
+Known problems
+=============================================================================
+
+The architecture shown in this document for ALPHA-1 is not perfect. At
+least the following things will probably need to be addressed in the
+future. We've made compromises to gain simplicity and get something
+working sooner, to allow things to be iterated on faster.
+
+* It's not OK for all workers to be trusted with credentials to access
+ all git repositories and all web servers.
diff --git a/index.mdwn b/index.mdwn
index f571494..ed6a6bf 100644
--- a/index.mdwn
+++ b/index.mdwn
@@ -64,6 +64,7 @@ Latest meetings:
More info:
+* [[Architecture]]
* [[Process description|pm]]
* [[Current and past projects|projects]]
* [[Tasks to do if you want to help|tasks]]