author Lars Wirzenius <lwirzenius@wikimedia.org> 2019-05-02 20:51:22 +0300
committer Lars Wirzenius <lwirzenius@wikimedia.org> 2019-05-02 20:51:22 +0300
commit 9e20584abe41b77696fe7a170dd31a265bc612bf
tree d84a3cdb044855e37e811aa687a5def52faa247b
parent 87febe35380e5554fff690663e02a68ff7cbf571
Change: clarify thinking about credentials
-rw-r--r-- ci-arch.mdwn 105
1 files changed, 91 insertions, 14 deletions
diff --git a/ci-arch.mdwn b/ci-arch.mdwn
index 8f17967..f8e0479 100644
--- a/ci-arch.mdwn
+++ b/ci-arch.mdwn
@@ -359,25 +359,102 @@ we plan CI to implement them.
## Log storage
-## Artifact storage
-
-* need to store arbitrary blobs for some time
-
-* longer time for anything that gets deployed to production, shorter
- for everything else?
+* We want to capture the build log or "console output" (stdout,
+ stderr) of the build and store it. This is an invaluable tool for
+ developers to understand what happens in a build, and especially why
+ it failed.
-* de-duplicate to save on space?
+* Ideally, the build log is formatted in a way that's easy for humans
+ to read.
-* can these be publically accessible? sometimes not?
+* It'd also be nice if the build log could be easily processed
+ programmatically, so that information can be extracted from it automatically.
-* artifact storage must be secure, as everything that gets deployed to
- production goes via it
+* We may want to store build logs for extended periods of time so that
+ we can analyze them later. By storing them in a de-duplicating and
+ compressing manner, the way backup software like Borg does, the
+ storage requirements can be kept reasonable.
-## Credentials management
-
-* what are the requirements and use cases here?
+## Artifact storage
-* deployment to K8s vs to bare metal servers?
+* Artifacts are all the files created during the build process that
+ may be needed for automated testing or for deployment to production or
+ any other environment: for example, executable binaries, minified
+ JavaScript, and documentation generated from source code (such as Javadoc).
+
+* We need to store arbitrary blobs for some time, and retrieve them
+ for deployment and possibly for other reasons.
+
+* We may want to store artifacts that get deployed to production for a
+ longer time than other artifacts, so that we can keep a history of what
+ was in production at any recent-ish point in time.
+
+* We will want to trace each artifact back to the git repository
+ and commit it came from.
+
+* We can de-duplicate artifacts (à la backup programs) to save on
+ space. Even so, we will want to automatically expire artifacts on
+ some flexible schedule to keep storage needs under control.
+
+* We need to decide when we can make these artifacts publicly
+ accessible.
+
+* Artifact storage must be secure, as everything that gets deployed to
+ production goes via it.
+
+* There are existing artifact storage systems we can use.
+
+## Credentials management and access control
+
+* Credentials and other secrets are used to allow access to servers,
+ services, and files. They are often highly security-sensitive.
+ The CI system needs to protect them, but allow controlled use of
+ them.
+
+* Example: a CI job needs to deploy a Docker image with a tested and
+ reviewed change as a container orchestrated by Kubernetes. For this,
+ it needs to authenticate itself to the Kubernetes API. This is
+ typically done with a client certificate or a bearer token, such as a
+ service account token. How will the future CI system handle this?
+
+* Example: for tests, and in production, a MediaWiki container needs
+ access to a MariaDB database, and MW needs to authenticate itself to
+ the database. MW gets the necessary credentials for this from its
+ configuration, which CI will install during deployment. The
+ configuration will be specific to what the container is being used for:
+ if it's for testing a change, the configuration only allows access
+ to a test database, but for production it provides access to the
+ production database.
+
+* FIXME: This is still unclear. The text below is some incoherent
+ preliminary rambling by Lars, which needs review and fixing.
+
+* Builds are done in isolated containers. These containers have no
+ credentials. Build artifacts are extracted from the containers and
+ stored in an artifact storage system by the CI system, and this is
+ done in a controlled environment, where only vetted code is run, not
+ code from the repository being tested.
+
+* Deployments happen in controlled environments, with access to the
+ credentials needed for deployment. The deployments retrieve artifacts
+ from the artifact storage system. Deployments target containers,
+ and the deployed containers don't have any credentials, unless CI has
+ been configured to install them, in which case CI installs the
+ credentials for the intended use of the container.
+
+* Tests run against software deployed to containers, and those
+ containers only have access to the backing services needed for the
+ test.
+
+* The CI system needs a way to store the credentials so that they can only be
+ accessed by CI itself, when it's deploying a container (Kubernetes
+ API access) or configuring the container (installing credentials for
+ the intended use of the container).
+
+ This might be, for example, a set of files deployed to the CI host
+ where container deployment or configuration runs, with access
+ control provided by Unix permissions. It's not yet clear whether this
+ is sufficiently secure.
## Interdependent changes to multiple components
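For the Log storage requirements in the diff above (a build log that is easy for humans to read but also easy to process programmatically), here is a minimal sketch. It assumes a JSON Lines format with one record per output line. `run_and_log`, `render`, and the record field names (`stream`, `line`, `event`, `code`) are hypothetical and are not part of any existing CI system.

```python
import json
import subprocess
import sys


def run_and_log(cmd: list[str]) -> tuple[int, str]:
    """Run a build command and return (exit code, JSON-lines build log).

    Simplification: stdout and stderr are captured separately, so the
    original interleaving of lines from the two streams is lost.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    records = []
    for stream in ("stdout", "stderr"):
        for line in getattr(proc, stream).splitlines():
            records.append({"stream": stream, "line": line})
    records.append({"event": "exit", "code": proc.returncode})
    return proc.returncode, "\n".join(json.dumps(r) for r in records)


def render(log: str) -> str:
    """Turn the JSON-lines log back into console-style text for humans."""
    out = []
    for raw in log.splitlines():
        record = json.loads(raw)
        if "line" in record:
            prefix = "E " if record["stream"] == "stderr" else "  "
            out.append(prefix + record["line"])
        else:
            out.append(f"[exit code {record['code']}]")
    return "\n".join(out)


# A stand-in "build" that writes to both streams and then fails.
build = [sys.executable, "-c",
         "import sys; print('compiling'); print('error: boom', file=sys.stderr); sys.exit(1)"]
code, log = run_and_log(build)
print(render(log))
```

Storing structured records and rendering the human view on demand serves both bullets from one stored log. A real implementation would need to preserve stream interleaving, for example by reading both pipes as the build runs.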
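The Log storage section also suggests storing logs in a de-duplicating, compressing manner, as backup software does. The idea can be illustrated with a small in-memory content-addressed chunk store. This sketch assumes fixed-size chunks, SHA-256 chunk ids, and zlib compression, all chosen for brevity. Borg itself uses content-defined chunking, and `ChunkStore` is a hypothetical name.

```python
import hashlib
import zlib

# Hypothetical fixed chunk size. Borg instead uses content-defined chunking,
# which keeps de-duplicating well when bytes are inserted mid-stream.
CHUNK_SIZE = 4096


class ChunkStore:
    """In-memory content-addressed store: each distinct chunk is kept once, compressed."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> zlib-compressed chunk

    def put(self, data: bytes) -> list[str]:
        """Store data and return the chunk ids needed to reassemble it."""
        ids = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # de-duplicate: skip chunks already stored
                self.chunks[digest] = zlib.compress(chunk)
            ids.append(digest)
        return ids

    def get(self, ids: list[str]) -> bytes:
        """Reassemble data from its chunk ids."""
        return b"".join(zlib.decompress(self.chunks[i]) for i in ids)


store = ChunkStore()
log_a = b"Running tests...\n" * 1000          # 17,000 bytes
log_b = log_a + b"FAILED: test_example\n"      # a later build with one extra line
ids_a = store.put(log_a)
ids_b = store.put(log_b)
assert store.get(ids_a) == log_a and store.get(ids_b) == log_b
print(len(ids_a) + len(ids_b), "chunk references,", len(store.chunks), "chunks stored")
# prints: 10 chunk references, 6 chunks stored
```

The two logs share their first four chunks, so only their differing tails are stored twice. With fixed-size chunks, an insertion near the start of a log would shift every later chunk boundary and defeat de-duplication. Content-defined chunking avoids this.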
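The Artifact storage bullets ask for three things: tracing each artifact to its git repository and commit, keeping production artifacts longer than others, and expiring artifacts on a schedule. One way to model this is to store a metadata record next to each blob, as in the sketch below. The field names and the 365-day and 30-day periods are hypothetical placeholders, not decided values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods; the real schedule is still to be decided.
PRODUCTION_RETENTION = timedelta(days=365)
DEFAULT_RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class ArtifactRecord:
    """Metadata stored next to each artifact blob (field names are hypothetical)."""
    sha256: str        # content hash of the blob, also usable as a de-duplication key
    git_repo: str      # repository the artifact was built from
    git_commit: str    # commit the artifact was built from
    created: datetime
    deployed_to_production: bool = False

    def expires_at(self) -> datetime:
        keep = PRODUCTION_RETENTION if self.deployed_to_production else DEFAULT_RETENTION
        return self.created + keep


def expired(records: list[ArtifactRecord], now: datetime) -> list[ArtifactRecord]:
    """Return the records whose retention period has run out at `now`."""
    return [r for r in records if r.expires_at() <= now]


built = datetime(2019, 5, 1, tzinfo=timezone.utc)
test_only = ArtifactRecord("a" * 64, "example/repo", "1111111", built)
in_prod = ArtifactRecord("b" * 64, "example/repo", "2222222", built,
                         deployed_to_production=True)
now = built + timedelta(days=90)
print([r.git_commit for r in expired([test_only, in_prod], now)])  # ['1111111']
```

An expiry job would delete the blobs of expired records, but only after checking that no remaining record still refers to the same de-duplicated content.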
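The Credentials management section considers storing credentials as files on the CI host, protected by Unix permissions. The sketch below shows only the check such a setup would rely on: CI refuses a credentials file unless the file is owned by the user CI runs as and is inaccessible to group and others. It assumes a Unix host and one secret per file. `read_secret`, `is_private_mode`, and the file name are hypothetical, and the sketch does not answer whether this scheme is secure enough.

```python
import os
import stat
import tempfile


def is_private_mode(mode: int) -> bool:
    """True if neither group nor others have any permission bits set."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def read_secret(path: str) -> str:
    """Read a credentials file only if it is owned by the current (CI) user
    and not accessible to group or others. Unix-only (uses os.geteuid)."""
    with open(path, encoding="utf-8") as f:
        # fstat on the already-open file checks the same file we then read,
        # avoiding a race where the path is swapped between check and read.
        st = os.fstat(f.fileno())
        if st.st_uid != os.geteuid():
            raise PermissionError(f"{path}: not owned by the CI user")
        if not is_private_mode(st.st_mode):
            raise PermissionError(f"{path}: accessible by group or others")
        return f.read().strip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kubernetes-token")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("example-token\n")
    os.chmod(path, 0o600)
    print(read_secret(path))  # example-token
    os.chmod(path, 0o640)
    try:
        read_secret(path)
    except PermissionError as err:
        print("refused:", err)
```

This check limits which local users can read a secret. It does not protect against anything else running as the CI user itself, which is part of why the security of this approach remains an open question.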