author | Lars Wirzenius <lwirzenius@wikimedia.org> | 2019-05-02 20:51:22 +0300 |
---|---|---|
committer | Lars Wirzenius <lwirzenius@wikimedia.org> | 2019-05-02 20:51:22 +0300 |
commit | 9e20584abe41b77696fe7a170dd31a265bc612bf (patch) | |
tree | d84a3cdb044855e37e811aa687a5def52faa247b /ci-arch.mdwn | |
parent | 87febe35380e5554fff690663e02a68ff7cbf571 (diff) | |
Change: clarify thinking about credentials
Diffstat (limited to 'ci-arch.mdwn')
-rw-r--r-- | ci-arch.mdwn | 105 |
1 files changed, 91 insertions, 14 deletions
diff --git a/ci-arch.mdwn b/ci-arch.mdwn
index 8f17967..f8e0479 100644
--- a/ci-arch.mdwn
+++ b/ci-arch.mdwn
@@ -359,25 +359,102 @@ we plan CI to implement them.

 ## Log storage

-## Artifact storage
-
-* need to store arbitrary blobs for some time
-
-* longer time for anything that gets deployed to production, shorter
-  for everything else?
+* We want to capture the build log or "console output" (stdout,
+  stderr) of the build and store it. This is an invaluable tool for
+  developers to understand what happens in a build, and especially why
+  it failed.

-* de-duplicate to save on space?
+* Ideally, the build log is formatted in a way that's easy for humans
+  to read.

-* can these be publically accessible? sometimes not?
+* It'd also be nice if the build log can be easily processed
+  programmatically, to extract information from it automatically.

-* artifact storage must be secure, as everything that gets deployed to
-  production goes via it
+* We may want to store build logs for extended periods of time so that
+  we can analyze them later. By storing them in a de-duplicating and
+  compressing manner, the way backup software like Borg does, the
+  storage requirements can be kept reasonable.

-## Credentials management
-
-* what are the requirements and use cases here?
+## Artifact storage

-* deployment to K8s vs to bare metal servers?
+* Artifacts are all the files created during the build process that
+  may be needed for automated testing or deployment to production or
+  any other environment: executable binaries, minimized JavaScript,
+  automatically generated documentation from source code (javadoc).
+
+* We basically need to store arbitrary blobs for some time. We need to
+  retrieve the blobs for deployment, and possibly other reasons.
+
+* We may want to store artifacts that get deployed to production for a
+  longer time than other artifacts so that we can keep a history of what
+  was in production at any recent-ish point in time.
+
+* We will want to trace back from each artifact which git repository
+  and commit it came from.
+
+* We can de-duplicate artifacts (a la backup programs) to save on
+  space. Even so, we will want to automatically expire artifacts on
+  some flexible schedule to keep storage needs in control.
+
+* We need to decide when we can make these artifacts publicly
+  accessible.
+
+* Artifact storage must be secure, as everything that gets deployed to
+  production goes via it.
+
+* There are some artifact storage systems we can use.
+
+## Credentials management and access control
+
+* Credentials and other secrets are used to allow access to servers,
+  services, and files. They are often highly security sensitive data.
+  The CI system needs to protect them, but allow controlled use of
+  them.
+
+* Example: a CI job needs to deploy a Docker image with a tested and
+  reviewed change as a container orchestrated by Kubernetes. For this,
+  it needs to authenticate itself to the Kubernetes API. This is
+  typically done by a username/password combination. How will the
+  future CI system handle this?
+
+* Example: for tests, and in production, a MediaWiki container needs
+  access to a MariaDB database, and MW needs to authenticate itself to
+  the database. MW gets the necessary credentials for this from its
+  configuration, which CI will install during deployment. The
+  configuration will be specific for what the container is being used:
+  if it's for testing a change, the configuration only allows access
+  to a test database, but for production it provides access to the
+  production database.
+
+* FIXME: This is unclear as yet, the text below is some incoherent
+  preliminary rambling by Lars which needs review and fixing.
+
+* Builds are done in isolated containers. These containers have no
+  credentials. Build artifacts are extracted from the containers and
+  stored in an artifact storage system by the CI system, and this is
+  done in a controlled environment, where only vetted code is run, not
+  code from the repository being tested.
+
+* Deployments happen in controlled environments, with access to the
+  credentials needed for deployment. The deployments retrieve artifacts
+  from the artifact storage system. The deployments are to containers,
+  and the deployed containers don't have any credentials, unless CI has
+  been configured to install them, in which case CI installs the
+  credentials for the intended use of the container.
+
+* Tests run against software deployed to containers, and those
+  containers only have access to the backing services needed for the
+  test.
+
+* The CI system needs a way to store the credentials that can only be
+  accessed by CI itself, when it's deploying a container (Kubernetes
+  API access) or configuring the container (installing credentials for
+  the intended use of container).
+
+  This might be, for example, a set of files deployed to the CI host
+  where container deployment or configuration runs, with access
+  control provided by Unix permissions. Not sure if this is
+  sufficiently secure.

 ## Interdependent changes to multiple components
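
Note on the last added bullet: the idea of keeping credentials as files on the CI host, protected by Unix permissions, could be sketched roughly as follows. This is only an illustration under assumptions, not something the change itself specifies: the function names `store_credential` and `load_credential`, the directory layout, and the credential name `k8s-api` are all hypothetical. It assumes a POSIX host and a dedicated user account for the CI deployment step, and it does not settle the open question of whether this is sufficiently secure.

```python
import os
import stat
import tempfile
from pathlib import Path


def store_credential(store_dir: Path, name: str, secret: str) -> Path:
    """Write a secret to a file that only the owning user can read or write.

    Hypothetical helper: the directory layout and naming are assumptions.
    """
    store_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkdir's mode is filtered by the umask and ignored for an existing
    # directory, so set the permissions explicitly.
    os.chmod(store_dir, 0o700)
    path = store_dir / name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        # Enforce the mode even if the file already existed with looser bits.
        os.fchmod(f.fileno(), 0o600)
        f.write(secret)
    return path


def load_credential(path: Path) -> str:
    """Read a secret, refusing files that group or others can access."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Demonstration in a throwaway directory; a real CI host would use a
    # fixed location owned by the CI deployment user.
    demo_dir = Path(tempfile.mkdtemp()) / "ci-credentials"
    cred = store_credential(demo_dir, "k8s-api", "example-token")
    print(load_credential(cred) == "example-token")
```

Checking the file mode on every read means a credential that was accidentally made group- or world-readable is rejected rather than silently used. It does nothing against code running as the same user, which is why the separation between build containers (no credentials) and the controlled deployment environment described in the diff still matters.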