Change: clarify thinking about credentials

author: Lars Wirzenius <lwirzenius@wikimedia.org> 2019-05-02 20:51:22 +0300
committer: Lars Wirzenius <lwirzenius@wikimedia.org> 2019-05-02 20:51:22 +0300
commit: 9e20584abe41b77696fe7a170dd31a265bc612bf (patch)
tree: d84a3cdb044855e37e811aa687a5def52faa247b /ci-arch.mdwn
parent: 87febe35380e5554fff690663e02a68ff7cbf571 (diff)
download: wmf-ci-arch-9e20584abe41b77696fe7a170dd31a265bc612bf.tar.gz
1 files changed, 91 insertions, 14 deletions
diff --git a/ci-arch.mdwn b/ci-arch.mdwn
index 8f17967..f8e0479 100644
--- a/ci-arch.mdwn
+++ b/ci-arch.mdwn
@@ -359,25 +359,102 @@ we plan CI to implement them.
 
 ## Log storage
 
-## Artifact storage
-
-* need to store arbitrary blobs for some time
-
-* longer time for anything that gets deployed to production, shorter
-  for everything else?
+* We want to capture the build log or "console output" (stdout,
+  stderr) of the build and store it. This is an invaluable tool for
+  developers to understand what happens in a build, and especially why
+  it failed.
 
-* de-duplicate to save on space?
+* Ideally, the build log is formatted in a way that's easy for humans
+  to read.
 
-* can these be publically accessible? sometimes not?
+* It'd also be nice if the build log can be easily processed
+  programmatically, to extract information from it automatically.
 
-* artifact storage must be secure, as everything that gets deployed to
-  production goes via it
+* We may want to store build logs for extended periods of time so that
+  we can analyze them later. By storing them in a de-duplicating and
+  compressing manner, the way backup software like Borg does, the
+  storage requirements can be kept reasonable.
 
-## Credentials management
-
-* what are the requirements and use cases here?
+## Artifact storage
 
-* deployment to K8s vs to bare metal servers?
+* Artifacts are all the files created during the build process that
+  may be needed for automated testing or deployment to production or
+  any other environment: executable binaries, minimized JavaScript,
+  automatically generated documentation from source code (javadoc).
+
+* We basically need to store arbitrary blobs for some time. We need to
+  retrieve the blobs for deployment, and possibly other reasons.
+
+* We may want to store artifacts that get deployed to production for a
+  longer time than other artifacts so that we can keep a history of what
+  was in production at any recent-ish point in time.
+
+* We will want to trace back from each artifact which git repository
+  and commit it came from.
+
+* We can de-duplicate artifacts (a la backup programs) to save on
+  space. Even so, we will want to automatically expire artifacts on
+  some flexible schedule to keep storage needs in control.
+
+* We need to decide when we can make these artifacts publicly
+  accessible.
+
+* Artifact storage must be secure, as everything that gets deployed to
+  production goes via it.
+
+* There are some artifact storage systems we can use.
+
+## Credentials management and access control
+
+* Credentials and other secrets are used to allow access to servers,
+  services, and files. They are often highly security sensitive data.
+  The CI system needs to protect them, but allow controlled use of
+  them.
+
+* Example: a CI job needs to deploy a Docker image with a tested and
+  reviewed change as a container orchestrated by Kubernetes. For this,
+  it needs to authenticate itself to the Kubernetes API. This is
+  typically done by a username/password combination. How will the
+  future CI system handle this?
+
+* Example: for tests, and in production, a MediaWiki container needs
+  access to a MariaDB database, and MW needs to authenticate itself to
+  the database. MW gets the necessary credentials for this from its
+  configuration, which CI will install during deployment. The
+  configuration will be specific to what the container is used for:
+  if it's for testing a change, the configuration only allows access
+  to a test database, but for production it provides access to the
+  production database.
+
+* FIXME: This is unclear as yet, the text below is some incoherent
+  preliminary rambling by Lars which needs review and fixing.
+
+* Builds are done in isolated containers. These containers have no
+  credentials. Build artifacts are extracted from the containers and
+  stored in an artifact storage system by the CI system, and this is
+  done in a controlled environment, where only vetted code is run, not
+  code from the repository being tested.
+
+* Deployments happen in controlled environments, with access to the
+  credentials needed for deployment. The deployment retrieves artifacts
+  from the artifact storage system. The deployments are to containers,
+  and the deployed containers don't have any credentials, unless CI has
+  been configured to install them, in which case CI installs the
+  credentials for the intended use of the container.
+
+* Tests run against software deployed to containers, and those
+  containers only have access to the backing services needed for the
+  test.
+
+* The CI system needs a way to store the credentials that can only be
+  accessed by CI itself, when it's deploying a container (Kubernetes
+  API access) or configuring the container (installing credentials for
+  the intended use of the container).
+
+  This might be, for example, a set of files deployed to the CI host
+  where container deployment or configuration runs, with access
+  control provided by Unix permissions. Not sure if this is
+  sufficiently secure.
 
 ## Interdependent changes to multiple components
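
The last credentials bullet in the diff above suggests storing secrets as files on the CI host, with access control provided by Unix permissions. A minimal Python sketch of that idea follows. The helper name `read_secret` and the exact policy (regular file, owned by the current user, no group or other permission bits) are assumptions made for illustration and are not part of the commit. As the commit itself notes, it is not clear whether this approach is sufficiently secure.

```python
import os
import stat


def read_secret(path: str) -> str:
    """Return the contents of a secret file, refusing lax permissions.

    Hypothetical helper: the policy below is an assumed reading of
    "access control provided by Unix permissions".
    """
    info = os.stat(path)

    # Only plain files are accepted, not directories or device nodes.
    if not stat.S_ISREG(info.st_mode):
        raise PermissionError(f"{path}: not a regular file")

    # The file must belong to the user the CI deployment step runs as.
    if info.st_uid != os.geteuid():
        raise PermissionError(f"{path}: not owned by the current user")

    # Any group or other permission bit means someone else could
    # read, write, or execute the file.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path}: accessible by group or others")

    with open(path, encoding="utf-8") as f:
        return f.read().strip()
```

A deployment step could call `read_secret` just before authenticating to the Kubernetes API, so that a secret with loosened permissions fails the deployment instead of being used silently. This only checks the file itself, not its parent directories.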