author    Lars Wirzenius <liw@liw.fi>    2017-12-24 11:21:47 +0200
committer Lars Wirzenius <liw@liw.fi>    2017-12-24 11:21:47 +0200
commit    a8829f632f89e336a238b06c869111669e82d92b (patch)
tree      7228d23222a38ba7693aeda8f3a81f09548cf17a
parent    39d6d117c94e31268afc13710a0c9b8bee711b51 (diff)
download  ick.liw.fi-a8829f632f89e336a238b06c869111669e82d92b.tar.gz
Publish log entry
-rw-r--r--  blog/2017/12/24/parameterised_pipelines.mdwn  174
1 files changed, 174 insertions, 0 deletions
diff --git a/blog/2017/12/24/parameterised_pipelines.mdwn b/blog/2017/12/24/parameterised_pipelines.mdwn
new file mode 100644
index 0000000..ce27c3a
--- /dev/null
+++ b/blog/2017/12/24/parameterised_pipelines.mdwn
@@ -0,0 +1,174 @@
+[[!meta title="Parameterised pipelines"]]
+[[!tag architecture]]
+[[!meta date="2017-12-24 10:15"]]
+
+# The problem
+
+Currently, Ick has a very simple model of projects and pipelines.
+Pipelines are defined independently of projects, and projects just
+list, by name, which pipelines they consist of. A pipeline is a
+sequence of actions, where an action is a snippet of shell or Python
+code. The snippets do not get any parameters. I currently have
+two projects, which both build a static website with ikiwiki. Both
+projects have nearly identical pipelines (expressed here as YAML, but
+equivalent to JSON):
+
+ name: build_static_site
+ actions:
+ - python: |
+ git_url = 'git://git.liw.fi/ick.liw.fi'
+ rsync_target = 'ickliwfi@pieni.net:/srv/http/ick.liw.fi'
+ import os, sys, cliapp
+ def R(*args, **kwargs):
+ kwargs['stdout'] = kwargs['stderr'] = None
+ cliapp.runcmd(*args, **kwargs)
+ R(['git', 'clone', git_url, 'src'])
+ R(['ql-ikiwiki-publish', 'src', rsync_target])
+
+The other pipeline is otherwise identical, but it defines
+different `git_url` and `rsync_target` values.
+
+This code duplication is silly, and I want to fix it.
+
+# Possible approaches
+
+Code duplication between pipelines can be addressed in various ways.
+Here's a short summary of the ones I have considered.
+
+* Use jinja2 or similar templating for the code snippet. The project
+ would define some variables, which get interpolated into the code
+ snippet in some way when it gets run. Jinja2 is a well-regarded
+ Python templating library that could be used.
+
+ Pro: simple; straightforward; not dependent on the programming
+ language of the snippet.
+
+  Con: not well suited for non-simple values, e.g., lists and structs;
+  snippets need to be written carefully to add appropriate quoting or
+  escaping so that, for example, template interpolation into a shell
+  snippet does not inadvertently introduce syntax or semantic errors.
+
+* Use a generator utility to create un-parameterised pipelines. A
+ tool reads a language built on top of un-parameterised pipelines and
+ projects and generates pipelines that embed the expanded parameters.
+
+ Pro: this moves the parameters complexity completely outside the
+ controller and worker-manager.
+
+  Con: requires a separate language outside core Ick; adds a separate
+  "compilation" phase when managing project and pipeline
+  specifications, which seems like an unnecessary step.
+
+* Add to the controller an understanding of pipeline parameters, which
+ get provided by projects using the pipelines. Implement a way to
+ pass in parameter values for each type of snippet (Python and shell,
+ at the moment).
+
+ Pro: makes Ick's project/pipeline specifications more powerful.
+
+  Con: more complexity in Ick, though it's not too bad; requires more
+  effort to add new languages for pipeline action snippets.
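+
+The quoting hazard mentioned in the first option can be sketched in a
+few lines of Python. This uses the stdlib `string.Template` as a
+stand-in for Jinja2, and made-up values, purely to show the failure
+mode:
+
```python
from string import Template
import shlex

# Hypothetical shell snippet template; `$git_url` is a placeholder
# that the project would fill in.
snippet = Template("git clone $git_url src")

# A value containing shell metacharacters corrupts a naive expansion.
git_url = "git://git.liw.fi/x; rm -rf $HOME"
unsafe = snippet.substitute(git_url=git_url)
print(unsafe)  # the expanded command now contains an injected `rm`

# Quoting each value before interpolation keeps the snippet intact.
safe = snippet.substitute(git_url=shlex.quote(git_url))
print(safe)
```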
+
+Overall, I find the approach of dealing with parameters natively in
+project and pipeline specifications the best one. So I choose that. If
+it turns out to be a problem, it's a decision that can be revisited
+later.
+
+# Specification
+
+## Project parameters
+
+I will add a way for a project to specify parameters. These apply to
+all pipelines used by the project. Parameters will be defined as a
+dict:
+
+ project: ick.liw.fi
+ parameters:
+ git_url: git://git.liw.fi/ick.liw.fi
+ rsync_target: ickliwfi@pieni.net:/srv/http/ick.liw.fi
+ pipelines:
+ - build_static_site
+
+A parameter value can be anything that JSON allows:
+
+ project: hello
+ parameters:
+ gits:
+ - url: git://git.liw.fi/hello
+ branch: master
+ subdir: src
+ - url: git://git.liw.fi/hello-debian
+ branch: debian/unstable
+ subdir: src/debian
+ pipelines:
+ - build_debian_package
+
+In the above example, the Debian packaging part of the source tree
+comes from its own git repository that gets cloned into a
+sub-directory of the upstream part.
+
+## Pipeline parameters
+
+I will add a way for pipelines to declare the parameters they want, by
+listing them by name.
+
+ name: build_static_site
+ parameters:
+ - git_url
+ - rsync_target
+ actions:
+ - python: |
+ git_url = params['git_url']
+ rsync_target = params['rsync_target']
+ import os, sys, cliapp
+ def R(*args, **kwargs):
+ kwargs['stdout'] = kwargs['stderr'] = None
+ cliapp.runcmd(*args, **kwargs)
+ R(['git', 'clone', git_url, 'src'])
+ R(['ql-ikiwiki-publish', 'src', rsync_target])
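+
+With parameters declared by pipelines and defined by projects, the
+controller could also verify that every declared parameter has a
+value. A minimal sketch of such a check (the function name and data
+shapes are mine, not actual Ick code):
+
```python
def missing_parameters(project, pipelines):
    # Return names of parameters declared by the project's pipelines
    # but not defined in the project's parameters dict.
    defined = set(project.get('parameters', {}))
    missing = set()
    for name in project.get('pipelines', []):
        for param in pipelines[name].get('parameters', []):
            if param not in defined:
                missing.add(param)
    return sorted(missing)

# Specs shaped like the examples in this post.
project = {
    'project': 'ick.liw.fi',
    'parameters': {'git_url': 'git://git.liw.fi/ick.liw.fi'},
    'pipelines': ['build_static_site'],
}
pipelines = {
    'build_static_site': {'parameters': ['git_url', 'rsync_target']},
}
print(missing_parameters(project, pipelines))  # ['rsync_target']
```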
+
+When the controller gives the worker-manager an action to execute,
+the `work` resource will have the parameters:
+
+ {
+ "parameters": {
+ "git_url": "git://git.liw.fi/ick.liw.fi",
+ "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi"
+ },
+ ...
+ }
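+
+On the worker-manager side, reading the parameters out of that
+resource is then a plain JSON lookup, sketched here with the example
+values from above (fields other than `parameters` elided):
+
```python
import json

# A work resource as the controller might serialise it.
work = json.loads('''
{
    "parameters": {
        "git_url": "git://git.liw.fi/ick.liw.fi",
        "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi"
    }
}
''')

# Default to an empty dict for pipelines that declare no parameters.
params = work.get('parameters', {})
print(params['git_url'])
```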
+
+The actual step will access the parameters in a suitable way.
+
+* If the step is implemented by the worker-manager directly, it can
+  access the parameters directly.
+* If the step is implemented by a Python snippet, worker-manager will
+  prepend a bit of code to the beginning of the snippet to set a global
+ Python dict variable, `params`, which can be used by the snippet.
+* If the step is implemented by a shell snippet, worker-manager will
+ prepend a bit of code to the beginning of the snippet to define a
+ shell function, `params`, that outputs a JSON object that defines
+ the parameters. The snippet can pipe that to the `jq` utility, which
+ can extract the desired value. `jq` is a small, but powerful utility
+ (installation size about 100 KiB) for processing JSON
+ programmatically from shell. It will need to be installed on the
+ workers.
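+
+A sketch of how the worker-manager might construct the prepended code
+for the two snippet types (the helper names are mine, not actual
+worker-manager code):
+
```python
import json

def python_params_prefix(params):
    # Define a global `params` dict at the top of a Python snippet.
    # For plain strings, lists, and dicts, JSON output is also valid
    # Python literal syntax.
    return 'params = {}\n'.format(json.dumps(params))

def shell_params_prefix(params):
    # Define a `params` shell function that prints the JSON object,
    # ready for piping to jq. The quoted heredoc delimiter stops the
    # shell from expanding anything inside the JSON.
    return "params() {{ cat <<'EOF'\n{}\nEOF\n}}\n".format(json.dumps(params))

params = {'git_url': 'git://git.liw.fi/ick.liw.fi'}
print(python_params_prefix(params))
print(shell_params_prefix(params))
```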
+
+# jq examples
+
+To get a simple value:
+
+ params | jq -r .foo
+
+To get a simple value from inside a more complicated one:
+
+ params | jq -r '.gits|.[1]|.url'
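+
+For comparison, the same two lookups in Python, using made-up
+parameter values shaped like the examples earlier in this post:
+
```python
# Hypothetical parameter values; `foo` and `gits` match the jq
# examples above in shape only.
params = {
    'foo': 'bar',
    'gits': [
        {'url': 'git://git.liw.fi/hello'},
        {'url': 'git://git.liw.fi/hello-debian'},
    ],
}

# Equivalent of: params | jq -r .foo
print(params['foo'])

# Equivalent of: params | jq -r '.gits|.[1]|.url'
print(params['gits'][1]['url'])
```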
+
+# Considerations
+
+There will be no type safety, at least for now. If the pipeline
+expects a list and gets a plain string, tough luck.
+
+Requiring `jq` on workers is a compromise, for now. It avoids having
+to implement the same functionality in another way. It's small enough
+to hopefully not be a size problem on workers.