[[!meta author="Lars Wirzenius"]]
[[!meta title="Parameterised pipelines"]]
[[!tag architecture]]
[[!meta date="2017-12-24 10:15"]]

# The problem

Currently, Ick has a very simple model of projects and pipelines.
Pipelines are defined independently of projects, and projects just
list, by name, the pipelines they consist of. A pipeline is a
sequence of actions, where an action is a snippet of shell or Python
code. The snippets do not get any parameters. I currently have
two projects, both of which build a static website with ikiwiki. Both
projects have nearly identical pipelines (expressed here as YAML, but
equivalent to JSON):

    name: build_static_site
    actions:
    - python: |
        git_url = 'git://git.liw.fi/ick.liw.fi'
        rsync_target = 'ickliwfi@pieni.net:/srv/http/ick.liw.fi'
        import os, sys, cliapp
        def R(*args, **kwargs):
            kwargs['stdout'] = kwargs['stderr'] = None
            cliapp.runcmd(*args, **kwargs)
        R(['git', 'clone', git_url, 'src'])
        R(['ql-ikiwiki-publish', 'src', rsync_target])

The other pipeline is otherwise identical, but it defines
different `git_url` and `rsync_target` values.

This code duplication is silly and I want to fix it.

# Possible approaches

Code duplication between pipelines can be addressed in various ways.
Here's a short summary of the ones I have considered.

* Use jinja2 or similar templating for the code snippet. The project
  would define some variables, which get interpolated into the code
  snippet in some way when it gets run. Jinja2 is a well-regarded
  Python templating library that could be used.

  Pro: simple; straightforward; not dependent on the programming
  language of the snippet.
  Con: not well suited for non-simple values, e.g., lists and structs;
  snippets need to be written carefully to add appropriate quoting or
  escaping so that, for example, template interpolation into a shell
  snippet does not inadvertently introduce syntax or semantic errors.

* Use a generator utility to create un-parameterised pipelines. A
  tool reads a language built on top of un-parameterised pipelines and
  projects, and generates pipelines that embed the expanded parameters.

  Pro: this moves the parameter complexity completely outside the
  controller and worker-manager.

  Con: requires a separate language outside core Ick; adds a separate
  "compilation" phase when managing project and pipeline
  specifications, which seems like an unnecessary step.

* Add to the controller an understanding of pipeline parameters, which
  get provided by projects using the pipelines. Implement a way to
  pass in parameter values for each type of snippet (Python and shell,
  at the moment).

  Pro: makes Ick's project/pipeline specifications more powerful.

  Con: more complexity in Ick, though it's not too bad; requires more
  effort to add a new language for pipeline action snippets.

Overall, I find the approach of dealing with parameters natively in
project and pipeline specifications the best one. So I choose that. If
it turns out to be a problem, it's a decision that can be re-visited
later.

# Specification

## Project parameters

I will add a way for a project to specify parameters. These apply to
all pipelines used by the project.
Parameters will be defined as a
dict:

    project: ick.liw.fi
    parameters:
      git_url: git://git.liw.fi/ick.liw.fi
      rsync_target: ickliwfi@pieni.net:/srv/http/ick.liw.fi
    pipelines:
    - build_static_site

A parameter's value can be anything that JSON allows:

    project: hello
    parameters:
      gits:
      - url: git://git.liw.fi/hello
        branch: master
        subdir: src
      - url: git://git.liw.fi/hello-debian
        branch: debian/unstable
        subdir: src/debian
    pipelines:
    - build_debian_package

In the above example, the Debian packaging part of the source tree
comes from its own git repository that gets cloned into a
sub-directory of the upstream part.

## Pipeline parameters

I will add a way for pipelines to declare the parameters they want, by
listing them by name.

    name: build_static_site
    parameters:
    - git_url
    - rsync_target
    actions:
    - python: |
        git_url = params['git_url']
        rsync_target = params['rsync_target']
        import os, sys, cliapp
        def R(*args, **kwargs):
            kwargs['stdout'] = kwargs['stderr'] = None
            cliapp.runcmd(*args, **kwargs)
        R(['git', 'clone', git_url, 'src'])
        R(['ql-ikiwiki-publish', 'src', rsync_target])

When the controller gives an action to the worker-manager to execute,
the `work` resource will have the parameters:

    {
        "parameters": {
            "git_url": "git://git.liw.fi/ick.liw.fi",
            "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi"
        },
        ...
    }

The actual step will access the parameters in a suitable way.

* If the step is implemented by the worker-manager directly, it can
  access the parameters directly.
* If the step is implemented by a Python snippet, the worker-manager
  will prepend a bit of code to the beginning of the snippet to set a
  global Python dict variable, `params`, which can be used by the
  snippet.
* If the step is implemented by a shell snippet, the worker-manager
  will prepend a bit of code to the beginning of the snippet to define
  a shell function, `params`, that outputs a JSON object that defines
  the parameters. The snippet can pipe that to the `jq` utility, which
  can extract the desired value. `jq` is a small, but powerful,
  utility (installation size about 100 KiB) for processing JSON
  programmatically from the shell. It will need to be installed on the
  workers.

# jq examples

To get a simple value:

    params | jq -r .foo

To get a simple value from inside a more complicated one:

    params | jq -r '.gits|.[1]|.url'

# Considerations

There will be no type safety, at least for now. If the pipeline
expects a list and gets a plain string, tough luck.

Requiring `jq` on workers is a compromise, for now. It avoids having
to implement the same functionality in another way. It's small enough
to hopefully not be a size problem on workers.
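To make the prepending concrete, here is a rough Python sketch of how
a worker-manager might build the preambles for the two snippet types.
This is my own illustration of the idea, not actual ick code: the
helper names are hypothetical, and the real implementation may well
differ.

```python
import json


def python_preamble(parameters):
    # Hypothetical helper: serialise the parameters into a line of
    # Python that defines the global dict 'params'. This line gets
    # prepended to a Python action snippet before execution.
    return 'params = ' + repr(parameters) + '\n'


def shell_preamble(parameters):
    # Hypothetical helper: define a shell function 'params' that
    # prints the parameters as JSON, ready for piping to jq. The
    # quoted heredoc delimiter stops the shell from expanding
    # anything inside the JSON text.
    return ("params() {\n"
            "    cat <<'EOF'\n"
            + json.dumps(parameters) + "\n"
            "EOF\n"
            "}\n")


# Instrumenting a snippet is then just string concatenation.
snippet = "R(['git', 'clone', git_url, 'src'])\n"
instrumented = (
    python_preamble({'git_url': 'git://git.liw.fi/ick.liw.fi'})
    + snippet)
```

The nice property of this scheme is that the snippet author sees only
a plain `params` dict (or `params` shell function) and needs to know
nothing about how the values were delivered.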