From 0b0507ae765b5b4b772df54d8b133fb7c3427076 Mon Sep 17 00:00:00 2001 From: Lars Wirzenius Date: Fri, 7 Jan 2022 12:48:42 +0200 Subject: planning meeting for new iteration Sponsored-by: author --- blog/2022/01/07/planning.mdwn | 299 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 299 insertions(+) create mode 100644 blog/2022/01/07/planning.mdwn diff --git a/blog/2022/01/07/planning.mdwn b/blog/2022/01/07/planning.mdwn new file mode 100644 index 0000000..9ffe0d0 --- /dev/null +++ b/blog/2022/01/07/planning.mdwn @@ -0,0 +1,299 @@ +[[!meta title="Iteration planning: January 9–23"]] +[[!meta date="Fri, 07 Jan 2022 09:16:56 +0200"]] +[[!tag meeting]] + +[[!toc levels=2]] + +# Assessment of the iteration that has ended + +[previous iteration]: /blog/2021/12/06/planning + +The goal of the [previous iteration][] was: + +> The goal for this iteration is to write a dedicated program for +> running benchmarks for Obnam. + +This was completed: Lars wrote a rudimentary program, +[`obnam-benchmark`](https://gitlab.com/obnam/obnam-benchmark/), see +below. We also made [release 0.7.0](https://obnam.org/blog/2022/01/04/obnam-0.7.0/). + +The iteration ran over a few weeks, mostly due to end-of-year +holidays, and the northern hemisphere Darkness affecting Lars's +ability to be productive. + +# Discussion + +## Current development theme + +The current theme of development for Obnam is performance, because +that is currently Lars's primary worry. The choices are performance, +security, convenience. + +## Policy for `cargo deny` + +We still need more discussion about how to tighten up the policy for +`cargo deny`, in [[!issue 157]]. Lars will make an executive decision +on his own otherwise during this iteration. + +## Lars wants to use Obnam for real + +Lars is test driving Obnam on a small subset of his data: his local +email archive. This is roughly a million files, but doesn't change +much from day to day. The backup takes several minutes to run. 
He'd +like to start using Obnam as his primary backup system, and has +identified two things that will be dealt with first: using the server +API needs to be authenticated, and the overall speed needs to be +massively improved. + +Lars opened [[!issue 186]] for the authentication. The performance +aspect is an ongoing development theme in any case. + +Lars arbitrarily defines a personal performance requirement as +follows: given a data set of three million files that hasn't +changed since the previous backup, an incremental backup should take +at most 60 seconds on his laptop, with the server running on the same +gigabit LAN. + +## Plans for Debian bookworm + +We are still collecting thoughts about having Obnam in Debian 12 in +[[!issue 162]]. What are we willing to commit to support for +the expected three years of Debian's security support for that +release? Without adding features or major new upstream versions. + +## The `obnam-benchmark` tool + +Lars has written a rudimentary tool for doing benchmarks of Obnam. +It's called `obnam-benchmark`, with [source on +gitlab.com](https://gitlab.com/obnam/obnam-benchmark/) and ugly Debian +packages in Lars's APT repository. (Everyone else probably wants to +install by building from source, from now.) + +The `obnam-benchmark` tool reads a YAML file that describes a suite of +benchmarks to run. Each benchmark defines one or more backups to make, +and the test data to generate for each backup. When the benchmarks are +run, the tool generates the data, starts the server, makes each +backup, and restores it. + +For each backup and restore it measures the duration, and other +things. The measurements are written out as a JSON file. The tool can +generate a report from a set of such JSON files, as a Markdown file, +with tables of results. Other document formats can be generated from +Markdown. (The report is possibly the weakest part of the tool. Lars +is not great at that stuff. 
Please help, if only by opening issues to +explain how to make the report more useful to you.) + +The tool can use an installed version of Obnam, or it can build Obnam +from any git commit. The version of Obnam is included in the report. +The intent is that we'll run the tool for every Obnam release, and +optionally every commit since the previous release, to see how Obnam +performance changes over time. + +The tool is ready to be used, at least for simple benchmarks, but +needs improvements to be good. For example, it's quite slow at +creating test data, and can't use pre-created test data. Help +improving the tool would be most welcome. + +Now that we have the tool, we need to start using it. This requires, +at minimum: + +* a set of standard benchmarks for Obnam that represents real world + usage patterns, as well as any artificial ones that make sense for + developing Obnam +* one or more standard environments in which to run the benchmarks +* a place to publish results and reports + +Lars proposes the following: + +* Lars will set up a VM on his development hardware for running + benchmarks. This will have 4 virtual CPUs, and 4 GiB of RAM, and 100 + GB of storage. The CPU count and memory size are intentionally + limited to make sure Obnam performs well when not running on a + supercomputer. + - It'd be nice if others ran Obnam benchmarks on their own systems + and contributed the results. +* We'll create a git repository for storing the benchmark + specifications, tentatively to be called `obnam-benchmark-specs`. It + might be a single file for now. Lars feels this is best kept + separate from other repositories so it's obvious in the extreme if + it changes +* We'll create another repository, `obnam-benchmark-results`, where + the JSON result files get stored. Each run of `obnam-benchmark` + will add a new result file. Old files may get removed, once they're + no longer useful. 
An example of that might be results for commits + between releases, or for quite old releases. + - Lars plans to run `obnam-benchmark` for himself, manually, and + upload the results to the repository via pull requests. + - Others can also submit pull requests to add their reports, as + usual with gitlab. +* Any changes to the results repository will trigger a CI job to + generate the report, and publish it on + [doc.obnam.org](https://doc.obnam.org/). This CI job will run on + Lars's personal CI system, which has upload access to the server + running that site. + - This assumes the generated reports are not useful to store, and + that only the latest one is important. If the result data files + are stored, in principle any report can be re-generated, and it + doesn't seem worth keeping old versions of the report. Thoughts? + + +## Splitting off the server part of the `obnam` crate? + +The discussion of whether to split the `obnam` crate into a client and +a server crate continues in [[!issue 175]]. Further opinions would be +welcome, though Lars is leaning towards splitting. No action is +planned at the moment. + + +# Repository review + +Lars reviewed all the open issues, merge requests, and CI pipelines +for all the projects in the Obnam group on gitlab.com. + +## Container Images + +This is . There were no +open issues, no extra branches, and no merge requests. CI pipelines +have been passing, and Lars ran the pipeline to freshen up the +container image. + +## obnam.org + +This is . There is one issue, +regarding a need for benchmark results, which Lars closed as no longer +being relevant to this repository. There were no extra branches, and +no open merge requests. There is no CI for this repository. + +## obnam-benchmark + +This is , and is the new +repository for the new tool. There were 11 open issues, about missing +features, and performance. The one about making a release and +uploading the tool to crates.io seems worth doing in this iteration. 
+Any other improvements that block actually running benchmarks and +automatically generating and publishing reports also need to be done, +if any are found. + +## obnam + +This is . There were 62 open issues. +Lars reviewed all of them, made comments and other updates as needed, +and closed: + +* [[!issue 16]] _Doesn't restore the access time_ + - access times change when backups are done, and are generally not + very useful +* [[!issue 64]] _Use a CAM_ + - Lars doesn't want to use content based addressing on the server. +* [[!issue 69]] _On Collision Resistance and Content Addressable Storage_ + - Lars doesn't want to use content based addressing on the server. +* [[!issue 85]] _Use case: anonymous user T_ + - all done +* [[!issue 92]] _Lacks a way to verify a backup can still be restored_ + - duplicate of [[!issue 50]], and the number of open issues is + starting to be large enough that duplicates are better closed +* [[!issue 99]] _Should maybe use the ring crate for AEAD_ + - closed as unnecessary +* [[!issue 163]] _Client could do with a built-in dummy server mode + for benchmarks_ + - closed as unnecessary + +There were 54 open issues after this. + +# Goals + +## Goal for 1.0 (not changed this iteration) + +The goal for version 1.0 is for Obnam to be an utterly boring backup +solution for Linux command line users. It should just work, be +performant, secure, and well-documented. + +It is not a goal for version 1.0 to have been ported to other +operating systems, but if there are volunteers to do that, and to +commit to supporting their port, ports will be welcome. + +Other user interfaces are likely to happen only after 1.0. + +The server component will support multiple clients in a way that +doesn’t let them see each other’s data. It is not a goal for clients +to be able to share data, even if the clients trust each other. 
+ +## Goal for the next few iterations (not changed for this iteration) + +The goal for the next few iterations is to have Obnam be performant. This +will include, at least, making the client use more concurrency so that +it can use more CPU cores to compute checksums for de-duplication. + +## Goal for this iteration (new for this iteration) + +The goal for this iteration is to define an initial set of benchmarks +for Obnam, and to run them, and to publish the results on +doc.obnam.org. All of this should be made as automatic as possible. + +# Commitments for this iteration + +We collect issues for this iteration in [[!milestone 12]]. +Lars intends to work on: + +- [obnam-benchmark issue + #16](https://gitlab.com/obnam/obnam-benchmark/-/issues/16) + _Is not on crates.io_ + - this _should_ be quick, but involves making a release, and that + tends to always go wrong the first few times, or when it's not + been done for a while + - 1h +- [[!issue 157]] _"cargo deny" policy is not strict_ + - make policy stricter to deny yanked versions, and security + vulnerabilities + - make sure the test suite still passes, and fix any issues + - 1h (optimistic, assuming nothing goes wrong) +- [[!issue 166]] _Lacks comprehensive benchmark suite_ + - carried over from the previous iteration: need to define and run + the benchmarks, and publish results, and automate as much of that + as possible + - 4h +- [[!issue 170]] _Should record MSRV in Cargo.toml `rust-version`_ + - 0.25h +- [[!issue 173]] _Should have as a requirement that it doesn't cache + very much locally_ + - 0.25h +- [[!issue 174]] _Doesn't log performance metrics_ + - add at least some collection of performance metrics, even if + not all the ones in the issue are added yet + - 1h +- [[!issue 176]] _Doesn't report version with much detail_ + - 1h +- [[!issue 177]] _What does BackupReason::Skipped actually mean?_ + - research, then document in the code + - 1h +- [[!issue 178]] _Is src/benchmark.rs useful to export?_ + - 
delete it if it's not used, or comment on issue if it is? + - 0.25h +- [[!issue 179]] _`Chunker` is a silly name for an iterator._ + - rename + - 0.25h +- [[!issue 180]] _Chunk metadata should be in AAD, not in headers?_ + - add copy of metadata to AAD, but keep the headers for now + - not performance related, but seems worth doing earlier rather than + later + - 1h +- [[!issue 181]] _The name AsyncBackupClient implies a non-async + version_ + - rename + - 0.25h +- [[!issue 182]] _Does it make sense to keep AsyncBackupClient and + AsyncChunkClient separate?_ + - make change, and merge if it seems worthwhile + - if not, comment on issue and close it, and also in code + - 0.25h +- [[!issue 184]] _README has unnecessary YAML metadata_ + - drop it + - 0.25h + +That's about 12 hours of estimated work. Hopefully not too much. These +are not all performance related, but it's important to also tidy up as +development goes forward. + +# Meeting participants + +* Lars Wirzenius -- cgit v1.2.1 From 50031bc5ac120add0d269ff6e790a8244ee3754f Mon Sep 17 00:00:00 2001 From: Lars Wirzenius Date: Mon, 10 Jan 2022 15:00:46 +0200 Subject: fix: typo, found by Alexander Sponsored-by: author --- blog/2022/01/07/planning.mdwn | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/blog/2022/01/07/planning.mdwn b/blog/2022/01/07/planning.mdwn index 9ffe0d0..98a8bb0 100644 --- a/blog/2022/01/07/planning.mdwn +++ b/blog/2022/01/07/planning.mdwn @@ -67,7 +67,7 @@ Lars has written a rudimentary tool for doing benchmarks of Obnam. It's called `obnam-benchmark`, with [source on gitlab.com](https://gitlab.com/obnam/obnam-benchmark/) and ugly Debian packages in Lars's APT repository. (Everyone else probably wants to -install by building from source, from now.) +install by building from source, for now.) The `obnam-benchmark` tool reads a YAML file that describes a suite of benchmarks to run. 
Each benchmark defines one or more backups to make, -- cgit v1.2.1 From 6cf65dc142e9ff30ca0d8d834782c6a5bbd63262 Mon Sep 17 00:00:00 2001 From: Lars Wirzenius Date: Mon, 10 Jan 2022 15:03:13 +0200 Subject: fix: drop unclear justification Sponsored-by: author --- blog/2022/01/07/planning.mdwn | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/blog/2022/01/07/planning.mdwn b/blog/2022/01/07/planning.mdwn index 98a8bb0..f5fd327 100644 --- a/blog/2022/01/07/planning.mdwn +++ b/blog/2022/01/07/planning.mdwn @@ -114,9 +114,7 @@ Lars proposes the following: and contributed the results. * We'll create a git repository for storing the benchmark specifications, tentatively to be called `obnam-benchmark-specs`. It - might be a single file for now. Lars feels this is best kept - separate from other repositories so it's obvious in the extreme if - it changes + might be a single file for now. * We'll create another repository, `obnam-benchmark-results`, where the JSON result files get stored. Each run of `obnam-benchmark` will add a new result file. Old files may get removed, once they're -- cgit v1.2.1