author Lars Wirzenius <liw@liw.fi> 2022-01-10 13:03:50 +0000
committer Lars Wirzenius <liw@liw.fi> 2022-01-10 13:03:50 +0000
commit 9c5ce6497908c82afa093225d9e47c2c64e4b5b8
tree aae535d7f76016a70882bba3198dd59a80668ad9
parent 9faa22158b08547fdb65dbe65e7dad04f81b95f7
parent 6cf65dc142e9ff30ca0d8d834782c6a5bbd63262
Merge branch 'meeting' into 'main'
planning meeting for new iteration

See merge request obnam/obnam.org!66
-rw-r--r-- blog/2022/01/07/planning.mdwn 297
1 files changed, 297 insertions, 0 deletions
diff --git a/blog/2022/01/07/planning.mdwn b/blog/2022/01/07/planning.mdwn
new file mode 100644
index 0000000..f5fd327
--- /dev/null
+++ b/blog/2022/01/07/planning.mdwn
@@ -0,0 +1,297 @@
+[[!meta title="Iteration planning: January 9&ndash;23"]]
+[[!meta date="Fri, 07 Jan 2022 09:16:56 +0200"]]
+[[!tag meeting]]
+
+[[!toc levels=2]]
+
+# Assessment of the iteration that has ended
+
+[previous iteration]: /blog/2021/12/06/planning
+
+The goal of the [previous iteration][] was:
+
+> The goal for this iteration is to write a dedicated program for
+> running benchmarks for Obnam.
+
+This was completed: Lars wrote a rudimentary program,
+[`obnam-benchmark`](https://gitlab.com/obnam/obnam-benchmark/), see
+below. We also made [release 0.7.0](https://obnam.org/blog/2022/01/04/obnam-0.7.0/).
+
+The iteration ran over by a few weeks, mostly due to the end-of-year
+holidays and the northern hemisphere Darkness affecting Lars's
+ability to be productive.
+
+# Discussion
+
+## Current development theme
+
+The current theme of development for Obnam is performance, because
+that is currently Lars's primary worry. The candidate themes are
+performance, security, and convenience.
+
+## Policy for `cargo deny`
+
+We still need more discussion in [[!issue 157]] about how to tighten
+up the policy for `cargo deny`. Otherwise, Lars will make an executive
+decision on his own during this iteration.
+
+## Lars wants to use Obnam for real
+
+Lars is test driving Obnam on a small subset of his data: his local
+email archive. This is roughly a million files, but doesn't change
+much from day to day. The backup takes several minutes to run. He'd
+like to start using Obnam as his primary backup system, and has
+identified two things that need to be dealt with first: access to the
+server API needs to be authenticated, and the overall speed needs to
+be massively improved.
+
+Lars opened [[!issue 186]] for the authentication. The performance
+aspect is an ongoing development theme in any case.
+
+Lars arbitrarily defines a personal performance requirement as
+follows: given a data set of three million files that hasn't
+changed since the previous backup, an incremental backup should take
+at most 60 seconds on his laptop, with the server running on the same
+gigabit LAN.
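
The requirement is easier to reason about as a per-file budget. The Python sketch below only does the arithmetic; the function names are made up for illustration and are not part of Obnam. Three million files in 60 seconds means 50,000 files per second, or on average 20 microseconds per file for all the work done by client and server together. A round trip on a typical LAN usually takes longer than that, so one synchronous server request per unchanged file would likely not fit the budget.

```python
# Hypothetical helpers: what "three million unchanged files in at most
# 60 seconds" implies for throughput and per-file time.


def files_per_second(files: int, seconds: float) -> float:
    """Throughput needed to process `files` within `seconds`."""
    return files / seconds


def per_file_budget_us(files: int, seconds: float) -> float:
    """Average time allowed per file, in microseconds."""
    return seconds * 1_000_000 / files


FILES = 3_000_000
LIMIT_SECONDS = 60.0

rate = files_per_second(FILES, LIMIT_SECONDS)
budget = per_file_budget_us(FILES, LIMIT_SECONDS)

# prints: 50000 files/s, 20 microseconds per file
print(f"{rate:.0f} files/s, {budget:.0f} microseconds per file")
```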
+
+## Plans for Debian bookworm
+
+We are still collecting thoughts in [[!issue 162]] about having Obnam
+in Debian 12. What are we willing to commit to supporting, without
+adding features or major new upstream versions, for the expected
+three years of Debian's security support for that release?
+
+## The `obnam-benchmark` tool
+
+Lars has written a rudimentary tool for doing benchmarks of Obnam.
+It's called `obnam-benchmark`, with [source on
+gitlab.com](https://gitlab.com/obnam/obnam-benchmark/) and ugly Debian
+packages in Lars's APT repository. (Everyone else probably wants to
+install by building from source, for now.)
+
+The `obnam-benchmark` tool reads a YAML file that describes a suite of
+benchmarks to run. Each benchmark defines one or more backups to make,
+and the test data to generate for each backup. When the benchmarks are
+run, the tool generates the data, starts the server, makes each
+backup, and restores it.
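
The run sequence described above can be sketched as a small Python loop. This is not the actual `obnam-benchmark` code: the `Step` type, the callables standing in for data generation, backup, restore, and server control, and the output field names are all hypothetical. The sketch shows the order of operations and where timing happens: generating test data is kept outside the timed region, and the server is stopped even if a step fails.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, List

Action = Callable[[], None]


@dataclass
class Step:
    """One backup in a benchmark (hypothetical structure)."""
    name: str
    generate_data: Action  # create or modify the test data for this backup
    backup: Action         # make a backup of the test data
    restore: Action        # restore the backup just made


def timed(action: Action) -> float:
    """Run an action and return its wall-clock duration in seconds."""
    started = time.perf_counter()
    action()
    return time.perf_counter() - started


def run_benchmark(name: str, steps: List[Step],
                  start_server: Action, stop_server: Action) -> dict:
    """Run every step of one benchmark and collect its measurements."""
    measurements = []
    start_server()
    try:
        for step in steps:
            step.generate_data()  # deliberately not included in the timings
            measurements.append({
                "step": step.name,
                "backup_secs": timed(step.backup),
                "restore_secs": timed(step.restore),
            })
    finally:
        stop_server()
    return {"name": name, "steps": measurements}


if __name__ == "__main__":
    def noop() -> None:
        pass

    result = run_benchmark("demo", [Step("initial", noop, noop, noop)],
                           start_server=noop, stop_server=noop)
    print(json.dumps(result, indent=2))
```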
+
+For each backup and restore, it measures the duration, among other
+things. The measurements are written out as a JSON file. The tool can
+generate a report from a set of such JSON files, as a Markdown file,
+with tables of results. Other document formats can be generated from
+Markdown. (The report is possibly the weakest part of the tool. Lars
+is not great at that stuff. Please help, if only by opening issues to
+explain how to make the report more useful to you.)
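
Turning result files into a Markdown report is essentially a loop over the JSON data. The Python sketch below assumes a hypothetical result layout (an Obnam version plus a list of benchmarks, each with per-step backup and restore durations in seconds); the real `obnam-benchmark` JSON schema is probably different. It emits one table row per step, so results for several Obnam versions end up side by side in one table.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, List

# Hypothetical result file layout (not the real obnam-benchmark schema):
# {"obnam_version": "...",
#  "benchmarks": [{"name": "...",
#                  "steps": [{"step": "...", "backup_secs": 1.0,
#                             "restore_secs": 2.0}]}]}


def load_results(paths: Iterable[str]) -> List[dict]:
    """Parse each JSON result file."""
    return [json.loads(Path(p).read_text()) for p in paths]


def markdown_report(results: List[dict]) -> str:
    """Render all results as one Markdown table, one row per step."""
    lines = [
        "| Obnam version | Benchmark | Step | Backup (s) | Restore (s) |",
        "|---|---|---|---:|---:|",
    ]
    for result in results:
        version = result["obnam_version"]
        for bench in result["benchmarks"]:
            for step in bench["steps"]:
                lines.append(
                    f"| {version} | {bench['name']} | {step['step']} "
                    f"| {step['backup_secs']:.2f} | {step['restore_secs']:.2f} |"
                )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Usage: python report.py results/*.json > report.md
    print(markdown_report(load_results(sys.argv[1:])), end="")
```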
+
+The tool can use an installed version of Obnam, or it can build Obnam
+from any git commit. The version of Obnam is included in the report.
+The intent is that we'll run the tool for every Obnam release, and
+optionally every commit since the previous release, to see how Obnam
+performance changes over time.
+
+The tool is ready to be used, at least for simple benchmarks, but
+needs improvements to be good. For example, it's quite slow at
+creating test data, and can't use pre-created test data. Help with
+improving the tool would be most welcome.
+
+Now that we have the tool, we need to start using it. This requires,
+at minimum:
+
+* a set of standard benchmarks for Obnam that represents real world
+ usage patterns, as well as any artificial ones that make sense for
+ developing Obnam
+* one or more standard environments in which to run the benchmarks
+* a place to publish results and reports
+
+Lars proposes the following:
+
+* Lars will set up a VM on his development hardware for running
+ benchmarks. This will have 4 virtual CPUs, 4 GiB of RAM, and 100
+ GB of storage. The CPU count and memory size are intentionally
+ limited to make sure Obnam performs well when not running on a
+ supercomputer.
+ - It'd be nice if others ran Obnam benchmarks on their own systems
+ and contributed the results.
+* We'll create a git repository for storing the benchmark
+ specifications, tentatively to be called `obnam-benchmark-specs`. It
+ might be a single file for now.
+* We'll create another repository, `obnam-benchmark-results`, where
+ the JSON result files get stored. Each run of `obnam-benchmark`
+ will add a new result file. Old files may get removed, once they're
+ no longer useful. An example of that might be results for commits
+ between releases, or for quite old releases.
+ - Lars plans to run `obnam-benchmark` for himself, manually, and
+ upload the results to the repository via pull requests.
+ - Others can also submit pull requests to add their reports, as
+ usual with gitlab.
+* Any changes to the results repository will trigger a CI job to
+ generate the report, and publish it on
+ [doc.obnam.org](https://doc.obnam.org/). This CI job will run on
+ Lars's personal CI system, which has upload access to the server
+ running that site.
+ - This assumes the generated reports are not useful to store, and
+ that only the latest one is important. If the result data files
+ are stored, in principle any report can be re-generated, and it
+ doesn't seem worth keeping old versions of the report. Thoughts?
+
+
+## Splitting off the server part of the `obnam` crate?
+
+The discussion of whether to split the `obnam` crate into a client and
+a server crate continues in [[!issue 175]]. Further opinions would be
+welcome, though Lars is leaning towards splitting. No action is
+planned at the moment.
+
+
+# Repository review
+
+Lars reviewed all the open issues, merge requests, and CI pipelines
+for all the projects in the Obnam group on gitlab.com.
+
+## Container Images
+
+This is <https://gitlab.com/obnam/container-images>. There were no
+open issues, no extra branches, and no merge requests. CI pipelines
+have been passing, and Lars ran the pipeline to freshen up the
+container image.
+
+## obnam.org
+
+This is <https://gitlab.com/obnam/obnam.org>. There was one issue,
+regarding a need for benchmark results, which Lars closed as no longer
+being relevant to this repository. There were no extra branches, and
+no open merge requests. There is no CI for this repository.
+
+## obnam-benchmark
+
+This is <https://gitlab.com/obnam/obnam-benchmark>, the new
+repository for the new tool. There were 11 open issues, about missing
+features and performance. The one about making a release and
+uploading the tool to crates.io seems worth doing in this iteration.
+Any other improvements that block actually running benchmarks and
+automatically generating and publishing reports also need to be done,
+if any are found.
+
+## obnam
+
+This is <https://gitlab.com/obnam/obnam>. There were 62 open issues.
+Lars reviewed all of them, made comments and other updates as needed,
+and closed:
+
+* [[!issue 16]] _Doesn't restore the access time_
+ - access times change when backups are done, and are generally not
+ very useful
+* [[!issue 64]] _Use a CAM_
+ - Lars doesn't want to use content-based addressing on the server.
+* [[!issue 69]] _On Collision Resistance and Content Addressable Storage_
+ - Lars doesn't want to use content-based addressing on the server.
+* [[!issue 85]] _Use case: anonymous user T_
+ - all done
+* [[!issue 92]] _Lacks a way to verify a backup can still be restored_
+ - duplicate of [[!issue 50]], and the number of open issues is
+ starting to be large enough that duplicates are better closed
+* [[!issue 99]] _Should maybe use the ring crate for AEAD_
+ - closed as unnecessary
+* [[!issue 163]] _Client could do with a built-in dummy server mode
+ for benchmarks_
+ - closed as unnecessary
+
+There were 54 open issues after this.
+
+# Goals
+
+## Goal for 1.0 (not changed this iteration)
+
+The goal for version 1.0 is for Obnam to be an utterly boring backup
+solution for Linux command line users. It should just work, be
+performant, secure, and well-documented.
+
+It is not a goal for version 1.0 to have been ported to other
+operating systems, but if there are volunteers to do that, and to
+commit to supporting their port, ports will be welcome.
+
+Other user interfaces are likely to happen only after 1.0.
+
+The server component will support multiple clients in a way that
+doesn’t let them see each other’s data. It is not a goal for clients
+to be able to share data, even if the clients trust each other.
+
+## Goal for the next few iterations (not changed for this iteration)
+
+The goal for the next few iterations is to have Obnam be performant. This
+will include, at least, making the client use more concurrency so that
+it can use more CPU cores to compute checksums for de-duplication.
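
The idea of spreading checksum computation over several CPU cores can be illustrated with a thread pool. This Python sketch is only an analogy for what the Rust client might do: fixed-size chunks, SHA-256, and the function names are assumptions made for the example, not a description of Obnam's actual chunking. Threads help here because CPython's `hashlib` releases the global interpreter lock while hashing large buffers. Identical chunks get identical checksums, which is what de-duplication relies on.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunks(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(chunk: bytes) -> str:
    """SHA-256 of one chunk, as a hex string."""
    return hashlib.sha256(chunk).hexdigest()


def checksums_concurrently(data: bytes, size: int, workers: int = 4) -> List[str]:
    """Checksum all chunks using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so they line up with chunks.
        return list(pool.map(checksum, chunks(data, size)))


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB made of a repeating pattern
    sums = checksums_concurrently(data, size=64 * 1024)
    # Every 64 KiB chunk has the same content, so all checksums match.
    print(len(sums), "chunks,", len(set(sums)), "distinct")  # 16 chunks, 1 distinct
```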
+
+## Goal for this iteration (new for this iteration)
+
+The goal for this iteration is to define an initial set of benchmarks
+for Obnam, and to run them, and to publish the results on
+doc.obnam.org. All of this should be made as automatic as possible.
+
+# Commitments for this iteration
+
+We collect issues for this iteration in [[!milestone 12]].
+Lars intends to work on:
+
+- [obnam-benchmark issue
+ #16](https://gitlab.com/obnam/obnam-benchmark/-/issues/16)
+ _Is not on crates.io_
+ - this _should_ be quick, but involves making a release, and that
+ tends to always go wrong the first few times, or when it's not
+ been done for a while
+ - 1h
+- [[!issue 157]] _"cargo deny" policy is not strict_
+ - make policy stricter to deny yanked versions, and security
+ vulnerabilities
+ - make sure the test suite still passes, and fix any issues
+ - 1h (optimistic, assuming nothing goes wrong)
+- [[!issue 166]] _Lacks comprehensive benchmark suite_
+ - carried over from the previous iteration: need to define and run
+ the benchmarks, and publish results, and automate as much of that
+ as possible
+ - 4h
+- [[!issue 170]] _Should record MSRV in Cargo.toml `rust-version`_
+ - 0.25h
+- [[!issue 173]] _Should have as a requirement that it doesn't cache
+ very much locally_
+ - 0.25h
+- [[!issue 174]] _Doesn't log performance metrics_
+ - add collection of at least some performance metrics, even if
+ not all the ones in the issue are added yet
+ - 1h
+- [[!issue 176]] _Doesn't report version with much detail_
+ - 1h
+- [[!issue 177]] _What does BackupReason::Skipped actually mean?_
+ - research, then document in the code
+ - 1h
+- [[!issue 178]] _Is src/benchmark.rs useful to export?_
+ - delete it if it's not used, or comment on issue if it is?
+ - 0.25h
+- [[!issue 179]] _`Chunker` is a silly name for an iterator._
+ - rename
+ - 0.25h
+- [[!issue 180]] _Chunk metadata should be in AAD, not in headers?_
+ - add copy of metadata to AAD, but keep the headers for now
+ - not performance related, but seems worth doing earlier rather than
+ later
+ - 1h
+- [[!issue 181]] _The name AsyncBackupClient implies a non-async
+ version_
+ - rename
+ - 0.25h
+- [[!issue 182]] _Does it make sense to keep AsyncBackupClient and
+ AsyncChunkClient separate?_
+ - make change, and merge if it seems worthwhile
+ - if not, comment on issue and close it, and also in code
+ - 0.25h
+- [[!issue 184]] _README has unnecessary YAML metadata_
+ - drop it
+ - 0.25h
+
+That's about 12 hours of estimated work. Hopefully not too much. These
+are not all performance related, but it's important to also tidy up as
+development goes forward.
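
As a sanity check on the total, the per-item estimates from the commitments list can be added up mechanically. The dictionary keys are just labels for the fourteen listed items.

```python
# Estimated hours per commitment, in the order listed for this iteration.
ESTIMATES = {
    "obnam-benchmark #16": 1.0,
    "#157": 1.0,
    "#166": 4.0,
    "#170": 0.25,
    "#173": 0.25,
    "#174": 1.0,
    "#176": 1.0,
    "#177": 1.0,
    "#178": 0.25,
    "#179": 0.25,
    "#180": 1.0,
    "#181": 0.25,
    "#182": 0.25,
    "#184": 0.25,
}

# Quarter hours are exact in binary floating point, so the sum is exact.
total_hours = sum(ESTIMATES.values())
print(f"{len(ESTIMATES)} items, {total_hours} hours")  # 14 items, 11.75 hours
```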
+
+# Meeting participants
+
+* Lars Wirzenius