From 4acbf5d0a2876052984133627514d7edbfa4630a Mon Sep 17 00:00:00 2001
From: Lars Wirzenius
Date: Wed, 23 Feb 2022 19:20:14 +0200
Subject: docs: add planning meeting minutes

Sponsored-by: author
---
 blog/2022/02/23/planning.mdwn | 138 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 blog/2022/02/23/planning.mdwn

diff --git a/blog/2022/02/23/planning.mdwn b/blog/2022/02/23/planning.mdwn
new file mode 100644
index 0000000..bcea819
--- /dev/null
+++ b/blog/2022/02/23/planning.mdwn
@@ -0,0 +1,138 @@
+[[!meta title="Iteration planning: February 26 – March 12"]]
+[[!meta date="Wed, 23 Feb 2022 18:52:11 +0200"]]
+[[!tag meeting]]
+
+[[!toc levels=2]]
+
+# Assessment of the iteration that has ended
+
+[previous iteration]: /blog/2022/01/07/planning
+[obnam-benchmark-results]: https://gitlab.com/obnam/obnam-benchmark-results
+[benchmark results page]: https://doc.obnam.org/obnam-benchmark-results/
+
+The goal of the [previous iteration][] was:
+
+> The goal for this iteration is to define an initial set of benchmarks
+> for Obnam, and to run them, and to publish the results on
+> doc.obnam.org. All of this should be made as automatic as possible.
+
+This was completed. The running of benchmarks is manual, and Lars will
+do that for every release, going forward. The results will be put into
+the [obnam-benchmark-results][] repository, which will trigger CI to
+update the [benchmark results page][] with a summary of the results.
+
+This iteration was meant to also fix the following issues:
+
+* [[!issue 174]] -- _Doesn't log performance metrics_
+  - Lars created [[!mr 214]], but it's not merged yet. Alexander made
+    good suggestions for tidying up the code, but Lars failed to make
+    them work, and then got distracted by performance investigations.
+    They can be worked on later.
+* [[!issue 176]] -- _Doesn't report version with much detail_
+  - Lars didn't work on this after all.
+* [[!issue 180]] -- _Chunk metadata should be in AAD, not in headers?_
+  - Lars didn't work on this after all.
+
+
+The iteration ran over by a few weeks, mostly due to the northern
+hemisphere Darkness affecting Lars's ability to be productive, and
+also Lars got distracted by looking at improving Obnam performance.
+
+# Discussion
+
+## Current development theme
+
+The current theme of development for Obnam is performance, because
+that is currently Lars's primary worry. The possible themes are
+performance, security, and convenience, at least currently.
+
+## Performance
+
+Lars has been investigating where Obnam performance bottlenecks are,
+by running benchmarks, and looking at profiling results from [cargo
+flamegraph][]. For an Obnam run with a good number of files that
+haven't changed, most of the time in Obnam goes into inserting rows
+into an SQLite database for the new generation. This led Lars to do
+some investigation into how fast he can make this happen.
+
+Lars wrote a little program that creates an SQLite database and then
+inserts a million rows into a table modelled after the Obnam `files`
+table. The first, naive approach resulted in about 80,000 rows
+inserted per second on his laptop, and nearly 120,000 on his
+development server. After reading an [article by Jason Wyatt][] Lars
+then made the following changes:
+
+* use a single transaction for all million inserts
+* use the `rusqlite` prepared statement cache instead of preparing a
+  new statement for each insert
+
+The resulting speeds were (best speed of three runs, compiled in
+release mode, on development server with NVMe drives):
+
+program                       inserts/s
+--------                      -----------
+individual-insert             117509
+individual-one-transaction    209512
+individual-prepared.rs        970874
+
+That's almost a million inserts per second. That'll do for now.
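
The two changes listed above can be sketched roughly as follows. Lars's test program is written in Rust and uses `rusqlite`; this sketch instead uses Python's standard `sqlite3` module purely for illustration, so it is not his code. The table layout, the row contents, and all function names are assumptions made up for the example. Python's `sqlite3` keeps an internal cache of compiled statements on its own, so here the visible difference between the two functions is only the transaction handling.

```python
import sqlite3
import time

# Hypothetical, simplified stand-in for the Obnam `files` table.
SCHEMA = "CREATE TABLE files (fileno INTEGER PRIMARY KEY, filename BLOB, json TEXT)"
INSERT = "INSERT INTO files (fileno, filename, json) VALUES (?, ?, ?)"


def rows(n):
    """Generate n made-up rows; the contents are placeholders."""
    for i in range(n):
        yield (i, f"/data/file-{i}".encode(), "{}")


def insert_naive(conn, n):
    """Commit after every insert, so each row gets its own transaction."""
    for row in rows(n):
        conn.execute(INSERT, row)
        conn.commit()


def insert_one_transaction(conn, n):
    """Run all inserts in a single transaction, reusing one statement."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(INSERT, rows(n))


def timed(insert, n, path=":memory:"):
    """Create a fresh database, run `insert`, return (row count, seconds)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    start = time.perf_counter()
    insert(conn, n)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    n = 100_000
    for insert in (insert_naive, insert_one_transaction):
        count, elapsed = timed(insert, n)
        print(f"{insert.__name__}: {count} rows, {count / elapsed:.0f} inserts/s")
```

The numbers this prints are not comparable to the table above: the language and machine differ, and the default in-memory database hides most of the per-commit cost that an on-disk database would pay.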
+
+Another approach might be to modify a copy of the previous generation,
+but the logic gets trickier than with the approach of starting with an
+empty database and inserting what we find in live data.
+
+Lars also looked at what it would take to change the current Obnam
+abstractions around SQLite to use the approach described above. He feels
+the Obnam abstractions he wrote originally are messy and could do with
+a better design. He intends to work on that in the new iteration.
+
+[cargo flamegraph]: https://crates.io/crates/flamegraph
+[article by Jason Wyatt]: https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-insertions-971aff98eef2
+
+# Repository review
+
+Lars didn't review any issues, merge requests, or CI pipelines this
+time. He wants to work on database abstractions first.
+
+# Goals
+
+## Goal for 1.0 (not changed this iteration)
+
+The goal for version 1.0 is for Obnam to be an utterly boring backup
+solution for Linux command line users. It should just work, be
+performant, secure, and well-documented.
+
+It is not a goal for version 1.0 to have been ported to other
+operating systems, but if there are volunteers to do that, and to
+commit to supporting their port, ports will be welcome.
+
+Other user interfaces are likely to happen only after 1.0.
+
+The server component will support multiple clients in a way that
+doesn’t let them see each other’s data. It is not a goal for clients
+to be able to share data, even if the clients trust each other.
+
+## Goal for the next few iterations (not changed for this iteration)
+
+The goal for the next few iterations is to have Obnam be performant. This
+will include, at least, making the client use more concurrency so that
+it can use more CPU cores to compute checksums for de-duplication.
+
+## Goal for this iteration (new for this iteration)
+
+The goal for this iteration is to tidy up database abstraction code in
+the Obnam client and implement the performance improvements that Lars
+prototyped.
+
+# Commitments for this iteration
+
+Lars will work on Obnam client database abstractions and performance.
+The goal for this work is for Obnam to be able to run `obnam backup` on a
+live data set of a million files that haven't changed since the
+previous backup in less than 30 seconds, on Lars's development server.
+
+This work is not captured in issues.
+
+# Meeting participants
+
+* Lars Wirzenius
--
cgit v1.2.1