author Lars Wirzenius <liw@liw.fi> 2022-02-25 09:04:30 +0000
committer Lars Wirzenius <liw@liw.fi> 2022-02-25 09:04:30 +0000
commit 259b78ffe2337053bffcc60c7792b3743fed34a9
tree 20ef752f340fd1435acf1b4b0f8eef50cf8cf1c9
parent 9c5ce6497908c82afa093225d9e47c2c64e4b5b8
parent 4acbf5d0a2876052984133627514d7edbfa4630a
Merge branch 'planning' into 'main'
docs: add planning meeting minutes

See merge request obnam/obnam.org!67
-rw-r--r-- blog/2022/02/23/planning.mdwn 138
1 files changed, 138 insertions, 0 deletions
diff --git a/blog/2022/02/23/planning.mdwn b/blog/2022/02/23/planning.mdwn
new file mode 100644
index 0000000..bcea819
--- /dev/null
+++ b/blog/2022/02/23/planning.mdwn
@@ -0,0 +1,138 @@
+[[!meta title="Iteration planning: February 26 &ndash; March 12"]]
+[[!meta date="Wed, 23 Feb 2022 18:52:11 +0200"]]
+[[!tag meeting]]
+
+[[!toc levels=2]]
+
+# Assessment of the iteration that has ended
+
+[previous iteration]: /blog/2022/01/07/planning
+[obnam-benchmark-results]: https://gitlab.com/obnam/obnam-benchmark-results
+[benchmark results page]: https://doc.obnam.org/obnam-benchmark-results/
+
+The goal of the [previous iteration][] was:
+
+> The goal for this iteration is to define an initial set of benchmarks
+> for Obnam, and to run them, and to publish the results on
+> doc.obnam.org. All of this should be made as automatic as possible.
+
+This was completed. The running of benchmarks is manual, and Lars will
+do that for every release, going forward. The results will be put into
+the [obnam-benchmark-results][] repository, which will trigger CI to
+update the [benchmark results page][] with a summary of the results.
+
+This iteration was meant to also fix the following issues:
+
+* [[!issue 174]] -- _Doesn't log performance metrics_
+ - Lars created [[!mr 214]], but it's not merged yet. Alexander made
+ good suggestions for tidying up the code, but Lars failed to make
+ them work, and then got distracted by performance investigations.
+ They can be worked on later.
+* [[!issue 176]] -- _Doesn't report version with much detail_
+ - Lars didn't work on this after all.
+* [[!issue 180]] -- _Chunk metadata should be in AAD, not in headers?_
+ - Lars didn't work on this after all.
+
+
+The iteration overran by a few weeks, mostly due to the northern
+hemisphere Darkness affecting Lars's ability to be productive, and
+also because Lars got distracted by looking at improving Obnam performance.
+
+# Discussion
+
+## Current development theme
+
+The current theme of development for Obnam is performance, because
+that is currently Lars's primary worry. The candidate themes, at least
+currently, are performance, security, and convenience.
+
+## Performance
+
+Lars has been investigating where Obnam performance bottlenecks are,
+by running benchmarks, and looking at profiling results from [cargo
+flamegraph][]. For an Obnam run with a good number of files that
+haven't changed, most of the time in Obnam goes into inserting rows
+into an SQLite database for the new generation. This led Lars to do
+some investigation into how fast he can make this happen.
+
+Lars wrote a little program that creates an SQLite database and then
+inserts a million rows into a table modelled after the Obnam `files`
+table. The first, naive approach resulted in about 80,000 rows
+inserted per second on his laptop, and nearly 120,000 on his
+development server. After reading an [article by Jason Wyatt][] Lars
+then made the following changes:
+
+* use a single transaction for all million inserts
+* use the `rusqlite` prepared statement cache instead of preparing a
+ new statement for each insert
+
+The resulting speeds were (best speed of three runs, compiled in
+release mode, on development server with NVMe drives):
+
+program                       inserts/s
+---------------------------   ---------
+individual-insert                117509
+individual-one-transaction       209512
+individual-prepared.rs           970874
+
+That's almost a million inserts per second. That'll do for now.
+
+Another approach might be to modify a copy of the previous generation,
+but the logic gets trickier than with the approach of starting with an
+empty database and inserting what we find in live data.
+
+Lars also looked at what it would take to change the current Obnam
+abstractions around SQLite to use the approach used above. He feels
+the Obnam abstractions he wrote originally are messy and would benefit from
+a cleaner design. He intends to work on that in the new iteration.
+
+[cargo flamegraph]: https://crates.io/crates/flamegraph
+[article by Jason Wyatt]: https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-insertions-971aff98eef2
+
+# Repository review
+
+Lars didn't review any issues, merge requests, or CI pipelines this
+time. He wants to work on database abstractions first.
+
+# Goals
+
+## Goal for 1.0 (not changed this iteration)
+
+The goal for version 1.0 is for Obnam to be an utterly boring backup
+solution for Linux command line users. It should just work, be
+performant, secure, and well-documented.
+
+It is not a goal for version 1.0 to have been ported to other
+operating systems, but if there are volunteers to do that, and to
+commit to supporting their port, ports will be welcome.
+
+Other user interfaces are likely to happen only after 1.0.
+
+The server component will support multiple clients in a way that
+doesn’t let them see each other’s data. It is not a goal for clients
+to be able to share data, even if the clients trust each other.
+
+## Goal for the next few iterations (not changed for this iteration)
+
+The goal for the next few iterations is to have Obnam be performant. This
+will include, at least, making the client use more concurrency so that
+it can use more CPU cores to compute checksums for de-duplication.
+
+## Goal for this iteration (new for this iteration)
+
+The goal for this iteration is to tidy up database abstraction code in
+the Obnam client and implement the performance improvements that Lars
+wrote prototype code for.
+
+# Commitments for this iteration
+
+Lars will work on Obnam client database abstractions and performance.
+The goal for these is for Obnam to be able to run `obnam backup` on a
+live data set of a million files that haven't changed since the
+previous backup in less than 30 seconds, on Lars's development server.
+
+This work is not captured in issues.
+
+# Meeting participants
+
+* Lars Wirzenius
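
Note on the Performance section of the minutes above: the effect of batching inserts into one transaction can be sketched roughly as follows. This is not the Rust program Lars wrote; it is a small Python stand-in using the standard library `sqlite3` module. The table schema, the function names, and the synthetic rows are all hypothetical. Python's `sqlite3` reuses prepared statements on its own, so the sketch only contrasts committing after every row with a single transaction driven by `executemany`, and it cannot reproduce the `rusqlite` prepared statement cache measurement.

```python
import sqlite3
import time

# Hypothetical schema, loosely inspired by a "files" table.
# This is an assumption, not Obnam's actual schema.
SCHEMA = "CREATE TABLE files (fileno INTEGER PRIMARY KEY, filename BLOB, json TEXT)"
INSERT = "INSERT INTO files (fileno, filename, json) VALUES (?, ?, ?)"


def rows(n):
    """Yield n synthetic rows (made-up file names and empty JSON)."""
    for i in range(n):
        yield (i, f"/data/file-{i}".encode(), "{}")


def insert_commit_per_row(conn, n):
    """Naive approach: every insert is committed as its own transaction."""
    for row in rows(n):
        conn.execute(INSERT, row)
        conn.commit()


def insert_single_transaction(conn, n):
    """All inserts share one transaction and one SQL statement."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(INSERT, rows(n))


def measure(strategy, n, path=":memory:"):
    """Run a strategy against a fresh database; return (row count, seconds)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    started = time.perf_counter()
    strategy(conn, n)
    elapsed = time.perf_counter() - started
    count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    n = 100_000
    for strategy in (insert_commit_per_row, insert_single_transaction):
        count, elapsed = measure(strategy, n)
        print(f"{strategy.__name__}: {count} rows, {n / elapsed:.0f} inserts/s")
```

The default in-memory database hides most of the per-commit cost; passing a path to a new file on disk as `path` gives figures closer in spirit to the table in the minutes, though absolute numbers will differ by machine and language. For scale, at the measured 970874 inserts/s a million rows take roughly one second, which is a small part of the 30 second target.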