[[!meta title="DRAFT: Roadmap for FORMAT GREEN ALBATROSS"]]

This is a DRAFT roadmap for the new repository format (GREEN
ALBATROSS) to become the default format in Obnam. Feedback on this
roadmap is welcome via the obnam-devel mailing list.

Note that only things that affect the new repository format are
relevant for this roadmap. All other bugs or features are off-topic
and will not be included.

Success criteria for being done with the new format
========================================================================

* All major operations need to be tolerably fast, and run in 4 GiB of
  RAM. The test data set is to be the snapshot of almost 5 TiB data
  from my own file server. The backup repository should use encryption
  and compression.

  - backup
  - forget
  - restore (or possibly verify, to avoid needing another 5 TiB space)
  - fsck

* Obnam supports Attic-style chunking (lowest N bits of weak checksum
  means chunk ends), and things are still tolerably fast.

* The manual needs to be updated to cover all GA things and needs to
  have a comparison between 6 and GA, and also advice for converting
  to the new format ("start over" is sufficient, though a conversion
  tool would be nice).

* The obnam.org website needs to be reviewed, and any design docs
  updated.

* At least three months need to have passed of actively asking Obnam
  users to use GA, without showstopper bugs being reported.

* No known need to change the new repository format.
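
The Attic-style chunking criterion above can be illustrated with a
minimal Python 3 sketch. This is not Obnam's or Attic's actual code:
the function name `split_chunks`, the rsync-style rolling checksum, the
default parameters, and the choice that a chunk ends when the lowest
N bits are all zero are assumptions made for illustration only.

```python
def split_chunks(data, bits=6, window=16, min_size=32, max_size=1024):
    """Split bytes into variable-size chunks at content-defined boundaries.

    Hypothetical sketch: a chunk ends when the lowest `bits` bits of a
    rolling weak checksum over the last `window` bytes are all zero.
    """
    assert window <= min_size <= max_size
    mask = (1 << bits) - 1
    chunks = []
    start = 0          # offset where the current chunk begins
    a = b = 0          # the two halves of the rolling checksum
    for i, byte in enumerate(data):
        if i - start >= window:
            # Window is full: slide it forward by one byte.
            out = data[i - window]
            a = (a - out + byte) & 0xFFFF
            b = (b - window * out + a) & 0xFFFF
        else:
            # Window is still filling up after a chunk boundary.
            a = (a + byte) & 0xFFFF
            b = (b + a) & 0xFFFF
        size = i - start + 1
        weak = (a << 16) | b
        if size >= max_size or (size >= min_size and weak & mask == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
            a = b = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

A larger `bits` value makes boundaries rarer and chunks larger, which is
why the number of chunks the repository must handle depends on the
settings.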


Roadmap
========================================================================

This is the rough order of things that I know need doing. There are
certainly things missing from the list: reality always wins over my
most careful planning.

* Add new benchmarks for all the success criteria. All the operations
  listed above should be benchmarked. Then analyze results and make
  any optimizations needed.

* Add Attic-style chunking. These chunks are not of fixed size at
  fixed positions in the files. This matters to the repository format
  because there may be lots more chunks, depending on settings, and
  the format needs to handle that.

* Add sparse file handling to GA. Sparse files are not used by
  everyone, but those who use them really want them handled well.
  Obnam currently doesn't handle them optimally and has no way of
  representing them in the repository except as long sequences of
  all-bits-zero bytes.

* Make sure Obnam handles the case of an unknown username or group
  name of a file (only numeric uid or gid known). This is important
  when it's not feasible to get the user/group name from an SFTP
  server. This affects the repository format only a little, but it
  needs to allow storing a value to represent "not known", and all
  code needs to be able to deal with that.

* Review obnam.org website and make any necessary updates.

* Review Obnam manual and make any necessary updates. Co-ordinate with
  translators to get non-English versions of the manual to also be
  updated.

* Make an Obnam release with a beta-level version of GA, and ask
  people to use it and report results.
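
To illustrate the sparse file item above, here is a minimal Python 3
sketch of one way holes could be represented without storing the zero
bytes themselves. It is not the GA format: the extent tuples, the
function names `to_extents` and `from_extents`, and the 4096-byte block
size are all hypothetical, and the sketch treats any all-zero block as a
hole rather than asking the filesystem where the real holes are.

```python
def to_extents(data, block_size=4096):
    """Describe `data` as a list of ("data", bytes) and ("hole", length).

    Hypothetical representation: runs of all-zero blocks are stored
    only as a length, not as the zero bytes themselves.
    """
    extents = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if not any(block):
            kind, payload = "hole", len(block)
        else:
            kind, payload = "data", block
        if extents and extents[-1][0] == kind:
            # Merge with the previous extent of the same kind
            # (adds lengths for holes, concatenates bytes for data).
            extents[-1] = (kind, extents[-1][1] + payload)
        else:
            extents.append((kind, payload))
    return extents


def from_extents(extents):
    """Rebuild the original bytes from a list of extents."""
    return b"".join(
        bytes(payload) if kind == "hole" else payload
        for kind, payload in extents
    )
```

On restore, a real implementation would presumably seek past each hole
instead of writing zeros, so that the restored file is sparse again.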