[[!meta title="Iteration planning: April 25—May 9"]]
[[!tag meeting]]

[[!toc levels=1]]

# Assessment of the iteration that has ended

[previous iteration]: /blog/2021/04/18/meeting

The goal for the [previous iteration][] was:

> The main goal of this iteration is to add at least rudimentary
> encryption of chunks, before they're uploaded to the server, and
> decryption and validation after they're downloaded. This should use
> the encryption keys stored by `obnam init`. This is done only if the
> client configuration says encryption is turned on, to allow an opt-in
> approach to encryption for now. Later on, encryption won't be
> optional.
>
> At the same time, work needs to start on using more concurrency in the
> client, and that means that Lars needs to learn more about Rust,
> specifically async Rust.
>
> Additionally, some smaller issues will be worked on, to tackle the
> back log of open issues.

That goal was not reached. Lars didn't get [[!issue 28]] (iterators
instead of large vectors), [[!issue 109]] (release), or [[!issue 110]]
(chunk encryption) done, partly due to life and work keeping him busy,
and partly because attempting to do [[!issue 28]] took up almost all
the time he'd allocated for Obnam. The issues Lars did work on still
have open merge requests, because of the multi-day review period in
use. Further, Lars learned a bit about async Rust and now has a plan
for concurrency in the client.

Those issues will be carried to the next iteration.

# Discussion

## Concurrency in the client

Lars experimented with async Rust, using
[tokio](https://crates.io/crates/tokio), on a branch of the
[Summain](https://crates.io/crates/summain) program. His conclusion is
that async will work well for those parts of programs that do I/O or
other blocking operations that allow switching between async tasks.
Such tasks are executed in tokio's worker threads. For CPU intensive
tasks, tokio's blocking tasks, executed in its blocking threads, work
fine.

Blocking tasks are necessary to use all the available CPU: tokio
schedules normal tasks co-operatively, which limits how many CPU
cycles a CPU intensive computation can use without inhibiting the
progress of other tasks. Blocking tasks run in their own threads,
allowing more CPU to be used without interfering with normal async
tasks.

Lars was able to make a version of Summain where all the metadata
gathering of files is done in async tasks, but the checksum
computations are done in blocking tasks. This allowed all the CPU on
the test host to be used for checksum computation. The resulting
program was reasonably clear and obvious, a definite plus.

For Obnam, Lars proposes the following: the Obnam client will execute
in four phases:

1. Download the latest generation from server. Until this is done,
   nothing can be decided about whether a file needs to be backed up
   or not. If there are no generations (initial backup), invent a
   dummy, empty generation to be used instead.
   
2. Scan the file system for files to be backed up, excluding based on
   client configuration (e.g., `CACHEDIR.TAG`), and checking against
   the previous generation. Insert metadata of any not-excluded files
   into a new nascent generation. For each inserted file, mark it as
   unchanged, changed, or new.
   
3. Scan the nascent generation for regular files that are changed or
   new. For each such file, split it into chunks, compute the checksum
   for the chunk, and look up the checksum on the server or upload the
   chunk and insert the chunk id into the nascent generation.
   
4. Upload the nascent generation to the server.
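
The marking step in phase 2 can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Obnam's actual
code: the `Meta`, `Status`, `classify`, and `build_nascent` names are
hypothetical, the metadata is reduced to size and modification time,
and generations are plain in-memory collections rather than SQLite
databases. An empty map plays the role of the dummy generation used
for an initial backup.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified file metadata (assumption: the real
/// client records far more than size and modification time).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Meta {
    pub size: u64,
    pub mtime: i64,
}

/// The three marks a file can get in the nascent generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Unchanged,
    Changed,
    New,
}

/// Compare one scanned file against the previous generation.
pub fn classify(previous: &HashMap<String, Meta>, path: &str, current: &Meta) -> Status {
    match previous.get(path) {
        // Not in the previous generation at all.
        None => Status::New,
        // Present with identical metadata.
        Some(old) if old == current => Status::Unchanged,
        // Present, but the metadata differs.
        Some(_) => Status::Changed,
    }
}

/// Build a nascent generation from the scanned files, skipping any path
/// for which the (hypothetical) exclusion predicate returns true.
pub fn build_nascent(
    previous: &HashMap<String, Meta>,
    scanned: &[(String, Meta)],
    is_excluded: impl Fn(&str) -> bool,
) -> Vec<(String, Meta, Status)> {
    scanned
        .iter()
        .filter(|(path, _)| !is_excluded(path))
        .map(|(path, meta)| (path.clone(), meta.clone(), classify(previous, path, meta)))
        .collect()
}
```

With an empty `previous` map, every file that is not excluded comes
out as `Status::New`, which is the behaviour wanted for an initial
backup.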

Of these, phases 1 and 4 are synchronous, in that phase 1 must finish
before 2 can start, and phase 3 must finish before 4 can start;
however, otherwise they can use async Rust. Phases 2 and 3 are fully
async, and can be interleaved. The checksum computation part of phase
3 is a blocking task, to allow all the available CPU to be used for
it.
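
To make the shape of phase 3 more concrete, here is a small sketch of
chunking, checksumming, and "look up or upload". It rests on several
assumptions that are not part of the plan above: it uses only the
Rust standard library, with scoped threads standing in for tokio's
blocking tasks; the checksum is `DefaultHasher`, which is not
cryptographic and is used only to keep the example free of
dependencies; and `FakeServer`, `checksums_parallel`, and
`backup_file` are hypothetical names for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::thread;

/// Stand-in checksum. NOT cryptographic; an assumption made only to keep
/// the sketch dependency-free.
fn checksum(chunk: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Split `data` into chunks of at most `chunk_size` bytes (must be > 0) and
/// checksum them on up to `workers` threads, preserving chunk order.
/// Scoped threads stand in for blocking tasks here.
pub fn checksums_parallel(data: &[u8], chunk_size: usize, workers: usize) -> Vec<u64> {
    let chunks: Vec<&[u8]> = data.chunks(chunk_size).collect();
    if chunks.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division: how many chunks each worker thread handles.
    let per_worker = (chunks.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .chunks(per_worker)
            .map(|batch| {
                scope.spawn(move || batch.iter().map(|c| checksum(c)).collect::<Vec<u64>>())
            })
            .collect();
        // Joining in spawn order keeps the checksums in chunk order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("checksum worker panicked"))
            .collect()
    })
}

/// Hypothetical in-memory stand-in for the chunk server.
#[derive(Default)]
pub struct FakeServer {
    by_checksum: HashMap<u64, usize>,
    pub uploads: usize,
}

impl FakeServer {
    /// Return the id of an existing chunk with this checksum, or record an
    /// "upload" and return the new chunk's id.
    pub fn find_or_upload(&mut self, sum: u64) -> usize {
        if let Some(id) = self.by_checksum.get(&sum) {
            return *id;
        }
        let id = self.by_checksum.len();
        self.by_checksum.insert(sum, id);
        self.uploads += 1;
        id
    }
}

/// Phase 3 for one file: the chunk ids to insert into the nascent generation.
pub fn backup_file(
    server: &mut FakeServer,
    data: &[u8],
    chunk_size: usize,
    workers: usize,
) -> Vec<usize> {
    checksums_parallel(data, chunk_size, workers)
        .into_iter()
        .map(|sum| server.find_or_upload(sum))
        .collect()
}
```

In this sketch only the checksum computation is spread over threads;
the lookups and uploads run one after another, whereas in the plan
above that I/O would be done in async tasks.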

## Iteration length

This was a one-week iteration. In principle it went OK, but Lars
failed to resolve many of the issues he'd taken on. A longer
iteration would allow more flexibility, but can be problematic in
other ways.

A bigger issue is that if a change needs three days of review by
default, a one-week iteration means all the work would need to be done
very early in the iteration, or else there would not be enough time to
get the changes reviewed.

For now, Lars suggests going back to two-week iterations.

## This iteration

This iteration covers April 25 through May 9. To allow participation
in and feedback on the meeting plan, the meeting won't be closed until
Tuesday, April 27.


# Goals

## Goal for 1.0 (not changed this iteration)

The goal for version 1.0 is for Obnam to be an utterly boring backup
solution for Linux command line users. It should just work, be
performant, secure, and well-documented.

It is not a goal for version 1.0 to have been ported to other
operating systems, but if there are volunteers to do that, and to
commit to supporting their port, ports will be welcome.

Other user interfaces are likely to happen only after 1.0.

The server component will support multiple clients in a way that
doesn’t let them see each other’s data. It is not a goal for clients
to be able to share data, even if the clients trust each other.

## Goal for the next few iterations (not changed for this iteration)

The goal for the next few iterations is to have Obnam support encryption
well. This will involve having a documented threat model, which has
been reviewed by all stakeholders participating in the project, and
Obnam defending against all the modeled threats.

## Goal for the iteration that is starting

The main goal of this iteration is to add at least rudimentary
encryption of chunks, before they’re uploaded to the server, and
decryption and validation after they’re downloaded. This should use
the encryption keys stored by `obnam init`. This is done only if the
client configuration says encryption is turned on, to allow an opt-in
approach to encryption for now. Later on, encryption won’t be
optional.

Additionally, work will continue on using iterators instead of
potentially enormous vectors when querying the SQLite database.

# Commitments for this iteration

New [[!milestone 9]] represents this iteration on GitLab.

This is a two-week iteration. For Lars, that means a time budget of
8 hours. Lars is committed to resolving the following issues:

- [[!issue 28]] - (2.75h; time boxed: give up if it takes longer)
- [[!issue 109]] - (0.25h)
- [[!issue 110]] - (1h)
- [[!issue 113]] - (4h)

That is a total of 8 hours, as a rough estimate. Additionally, Lars
will finish off the merge requests from the April 18 to 25 iteration.


# Meeting participants

* Lars Wirzenius