[[!meta title="Iteration planning: March 21 &ndash; April 3"]]
[[!meta date="Mon, 21 Mar 2022 10:00:00 +0200"]]
[[!tag meeting]]

[[!toc levels=2]]

# Assessment of the iteration that has ended

[previous iteration]: /blog/2022/03/06/planning

The goal of the [previous iteration][] was:

> The goal for this iteration is to prepare for future schema changes.

This was completed. The Obnam client now supports more than one
version of the schema for backup generations, and can restore from any
of them. The server does not do that yet: if anything about its
database schema or its API changes, the change is breaking
and necessitates starting over with a new, empty repository with the
new server version. This will need to be addressed later. (See
[[!issue 199]].)

# Discussion

## Current development theme

The current theme of development for Obnam is **convenience**. The choices
are performance, security, convenience, and tidy-up, at least
currently.

## Breaking changes ahead

Lars foresees several upcoming breaking changes to how Obnam
encryption is done, client/server authentication, and more. Ideally
these would all be done in ways that don't require users to start over
with their backups, but it seems like a lot of effort for a short-term
gain. Thus, Lars intends to take advantage of the fact that nobody
uses Obnam yet and make some fundamental changes over the next few
iterations. Part of those changes will be to make it easier to evolve
Obnam without redo-all-your-backups changes, but some will be so
fundamental that it doesn't seem worth supporting both the old and new
ways.

Lars plans the following such fundamental changes at the moment:

* Add a "trusted root object" to the Obnam system, to replace the
  current approach of "independent backup generation" objects. This
  will increase security, as well as make the ordering of backup
  generations more reliable.
  - this change is planned for this iteration
  - the old "generation chunk" approach will be dropped
* Add authentication to the client/server protocol. Details to be
  discussed later.
* Refactor the server to have database schema versioning.
* Add versioning to the client/server protocol.

After these, Lars hopes that Obnam will be in a state where it's
feasible to evolve the client and server mostly without the kind of
breaking changes that require starting over with an empty backup
repository.

## User root object

Currently, the Obnam client stores backup generations on the server
without an explicit ordering. Each generation has a timestamp, which
is used to sort the generations into an order, but that's not good
enough. See [[!issue 34]] (_Uses timestamps to order backup
generations_).

* If two backup runs overlap in time, they might create new backups
  that are incremental to a common ancestor, but not related to each
  other. This will at minimum be confusing.
* There's no guarantee the backup with a later timestamp is actually
  newer: clock skew, and other errors, may affect things.
* Timestamps are cleartext data. This leaks information. Not good.
* Timestamps are not covered by a signature. Double-plus ungood. This
  allows an attacker to change them, and they can make the client
  think the latest backup is actually the oldest one. At minimum, this
  means further incremental backups may back up files needlessly, but
  may also mean the wrong backup gets restored.
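
The clock skew problem can be shown with a small Python sketch. The
generation names and the two-hour skew are made up for illustration;
none of this is Obnam code.

~~~python
from datetime import datetime, timedelta, timezone

# Hypothetical generations, listed in the order they were really created.
# The second backup ran on a machine whose clock was two hours slow.
skew = timedelta(hours=-2)
t0 = datetime(2022, 3, 20, 9, 0, tzinfo=timezone.utc)
created = [
    ("gen-a", t0),
    ("gen-b", t0 + timedelta(hours=1) + skew),
]

# Order the generations the way a timestamp-based client would.
by_timestamp = [name for name, ended in sorted(created, key=lambda g: g[1])]
true_order = [name for name, _ in created]

print("real order:     ", true_order)
print("timestamp order:", by_timestamp)
~~~

Sorting by timestamp puts `gen-b` before `gen-a`, even though `gen-b`
was made later, so a client trusting the timestamps would treat the
older backup as the latest one.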

Lars proposes a change to protect against this threat model:

* An attacker who can delete or modify files in the backup repository
  must not be able to alter the contents or ordering of backup
  generations.

To fix this, Lars is thinking of the following approach:

* Each user has a "root object", which lists their backup generations,
  and metadata of each generation.
* The root object is a chunk, so it's encrypted and authenticated with
  AEAD. This prevents an attacker from inspecting it, or from modifying
  it without the client noticing.
* The root object chunk has a random chunk id, but the label "user" so
  that a client can easily find it. It is otherwise exactly like other
  chunks.
* Chunk metadata will be reduced to only `label`. The `generation` and
  `ended` metadata for chunks will be removed. This will force another
  breaking change, sorry.
* The root object will have a reference to the previous one.
* The client will find all root objects, and pick the newest one. This
  is because the client has no way to tell the server to delete any
  chunk, so it can't delete an old root object.
* Later, when we add client authentication, the server will store the
  data in the root object associated with the client account, and
  allow the client to update it. However, adding authentication is too
  big a change for this iteration.
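
The post does not say how the client decides which root object is the
newest. One possible rule, sketched here in Python as an assumption
only, is to pick the root object that no other root object names as
its previous one. The function name `newest_root` and the chunk ids
are hypothetical; the `previous_root` field name is taken from the
JSON example in this post.

~~~python
def newest_root(roots):
    """Return the chunk id of the newest root object.

    `roots` maps chunk id -> decoded root object (a dict). This sketch
    assumes "newest" means the single root that no other root refers to
    as its `previous_root`. That rule is an assumption, not Obnam's.
    """
    referenced = {
        root["previous_root"]
        for root in roots.values()
        if root.get("previous_root") is not None
    }
    heads = [chunk_id for chunk_id in roots if chunk_id not in referenced]
    if len(heads) != 1:
        # Zero heads means a cycle; several heads means a fork or an
        # unrelated root object. Either way the client should not guess.
        raise ValueError(f"expected exactly one newest root, found {len(heads)}")
    return heads[0]


# Hypothetical chain of three root objects, oldest first.
roots = {
    "root-1": {"previous_root": None},
    "root-2": {"previous_root": "root-1"},
    "root-3": {"previous_root": "root-2"},
}
newest = newest_root(roots)
print(newest)
~~~

Following the links has the advantage of not depending on timestamps,
but it does not by itself detect an attacker who deletes the newest
root object chunk.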

The contents of a root object could be serialized into JSON like this:

~~~json
{
    "client": "exolobe1",
    "previous_root": "6d381c04-a83a-11ec-a3e9-fba06bac23fd",
    "timestamp": "2022-03-20T09:07:17+00:00",
    "backups": [
        {
            "chunk-id": "7cb90434-a82d-11ec-9383-b31e0b1b81aa",
            "ended": "2022-03-20T09:07:17+00:00"
        },
        {
            "chunk-id": "7aa3681a-a82d-11ec-b24b-e3adb644891b",
            "ended": "2022-03-21T09:07:17+00:00"
        }
    ]
}
~~~

The `backups` field lists the generations in order, with the oldest one
first.
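
As a rough illustration, a client could read the generations out of
such a root object like this. The sketch is in Python rather than the
language Obnam is written in, the function name `generation_ids` is
made up, and it assumes the root object has already been decrypted and
verified.

~~~python
import json

# The example root object from above, as it might look after decryption.
ROOT_JSON = """
{
    "client": "exolobe1",
    "previous_root": "6d381c04-a83a-11ec-a3e9-fba06bac23fd",
    "timestamp": "2022-03-20T09:07:17+00:00",
    "backups": [
        {
            "chunk-id": "7cb90434-a82d-11ec-9383-b31e0b1b81aa",
            "ended": "2022-03-20T09:07:17+00:00"
        },
        {
            "chunk-id": "7aa3681a-a82d-11ec-b24b-e3adb644891b",
            "ended": "2022-03-21T09:07:17+00:00"
        }
    ]
}
"""


def generation_ids(root_json):
    """Return generation chunk ids, oldest first, as listed in the root."""
    root = json.loads(root_json)
    # The order of the list is the order of the generations; the
    # `ended` timestamps are not used for sorting.
    return [backup["chunk-id"] for backup in root["backups"]]


ids = generation_ids(ROOT_JSON)
latest = ids[-1]
print("latest generation:", latest)
~~~

The point of the design is visible in the code: the ordering comes from
the position in the authenticated list, not from comparing timestamps.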

With this approach, everything linked from the root object chunk, or
found by following links further, can be assured to be in the right
order and to be unmodified.

An attacker can still replace the root object chunk with an older one.
This can be mitigated by checking the root object timestamp: if it's
unexpectedly old, something is wrong. A stronger mitigation would be
for the client to store the timestamp locally and check it on the next
backup run. However, that requires data to not be lost on the client
end, which is what backups are meant to protect against, so it's not a
very satisfactory solution.

If the limit for how old the root object chunk can be is too long, an
attacker can keep replacing the latest one with one that's as old as
it can be without triggering an alarm. That would mean that any
intervening backups get lost, which would be bad.
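
Both mitigations, and the weakness of the age limit, can be sketched
in a few lines of Python. The function name `check_root`, the
`max_age` limit, and the locally stored `last_seen` timestamp are all
hypothetical; nothing like this exists in Obnam yet.

~~~python
from datetime import datetime, timedelta, timezone


def check_root(root, now, max_age, last_seen=None):
    """Classify a root object as "ok", "too old", or "rolled back".

    `last_seen` is the root timestamp the client remembered from its
    previous run, if it still has one.
    """
    timestamp = datetime.fromisoformat(root["timestamp"])
    if now - timestamp > max_age:
        return "too old"
    if last_seen is not None and timestamp < last_seen:
        return "rolled back"
    return "ok"


root = {"timestamp": "2022-03-20T09:07:17+00:00"}
now = datetime(2022, 3, 21, 9, 7, 17, tzinfo=timezone.utc)

# The root object is exactly one day old here.
strict = check_root(root, now, max_age=timedelta(hours=1))
lenient = check_root(root, now, max_age=timedelta(days=2))
remembered = check_root(
    root,
    now,
    max_age=timedelta(days=2),
    last_seen=datetime(2022, 3, 21, 0, 0, tzinfo=timezone.utc),
)
print(strict, lenient, remembered)
~~~

With a two-day limit the day-old root object passes the age check, so
an attacker could replay it and hide a day of backups. Only the
locally remembered timestamp catches that, and it is exactly the data
that may be gone after the client machine is lost.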

Attacks on the root object may need to be mitigated in future
iterations.


# Repository review

Lars reviewed all the open issues, merge requests, and CI pipelines
for all the projects in the Obnam group on gitlab.com.

## [Container Images](https://gitlab.com/obnam/container-image)

* Open issues: 0
* Merge requests: 0
* Additional branches: 0
* CI: OK, ran on Monday, March 14

## [obnam.org](https://gitlab.com/obnam/obnam.org)

* Open issues: 0
* Merge requests: 0
* Additional branches: 0
* CI: not defined

## [obnam-benchmark](https://gitlab.com/obnam/obnam-benchmark)

* Open issues: 11
* Merge requests: 0
* Additional branches: 0
* CI: not defined

## [summain](https://gitlab.com/obnam/summain)

* Open issues: 0
* Merge requests: 0
* Additional branches: 0
* CI: not defined

## [obnam](https://gitlab.com/obnam/obnam)

* Open issues: 54
* Merge requests: 2
  - [[!mr 214]] - _performance metrics_
    - needs thinking and further work
  - [[!mr 222]] - _add backup database schema to evolve; break server
    database_
    - to be merged on Tuesday
* Additional branches: 0
* CI: OK

# Goals

## Goal for 1.0 (not changed this iteration)

The goal for version 1.0 is for Obnam to be an utterly boring backup
solution for Linux command line users. It should just work, be
performant, secure, and well-documented.

It is not a goal for version 1.0 to have been ported to other
operating systems, but if there are volunteers to do that, and to
commit to supporting their port, ports will be welcome.

Other user interfaces are likely to happen only after 1.0.

The server component will support multiple clients in a way that
doesn’t let them see each other’s data. It is not a goal for clients
to be able to share data, even if the clients trust each other.

## Goal for the next few iterations (not changed for this iteration)

The goal for the next few iterations is to have Obnam be easier and safer
to change, both for developers and end users. This means that
developers need to be able to make breaking changes without users
having to suffer. Users shall be able to migrate their data when they
feel it worthwhile, not just because there is a new version.

## Goal for this iteration (new for this iteration)

The goal of this iteration is to add a "root object" for a user's
backups, which lists the backup generations in order.

# Commitments for this iteration

Lars intends to work on the "root object" change, as described above.
This will affect, and hopefully resolve, the following issues:

* [[!issue 34]] - _Uses timestamps to order backup generations_
* [[!issue 62]] - _Describe how chunks relate to each other_

# Meeting participants

* Lars Wirzenius