ondisk.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212

[[!meta title="Obnam on-disk data structures"]]

Introduction
============

This page gives a high abstraction level picture of what the Obnam
repository data structures are like, and the internal abstraction for
handling them. It then links to more detailed descriptions of each
different repository format.

[[!toc levels=2]]


Constraints and assumptions
===========

The repository design is influenced by several constraints and
assumptions, some of which are described here.

Disk-like storage
-----------------

Storage is assumed to be disk-like, meaning any piece of data can be
accessed at about the same speed, rather than tape-like, where you'd
have to basically only append data, and where seeks are not practical.

Shared storage
--------------

Multiple backup clients may share storage, for lower cost or easier
administration.

Dumb storage
------------

The most important of assumption is that the repository storage is
dumb: it can't do any processing of data. Basically we must assume the
repository provides only the following operations:

* PUT some data in storage under a given filename, including a
  hierarchical directory structure. This is **atomic**, meaning the
  data is either completely available, or does not exist at all, under
  the given name. Stored data may be replaced completely, but it can't
  be updated only partly. PUT may optionally fail, if the name is
  already in use.
* GET named data from storage.
* LIST all data in storage in a given directory.
* DELETE named data from storage.

We can't, for example, request that the storage compute a checksum of
some data. We especially can't assume Obnam itself is running on the
machine providing the storage.

Storage filesystem
------------------

The storage may be provided by any existing filesystem, including
VFAT, or it might not be provided by a real filesystem at all. We
can't use neat tricks like hard links to implement the repository.

Round trip time
---------------

The storage may be a local filesystem, but it may also be access over
the network using some protocol such as SFTP. This means every storage
access potentially carries a large time overhead. Minimising the
number of separate accesses is necessary for good performance.

Security and privacy
--------

The repository design must assume an attacker has at least read-only
access to the repository. This means the design should avoid leaking
information via filenames, or other such things. Some data leak is
unavoidable: it is, for example, unavoidable that an attacker can keep
track of which files were changed when.


Repository structure
====================

The repository is divided into four kinds of areas:

* A list of clients.
* Chunks of file content.
* Indexes to find chunks efficiently.
* Per-client data, such as what files each client has.

These areas are mostly independent of each other. They refer to
objects using identifiers: clients and chunks have identifiers that
are random 64-bit integers, to avoid data leaks.


Client list
-----------

Each backup client needs to add itself to the client list. The client
list maps the client name to the client identifier, and also lists the
client's encryption key.


Chunks of file content
----------------------

Files vary in size a lot, and thus Obnam breaks file content into
suitably small pieces, called chunks. Chunks can be re-used between
files and clients, for de-duplication. Chunks may be of variable size.
Chunks are accessed using their identifier only.


Chunk indexes
-------------

For de-duplication, it is necessary to know if a given piece of file
content data is already stored in the repository. Chunk indexes
provide a mapping from the value of a checksum algorithm to a list of
chunk identifiers whose content has that checksum.

Additionally, when removing backup generations (`obnam forget`), it is
necessary to know which clients are using a given chunk. Chunk indexes
also provide a mapping from a chunk identifier to a list of client
identifiers.


Per-client data
---------------

Each client has its own files, and manages its own backup history. For
each client the repository has its own area, where the client stores:

* each backup generation
* all the names and metadata (`stat`(2) results, etc) for each file
* possibly some other data that is only relevant for that client


Feature implementation
======================

Some of the headline features of Obnam need to be implementable using
the repository design. This section describes how that happens.


De-duplication
--------------

De-duplication is implemented by the backup process reading a file and
splitting the content up in chunks, using whatever chunking method it
chooses. It then looks up the chunk in the chunk indexes to see if the
chunk content is already in the repository.

If there are chunks with the same checksum, the backup process can
then either decide to re-use the chunks, on the assumption that the
checksum is strong enough and that there are no collisions.
Alternatively, it can download each of the chunks from the repository
and compare the data bit by bit, to verify a match. The latter is
quite expensive in time and bandwidth, but necessary for those who
can't rely on checksums, such as those researching checksum
collisions.

Encryption
----------

Storage is dumb, and so it doesn't encrypt itself, or at least Obnam
can't assume it does. Further, storage may be controlled by someone
else, such as an online storage provider, and while they may be
trusted to store data, they can't be trusted to not leak data. Thus,
Obnam encrypts data before putting it in storage (assuming the user
enables encryption).

Encryption is done by combining symmetric and asymmetric (public key)
encryption. For each area (client list, chunks, chunk indexes,
per-client), a random symmetric encryption key is created, and all
other files in the area are encrypted using that key. The symmetric
key itself is encrypted using a list of public keys, which are
provided by the user. The public keys are stored in a file in the
area, and that file is encrypted using the symmetric key.

This way, to access the public keys, you must be able to access the
symmetric key, and you can only do that if you have one of the public
keys. Or if you can break the encryption.


Internal implementation details
===============================

Internally, in the Obnam code base, all repository formats are
accessed using the same interface. This is so that Obnam can support
any number of repository formats without excessive code complexity. In
the code, this is in the `obnamlib/repo_interface.py` file. For
details,
[see that file](http://git.liw.fi/cgi-bin/cgit/cgit.cgi/obnam/tree/obnamlib/repo_interface.py).

Additionally, all live data and repository storage is accessed using a
filesystem abstraction, called the VFS interface. This interface is
implemented separately for local filesystems and for SFTP servers, and
this enables Obnam to access live data as well over SFTP. In the
source tree, see `obnamlib/vfs.py` for the interface. Additional
storage providers can be added by implementing the interface for them.


Specific repository formats
===========================

Obnam implements several repository formats. Each format implements
the abstract features described above in some concrete way. Depending
on the quality of the implementation, the resulting format can work
better or worse.

* [[Format **6**|format-6]] was introduced prior to version 1.0, and
  is currently the main format. It is intended for real use.
* Format [[Green Albatross|format-green-albatross]] is an experimental
  format, intended to eventually replace format 6.