[[!meta title="Locking in Obnam repositories"]]

Obnam supports multiple clients backing up to the same repository at
the same time. It further allows the clients to share data, so that
if Alice has backed up a large file, and Bob has the same file, he
does not need to actually upload the file a second time to the backup
repository. See [[Obnam on-disk data structures|ondisk]] for more details.

For Alice and Bob not to overwrite each other's changes to shared
data, there needs to be some locking, so that only one of them can
modify a given data structure at a time.

The repository has six different classes of data:

* metadata about the repository (shared, needs locking)
  - currently this is the version number of the repository format
* a list of clients (shared, needs locking)
* file data chunks (shared, but does not need locking)
* encryption keys for chunks (shared, needs locking)
* shared metadata about file chunks (shared, needs locking)
* per-client file metadata (not shared, but does need locking)

Another way to consider the data is as follows:

* shared chunk storage
* shared and per-client B-trees, and chunk encryption keys
* repository metadata

We can discuss the locking for each of these groups separately.

Shared chunk storage
--------------------

The shared chunk storage consists of a directory tree, where chunks are
stored in files named after the chunk identifier. When a client is
making a backup, it only ever adds chunks, and it does that by picking
a chunk identifier at random and creating a file with that name. If
the file creation fails, the chunk id was already in use, and the client
picks a new one at random.

This means that the filesystem takes care of atomicity, and Obnam does
not need to care about that. Adding chunks is a lock-free operation even
when multiple clients are backing up at the same time.
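
As a concrete illustration, chunk creation can be sketched roughly as
follows. This is a minimal sketch of the scheme described above, not
Obnam's actual code; the directory layout, the identifier range, and
the function name are assumptions.

    import errno
    import os
    import random

    def store_chunk(chunks_dir, data, max_tries=100):
        # Pick a random chunk id and try to create a file with that
        # name; O_EXCL makes creation fail if the file already exists,
        # so the filesystem provides the atomicity.
        for _ in range(max_tries):
            chunk_id = random.randrange(2 ** 64)
            path = os.path.join(chunks_dir, '%x' % chunk_id)
            try:
                fd = os.open(path,
                             os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            except OSError as e:
                if e.errno == errno.EEXIST:
                    continue  # id already in use; pick another one
                raise
            with os.fdopen(fd, 'wb') as f:
                f.write(data)
            return chunk_id
        raise RuntimeError('could not find a free chunk id')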

Read-only access to the chunks also does not require any locking,
obviously.

The operations that can delete chunks (forget, fsck) can also do without
locking the chunk storage. They lock the relevant B-trees, so that other
clients do not at the same time try to change the state of the repository,
and this protects the shared chunks as well.

When encryption is used, the encryption keys need to be updated for new
clients, and this needs locking, but this is handled separately from the
actual chunks.

Shared and per-client B-trees, and chunk encryption keys
--------------------------------------------------------

There are several shared B-trees: the list of clients, the mapping of
chunk identifiers to chunk sums, and the mapping of chunk sums to
chunk ids. There is one B-tree per client. From a locking point of
view, they can be treated identically: if nothing else, even a
per-client tree can be considered shared when, for example, the backup
server is doing a repository-wide fsck.

The encryption keys for chunk storage are not a B-tree, but from a
locking point of view they can be handled as if the chunks directory
were a B-tree.

Read-only access to the B-trees should not be locked. This is not entirely
safe: Alice may do read-only access (e.g., during a restore), while
Bob is doing a destructive operation (e.g., forget). However, the
read-only client can treat any failures due to concurrent access
the same way it treats other failures: it can try to restore as much
as it can, or it can give up. Locking does not help against bad sectors
on the repository. Not having to lock for read-only access allows
faster access and avoids having a long-running read-only operation
locking out new backups.

Destructive access will need locking. Thus, each B-tree will have a
lock file at its root (called `lock`): the existence of the file
indicates that the tree is locked. Only one writer can hold the lock
at a time.

The client list B-tree only needs to be changed when a client is added
or removed from the tree, or its encryption details are changed. The
other shared B-trees need to be changed during every backup run of every
client. The per-client B-trees need to be locked during backup and forget
operations for that client. All B-trees will need to be locked while
an fsck runs. (Per-client B-trees need to be locked to protect against
concurrent backup, forget, and fsck operations, for example.)

Each B-tree is locked separately, by creating a file called `lock`
inside its directory.
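
Acquiring and releasing such a lock can be sketched with the same
exclusive-creation trick used for chunk files. The helper names and
the error handling below are assumptions for illustration, not Obnam's
actual API.

    import errno
    import os

    def lock_tree(tree_dir):
        # Create the `lock` file exclusively; if it already exists,
        # some other writer holds the lock.
        lock_path = os.path.join(tree_dir, 'lock')
        try:
            fd = os.open(lock_path,
                         os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            os.close(fd)
        except OSError as e:
            if e.errno == errno.EEXIST:
                raise RuntimeError('%s is already locked' % tree_dir)
            raise

    def unlock_tree(tree_dir):
        # Releasing the lock is simply removing the `lock` file.
        os.remove(os.path.join(tree_dir, 'lock'))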

Repository metadata
-------------------

The repository metadata (the directory `metadata` at the root of
the repository, not to be confused with the similarly named files inside
B-tree directories) may need to be locked against writes. For example,
a future version of Obnam may provide a way to upgrade the repository
format to a new version, and this requires locking against any changes
to the repository.

The repository metadata will be considered locked implicitly when the
client list B-tree is locked.

Avoiding deadlocks and reducing lock contention on shared stuff
---------------------------------------------------------------

To avoid deadlocks, every client must create locks in the same order.

1. The client list B-tree.
1. Any per-client B-trees, in the order of their directory basenames.
1. Any other shared B-trees, in the order of their directory basenames.

(Directory basenames are ordered byte by byte, using unsigned bytes.)
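
The ordering can be sketched as follows. The `clientlist` directory
name and the helper functions `lock_tree` and `unlock_tree` (from the
sketch above) are assumptions for illustration.

    import os

    def tree_sort_key(tree_dir):
        # Compare basenames byte by byte, as unsigned bytes.
        return os.path.basename(tree_dir).encode('utf-8')

    def lock_in_order(repo_dir, per_client_dirs, shared_dirs):
        # 1. client list, 2. per-client trees, 3. other shared trees.
        ordered = [os.path.join(repo_dir, 'clientlist')]
        ordered += sorted(per_client_dirs, key=tree_sort_key)
        ordered += sorted(shared_dirs, key=tree_sort_key)

        locked = []
        try:
            for tree_dir in ordered:
                lock_tree(tree_dir)
                locked.append(tree_dir)
        except BaseException:
            # On failure, release whatever was already locked,
            # in reverse order, and give up.
            for tree_dir in reversed(locked):
                unlock_tree(tree_dir)
            raise
        return locked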

There are several scenarios in which a client may require a lock:

* adding a new client
  - lock client list, add new client to it
  - create, then lock per-client B-tree
  - lock the shared directories, then add the client's encryption key to them
  - unlock everything
* making backups for a client
  - lock per-client B-tree, add new metadata
  - lock chunklist and chunksums B-trees, update with info about new chunks,
    unlock (possibly do this several times, for checkpoints)
  - unlock per-client B-tree
* forgetting backups for a client
  - lock per-client B-tree, remove generations, gather list of chunks
    that may need to be deleted, unlock
  - lock chunklist and chunksums, update them to record which chunks
    aren't used by the client anymore, possibly remove chunks from disk,
    unlock (see the sketch after this list)
* fsck for repository
  - lock everything, fsck, unlock everything
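
As an example of one of these sequences, the forget scenario can be
sketched roughly as below, using `lock_tree` and `unlock_tree` from the
earlier sketch. The trees' `dirname` attribute and the other helpers
used here (`chunks_in_generation`, `remove_generation`,
`other_clients_use`, `chunk_path`) are assumed names for illustration,
not Obnam's actual API.

    import os

    def forget_generation(client_tree, chunklist, chunksums, gen_id):
        # Phase 1: under the per-client lock, drop the generation and
        # collect the chunks it referenced.
        lock_tree(client_tree.dirname)
        try:
            candidates = client_tree.chunks_in_generation(gen_id)
            client_tree.remove_generation(gen_id)
        finally:
            unlock_tree(client_tree.dirname)

        # Phase 2: under the shared locks (taken in basename order),
        # remove chunks that no client uses any more.
        for tree in (chunklist, chunksums):
            lock_tree(tree.dirname)
        try:
            for chunk_id in candidates:
                if not other_clients_use(chunk_id):
                    chunklist.remove(chunk_id)
                    chunksums.remove(chunk_id)
                    os.remove(chunk_path(chunk_id))
        finally:
            for tree in (chunklist, chunksums):
                unlock_tree(tree.dirname)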

When making backups, a client will pool its updates to the shared B-trees
until it is ready to do all of them in a batch, typically at the end
of a backup, or at a checkpoint generation. This way, it can keep
the shared lock for only a short while. If the client crashes before
it can update the shared B-trees, it is unfortunate, but not 
catastrophic: it will have stored some chunks in the repository that
cannot be used for de-duplication, but no actual data is lost.
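
The pooling can be sketched as follows. The tree objects' `insert`
method and `dirname` attribute, and the reuse of `lock_tree` and
`unlock_tree` from the earlier sketch, are assumptions for
illustration.

    class SharedTreeUpdates(object):

        def __init__(self, chunklist, chunksums):
            self.chunklist = chunklist
            self.chunksums = chunksums
            self.pending = []  # (chunk_id, checksum) pairs not yet written

        def remember_chunk(self, chunk_id, checksum):
            # Called freely during the backup run; takes no locks.
            self.pending.append((chunk_id, checksum))

        def flush(self):
            # Called at a checkpoint or at the end of the backup: hold
            # the shared locks only as long as the batch update takes.
            # chunklist sorts before chunksums, so this respects the
            # locking order.
            for tree in (self.chunklist, self.chunksums):
                lock_tree(tree.dirname)
            try:
                for chunk_id, checksum in self.pending:
                    self.chunklist.insert(chunk_id, checksum)
                    self.chunksums.insert(checksum, chunk_id)
                self.pending = []
            finally:
                for tree in (self.chunklist, self.chunksums):
                    unlock_tree(tree.dirname)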

Reliability and scalability testing of locking
----------------------------------------------

The reliability of locking will be tested by writing a script to do
small backup generations, and running it concurrently many times 
against the same repository. The script will verify that each client
can restore every generation successfully after all generations have
been backed up. A repository-wide fsck will also be run.

The amount of data to back up in each generation shall vary for
each client, so that they are less likely to get into lockstep.
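
A driver for such a test could be sketched roughly as below. The exact
obnam options used (`--repository`, `--client-name`) are assumptions to
be checked against the obnam manual, and the restore-and-compare step
for every generation is only indicated by a comment.

    import os
    import random
    import subprocess
    from multiprocessing import Pool

    REPO = '/srv/test-repo'   # shared test repository (assumption)
    GENERATIONS = 10          # small generations per client
    CLIENTS = 8

    def run_client(client_no):
        # Each client backs up a varying amount of data per generation,
        # so that the clients are less likely to run in lockstep.
        live = '/tmp/live-%d' % client_no
        if not os.path.exists(live):
            os.makedirs(live)
        for gen in range(GENERATIONS):
            size = random.randint(1, 64) * 1024
            with open(os.path.join(live, 'data-%d' % gen), 'wb') as f:
                f.write(os.urandom(size))
            subprocess.check_call(
                ['obnam', 'backup',
                 '--repository=%s' % REPO,
                 '--client-name=client-%d' % client_no,
                 live])

    if __name__ == '__main__':
        Pool(processes=CLIENTS).map(run_client, range(CLIENTS))
        # Each client then restores every generation and compares it to
        # the backed-up data (not shown), and finally a repository-wide
        # fsck is run.
        subprocess.check_call(['obnam', 'fsck', '--repository=%s' % REPO])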