[[!meta title="Locking in Obnam repositories"]]

Obnam supports multiple clients backing up to the same repository at the
same time. It further allows the clients to share data, so that if Alice
has backed up a large file, and Bob has the same file, he does not need to
upload the file a second time to the backup repository. See
[[Obnam on-disk data structures|ondisk]] for more details.

For Alice and Bob not to overwrite each other's changes to shared data,
there needs to be some locking, so that only one of them can modify the
data structures at a time.

The repository has six different classes of data:

* metadata about the repository (shared, needs locking) - currently this
  is the version number of the repository format
* a list of clients (shared, needs locking)
* file data chunks (shared, but does not need locking)
* encryption keys for chunks (shared, needs locking)
* shared metadata about file chunks (shared, needs locking)
* per-client file metadata (not shared, but does need locking)

Another way to consider the data is as follows:

* shared chunk storage
* shared and per-client B-trees, and chunk encryption keys
* repository metadata

We can discuss the locking for each of these groups separately.

Shared chunk storage
--------------------

The shared chunk storage consists of a directory tree, where chunks are
stored in files named after the chunk identifier. When a client is making
a backup, it only ever adds chunks, and it does that by picking a chunk
identifier at random and creating a file with that name. If the file
creation fails, the chunk id was already in use, and the client picks a
new one at random. This means that the filesystem takes care of atomicity,
and Obnam does not need to. Adding chunks is a lock-free operation even
when multiple clients are backing up at the same time.

Read-only access to the chunks also does not require any locking,
obviously.

The operations that can delete chunks (forget, fsck) can also do without
locking the chunk storage. They lock the relevant B-trees, so that other
clients do not try to change the state of the repository at the same time,
and this protects the shared chunks as well.

When encryption is used, the encryption keys need to be updated for new
clients, and this needs locking, but it is handled separately from the
actual chunks.
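As a rough sketch of that create-exclusively approach (not the actual
Obnam code; the flat chunk directory and the identifier format are
simplifying assumptions), storing a chunk could look like this:

```python
import errno
import os
import random


def put_chunk(chunk_dir, data):
    '''Store data as a new chunk and return the chunk identifier.

    O_CREAT|O_EXCL makes the filesystem refuse the create if another
    client has already picked the same identifier, so adding chunks
    needs no locking.
    '''
    while True:
        chunk_id = '%016x' % random.getrandbits(64)
        path = os.path.join(chunk_dir, chunk_id)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except OSError as e:
            if e.errno == errno.EEXIST:
                continue  # identifier already in use, pick another
            raise
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        return chunk_id
```

The important point is that the filesystem's exclusive-create semantics,
not any Obnam-level lock, resolve identifier collisions between
concurrent clients.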
Shared and per-client B-trees, and chunk encryption keys
--------------------------------------------------------

There are several shared B-trees: the list of clients, the mapping of
chunk identifiers to checksums, and the mapping of checksums to chunk
identifiers. There is one B-tree per client. From a locking point of
view, they can be treated identically: if nothing else, even the
per-client tree can be considered shared, when the backup server is doing
a repository-wide fsck, for example. The encryption keys for chunk
storage are not a B-tree, but from a locking point of view they can be
handled as if the chunks directory were a B-tree.

Read-only access to the B-trees should not be locked. This is not
entirely safe: Alice may do read-only access (e.g., during a restore),
while Bob is doing a destructive operation (e.g., forget). However, the
read-only client can treat any failures due to concurrent access the same
way it treats other failures: it can try to restore as much as it can, or
it can give up. Locking does not help against bad sectors on the
repository. Not having to lock for read-only access allows faster access
and avoids having a long-running read-only operation lock out new
backups.

Destructive access will need locking. Thus, each B-tree will have a lock
file at its root (called `lock`): existence of the file indicates that
the tree is locked. Only one writer can hold the lock at a time.

The client list B-tree only needs to be changed when a client is added to
or removed from the tree, or its encryption details are changed. The
other shared B-trees need to be changed during every backup run of every
client. The per-client B-trees need to be locked during backup and forget
operations for that client. All B-trees will need to be locked while an
fsck runs. (Per-client B-trees need to be locked to protect against
concurrent backup, forget, and fsck operations, for example.) Each B-tree
is locked separately, by creating a file called `lock` inside its
directory.

Repository metadata
-------------------

The repository metadata (the directory `metadata` at the root of the
repository, not to be confused with the similarly named files inside
B-tree directories) may need to be locked against writes. For example, a
future version of Obnam may provide a way to upgrade the repository
format to a new version, and this requires locking against any changes to
the repository. The repository metadata will be considered locked
implicitly when the client list B-tree is locked.

Avoiding deadlocks and reducing lock contention on shared stuff
---------------------------------------------------------------

To avoid deadlocks, every client must create locks in the same order:

1. The client list B-tree.
1. Any per-client B-trees, in the order of their directory basenames.
1. Any other shared B-trees, in the order of their directory basenames.

(Directory basenames are ordered byte by byte, using unsigned bytes.)

There are several scenarios in which a client may require a lock:

* adding a new client
  - lock client list, add new client to it
  - create, then lock per-client B-tree
  - lock the shared directories, then add the client's encryption key to
    them
  - unlock everything
* making backups for a client
  - lock per-client B-tree, add new metadata
  - lock chunklist and chunksums B-trees, update with info about new
    chunks, unlock (possibly do this several times, for checkpoints)
  - unlock per-client B-tree
* forgetting backups for a client
  - lock per-client B-tree, remove generations, gather list of chunks
    that may need to be deleted, unlock
  - lock chunklist and chunksums, update them for the chunks that are no
    longer used by the client, possibly remove chunks from disk, unlock
* fsck for repository
  - lock everything, fsck, unlock everything

When making backups, a client will pool its updates to the shared B-trees
until it is ready to do all of them in a batch, typically at the end of a
backup, or at a checkpoint generation. This way, it keeps the shared
locks for only a short while. If the client crashes before it can update
the shared B-trees, it is unfortunate, but not catastrophic: it will have
stored some chunks in the repository that cannot be used for
de-duplication, but no actual data is lost.
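Under the assumption that taking a lock simply means creating the `lock`
file exclusively and releasing it means removing the file, a minimal
sketch of the ordered lock acquisition described above might look like
this (the function names and error handling are illustrative, not
Obnam's actual code):

```python
import errno
import os


def lock_tree(tree_dir):
    '''Take the write lock on one B-tree by creating its "lock" file.

    The exclusive create fails with EEXIST if another client already
    holds the lock.
    '''
    fd = os.open(os.path.join(tree_dir, 'lock'),
                 os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


def unlock_tree(tree_dir):
    os.remove(os.path.join(tree_dir, 'lock'))


def lock_for_writing(client_list_dir, per_client_dirs, shared_dirs):
    '''Lock a set of B-trees in the global order that avoids deadlocks.

    The client list comes first (pass None if it does not need to be
    locked), then the per-client trees, then the other shared trees,
    each group sorted by directory basename.  For ASCII basenames,
    Python string comparison matches the byte-by-byte unsigned order.
    Returns the list of locked directories, or None if another client
    holds one of the locks.
    '''
    ordered = []
    if client_list_dir is not None:
        ordered.append(client_list_dir)
    ordered.extend(sorted(per_client_dirs, key=os.path.basename))
    ordered.extend(sorted(shared_dirs, key=os.path.basename))

    locked = []
    try:
        for tree_dir in ordered:
            lock_tree(tree_dir)
            locked.append(tree_dir)
    except OSError as e:
        # Contention: back off by releasing everything taken so far.
        for tree_dir in reversed(locked):
            unlock_tree(tree_dir)
        if e.errno == errno.EEXIST:
            return None
        raise
    return locked
```

A real client would retry after a delay when `lock_for_writing` reports
contention, and would always release locks in the reverse order of
acquisition.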
Reliability and scalability testing of locking
----------------------------------------------

The reliability of locking will be tested by writing a script that makes
small backup generations, and running it concurrently many times against
the same repository. The script will verify that each client can restore
every generation successfully after all generations have been backed up.
A repository-wide fsck will also be run. The amount of data to back up in
each generation shall vary for each client, so that the clients are less
likely to get into lockstep.
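One way to drive such a test is sketched below: it spawns several
concurrent clients, each backing up randomly sized generations into the
same repository. The obnam command-line options shown (`--repository`,
`--client-name`) are assumptions written from memory, and the restore and
fsck verification steps are only indicated in a comment.

```python
import multiprocessing
import os
import random
import subprocess
import tempfile


def make_test_data(dirname, nbytes):
    # Vary the amount of data per generation so clients do not fall
    # into lockstep.
    with open(os.path.join(dirname, 'data'), 'wb') as f:
        f.write(os.urandom(nbytes))


def run_client(repo, name, generations):
    srcdir = tempfile.mkdtemp(prefix=name + '-')
    for _ in range(generations):
        make_test_data(srcdir, random.randint(1, 1024 * 1024))
        # Command-line options are assumptions; adjust to the real
        # obnam client.
        subprocess.check_call(['obnam', 'backup',
                               '--repository', repo,
                               '--client-name', name,
                               srcdir])


def main():
    repo = tempfile.mkdtemp(prefix='lock-test-repo-')
    clients = ['client-%d' % i for i in range(8)]
    procs = [multiprocessing.Process(target=run_client,
                                     args=(repo, name, 10))
             for name in clients]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # After all backups finish, each client's generations would be
    # restored and compared, and a repository-wide fsck run
    # (verification omitted from this sketch).


if __name__ == '__main__':
    main()
```

The number of clients and generations would be scaled up for the
scalability part of the testing.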