Diffstat (limited to 'encryption.mdwn')
-rw-r--r-- | encryption.mdwn | 384 |
1 files changed, 0 insertions, 384 deletions
diff --git a/encryption.mdwn b/encryption.mdwn
deleted file mode 100644
index 7c8ba64..0000000
--- a/encryption.mdwn
+++ /dev/null
@@ -1,384 +0,0 @@

[[!meta title="Encryption support in Obnam"]]

Obnam needs to support encryption of the backup store. This document
describes the requirements for, and the design of, Obnam's use of
encryption. It is currently a **DRAFT** and feedback is very much
welcome.

[[!toc ]]


Goal
----

The goal of encryption in Obnam is to prevent unauthorized parties from
gaining access to backed up data. If the backup store is on a server
on the Internet, and someone breaks into the server, the data should
be safe from prying eyes. Likewise, if the backup store is on a USB
hard disk, and the disk gets stolen, the data should not leak.

This document does not worry about the data while it is being
transferred. The data is encrypted on the machine running obnam, which
reads the live data and stores it in a backup store. Up to three
machines may be involved, and the links between them are assumed to
use sftp or another sensible protocol. (In other words, obnam may be
running on one machine, the live data may be on another one, and the
backup store on a third one.)

For the purpose of this document, the server hosting the backup store
is considered to be unauthorized to access the data, or to do anything
with the data except store it. This means that if the server needs to
do things like removing unwanted backup generations, it needs to be
explicitly authorized to do so, by giving it encryption keys. By
default, it should not have that access, and will see only encrypted
data.


Requirements
------------

* All data in the backup store MUST be encrypted.
  Filenames are not encrypted, but Obnam already uses random filenames
  whenever it can.
* The backup store does not need to store any backup keys. It MUST be
  enough for the keys to be stored in the clients only.
* For disaster recovery, it may be necessary to be able to access the
  backup store using only a passphrase. This should be supported by the
  design, but use of this feature MUST be optional.
* Clients can replace the public keys used with the backup store at any
  time, without having to re-backup anything. After keys have been
  replaced, the old keys MUST NOT be able to access data in the backup
  store.
* The design MUST support use of multiple keys. Each client MUST be able
  to use its own key, and disallow anyone else from accessing its data,
  even if the backup store is shared. It should be possible for the
  server to have its own key that it uses for forgetting old
  generations, running fsck on the entire store, and so on.

The encryption keys are deliberately not used for authenticating access
to the backup store; this allows flexibility in selecting store
providers.


Encryption methods
------------------

There are two relevant encryption methods for Obnam:

* public key cryptography
* symmetric cryptography

With public key cryptography, the secret key may be protected by an
optional passphrase. With symmetric encryption, the passphrase is the
key. The user needs to manage the secret key and its passphrase in a
suitable manner.


Design
------

All data in the backup store is stored either in B-tree nodes,
or in a special directory tree for chunks of file data. Each B-tree
is stored in its own directory tree, and each node is stored in its
own file. We'll call each of these directory trees a _toplevel_.
Thus the root of a backup store consists of some number of toplevel
directories.

The backup store is used by some set of clients, plus perhaps the
backup server. We'll call each of these a user. Each user may have
its own public/private key pair, or key pairs may be shared between
clients; this is up to the backup admins. The design assumes that all
users have unique key pairs.

Some toplevels are private to a particular client. Others are
shared.

Every file in a toplevel is encrypted using symmetric encryption.
The symmetric encryption key is encrypted with the public
key of every user who needs access to the toplevel. In other words,
to have access to the contents of the toplevel, a user needs to be
able to decrypt the symmetric encryption key using their own secret
key.

To stop a user from having access to a toplevel, the symmetric
encryption key file is re-encrypted with every user's public key except
the unwanted one. This does not stop an attacker who has previously
stored a local copy of the decrypted symmetric encryption key. To stop
that, every file in the toplevel needs to be re-encrypted with a new
symmetric encryption key.

The encrypted symmetric encryption key is stored inside
the toplevel, using a well-known name. This is the only file not
encrypted with the symmetric encryption key.

The public keys for users who should have access to the toplevel
are also stored inside the toplevel, in a file encrypted with the
symmetric encryption key.

The symmetric encryption keys are generated randomly, and are long
enough that brute-forcing them is not realistic: perhaps something
like 256 bits from /dev/random.

To give a user access to the repository, the repository admin needs
to add the user's public key to the shared toplevels: client list,
chunks, and chunk checksum data. A client may remove itself from
the repository, or an admin may do that. "Admin" in this context
is anyone with access to all the shared toplevels.
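The key-generation paragraph above can be made concrete with a short sketch of one possible body for the `generate_symmetric_key` routine in the implementation outline, plus the arithmetic behind "brute-forcing is not realistic". It assumes Python's standard `secrets` module instead of reading /dev/random directly (`secrets.token_bytes` draws from `os.urandom`, the operating system's cryptographically secure generator). The helper `brute_force_years` and the attacker speed of 10^12 guesses per second are hypothetical, illustrative choices, not measurements.

```python
import secrets

KEY_BITS = 256  # key length suggested in the design text


def generate_symmetric_key(bits=KEY_BITS):
    """Return `bits` random bits as bytes, taken from the OS CSPRNG."""
    if bits <= 0 or bits % 8:
        raise ValueError("key length must be a positive multiple of 8 bits")
    return secrets.token_bytes(bits // 8)


def brute_force_years(bits, guesses_per_second):
    """Expected years for an exhaustive key search to succeed.

    On average an attacker has to try half of the 2**bits possible keys.
    """
    seconds = 2 ** (bits - 1) / guesses_per_second
    return seconds / (365.25 * 24 * 60 * 60)


key = generate_symmetric_key()
print(len(key))                               # 32 bytes, i.e. 256 bits
print(f"{brute_force_years(256, 1e12):.1e}")  # about 1.8e+57 years
```

Each additional key bit doubles the expected search time, so even an attacker many orders of magnitude faster does not change the conclusion.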


Example
-------

    chunks/         -- toplevel for file data chunks
        key         -- symmetric encryption key
        userkeys    -- public keys of all users of this toplevel
        *           -- all other files

To set up a toplevel for encryption:

* generate a suitable amount of random data to use as the symmetric
  encryption key
* create a list of the public keys that should have access to the
  toplevel
* encrypt the public key list using the symmetric encryption key
* upload the encrypted public key list
* encrypt the symmetric encryption key with every public key in the list
* upload the encrypted symmetric encryption key

To add a file to the toplevel:

* download the `key` file
* decrypt the `key` file using the user's private key, to get the
  symmetric encryption key
* encrypt the file using the symmetric encryption key
* upload the encrypted file

To use a file in the toplevel:

* download the `key` file
* decrypt `key` using the user's private key
* download the desired file
* decrypt the file using the symmetric encryption key


Discussion
----------

The simplest approach would be to use only public key encryption,
but this makes it difficult to change the keys. Changing the keys
is necessary to handle scenarios like giving a new user with a new key
pair access to the shared toplevels. If the encrypted symmetric
encryption key were not stored in the toplevel, it would need to be
distributed to every user, and re-distributed if it ever changes, which
is cumbersome. It would also be possible to re-encrypt everything in the
toplevel for every new user, but that is laughably inefficient. However,
it would be acceptably simple to support distributing the symmetric
encryption key to every user, if the backup admin thinks storing it on
the backup server, even in encrypted form, is too risky.

I have removed [[data signing|signing]] from this spec, on the suggestion of
Daniel Kahn Gillmor. Data signing will be dealt with separately.

I am going to assume that any public keys being used are generated
by the user, not by obnam.

I am not an encryption expert.
I will not be implementing my own
encryption code, and do not even want to choose the specific
algorithms or key formats. I will be using GnuPG for all encryption
operations, because it is well-known and well-respected, and lets
me outsource all thinking.


Implementation outline
----------------------

General repository I/O operations (these correspond to the
`mkdir`, `write_file`, and `cat` operations in the Obnam VFS layer):

    def repo_mkdir(pathname):
        # create a new directory (pathname is relative to repo)
        ...

    def repo_write_file(pathname, contents):
        # write a file to the repository (pathname is relative to repo)
        ...

    def repo_read_file(pathname):
        # read contents of a file in the repository (pathname is relative to repo)
        ...

General encryption routines:

    def generate_symmetric_key():
        # return N random bits to be used as a symmetric encryption key
        ...

    def encrypt_with_symmetric_key(data, symmetric_key):
        # return data encrypted using symmetric encryption
        ...

    def decrypt_with_symmetric_key(encrypted, symmetric_key):
        # return data after it has been decrypted using symmetric encryption
        ...

    def encrypt_with_pubkeys(data, pubkeys):
        # return data after it has been encrypted for all of the given
        # public keys
        ...

    def decrypt_with_secret_key(encrypted, secret_key):
        # decrypt encrypted data using a secret key; this will fail unless
        # the data was encrypted using the public key corresponding to
        # the secret key
        ...

Keyring handling in memory:

    def create_empty_keyring():
        ...

    def add_to_keyring(keyring, key):
        ...

    def keyring_contains(keyring, key):
        ...

    def remove_from_keyring(keyring, key):
        ...

    def encode_keyring(keyring):
        # Return form of keyring that can be stored on disk.
        ...

    def decode_keyring(encoded):
        # Inverse of encode_keyring.
        ...
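One way to fill in the in-memory keyring routines above is sketched below. It is only an illustration under loudly stated assumptions: a keyring is a Python `set` of public keys, each an opaque `bytes` value (for example an exported OpenPGP key), and the on-disk form is a JSON list of base64 strings. Neither choice comes from this design; since GnuPG is to handle keys, a real implementation would more likely use GnuPG's own keyring format.

```python
import base64
import json

# Assumption for illustration only: a keyring is a set of public keys,
# each an opaque bytes value. This is not GnuPG's keyring format.


def create_empty_keyring():
    return set()


def add_to_keyring(keyring, key):
    keyring.add(key)  # adding a key that is already present is a no-op


def keyring_contains(keyring, key):
    return key in keyring


def remove_from_keyring(keyring, key):
    keyring.discard(key)  # removing a key that is absent is not an error


def encode_keyring(keyring):
    # Sort the base64 strings so equal keyrings always encode to equal bytes.
    items = sorted(base64.b64encode(key).decode("ascii") for key in keyring)
    return json.dumps(items).encode("utf-8")


def decode_keyring(encoded):
    # Inverse of encode_keyring.
    return {base64.b64decode(item) for item in json.loads(encoded.decode("utf-8"))}


ring = create_empty_keyring()
add_to_keyring(ring, b"alice public key")
add_to_keyring(ring, b"bob public key")
remove_from_keyring(ring, b"bob public key")
print(decode_keyring(encode_keyring(ring)) == {b"alice public key"})  # True
```

Sorting before encoding means the same set of keys always produces the same cleartext for the `userkeys` file, which keeps the encoded form easy to compare and test.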

Create a new toplevel:

    def create_toplevel(name, pubkeys):
        repo_mkdir(name)

        # Generate the toplevel's symmetric key and store it encrypted
        # for every user who should have access.
        symmetric_key = generate_symmetric_key()
        encrypted = encrypt_with_pubkeys(symmetric_key, pubkeys)
        repo_write_file(name + '/key', encrypted)

        # Store the users' public keys, encrypted with the symmetric key.
        keyring = create_empty_keyring()
        for pubkey in pubkeys:
            add_to_keyring(keyring, pubkey)
        encoded = encode_keyring(keyring)
        encrypted = encrypt_with_symmetric_key(encoded, symmetric_key)
        repo_write_file(name + '/userkeys', encrypted)

Reading and writing files in a toplevel:

    def get_symmetric_key(toplevel, secret_key):
        encrypted = repo_read_file(toplevel + '/key')
        return decrypt_with_secret_key(encrypted, secret_key)

    def toplevel_read_file(toplevel, filename, secret_key):
        symmetric_key = get_symmetric_key(toplevel, secret_key)
        encrypted = repo_read_file(toplevel + '/' + filename)
        return decrypt_with_symmetric_key(encrypted, symmetric_key)

    def toplevel_write_file(toplevel, filename, cleartext, secret_key):
        symmetric_key = get_symmetric_key(toplevel, secret_key)
        encrypted = encrypt_with_symmetric_key(cleartext, symmetric_key)
        repo_write_file(toplevel + '/' + filename, encrypted)

Manage keys for a toplevel:

    def read_keyring(toplevel, name, secret_key):
        encoded = toplevel_read_file(toplevel, name, secret_key)
        return decode_keyring(encoded)

    def write_keyring(toplevel, name, keyring, secret_key):
        encoded = encode_keyring(keyring)
        toplevel_write_file(toplevel, name, encoded, secret_key)

    def add_to_userkeys(toplevel, public_key, secret_key):
        userkeys = read_keyring(toplevel, 'userkeys', secret_key)
        if not keyring_contains(userkeys, public_key):
            add_to_keyring(userkeys, public_key)
            write_keyring(toplevel, 'userkeys', userkeys, secret_key)
            # The toplevel's 'key' file must also be re-encrypted for the
            # updated set of public keys, or the new user cannot decrypt
            # the symmetric key (see Design); that step is not shown here.

    def remove_from_userkeys(toplevel, public_key, secret_key):
        userkeys = read_keyring(toplevel, 'userkeys', secret_key)
        if keyring_contains(userkeys, public_key):
            remove_from_keyring(userkeys, public_key)
            write_keyring(toplevel, 'userkeys', userkeys, secret_key)
            # Likewise, 'key' must be re-encrypted without the removed
            # public key; that step is not shown here either.

Repository client management:

    def add_client(client_public_key, admin_secret_key):
        add_to_userkeys('metadata', client_public_key, admin_secret_key)
        add_to_userkeys('clientlist', client_public_key, admin_secret_key)
        add_to_userkeys('chunks', client_public_key, admin_secret_key)
        add_to_userkeys('chunksums', client_public_key, admin_secret_key)
        # The client will add itself to the clientlist and create its
        # own toplevel.

    def remove_client(client_public_key, admin_secret_key):
        # A client may remove itself, since it has access to the
        # symmetric keys. We assume the client-specific toplevel has
        # already been removed.
        remove_from_userkeys('chunksums', client_public_key, admin_secret_key)
        remove_from_userkeys('chunks', client_public_key, admin_secret_key)
        remove_from_userkeys('clientlist', client_public_key, admin_secret_key)
        remove_from_userkeys('metadata', client_public_key, admin_secret_key)


Hooks in Obnam
--------------

Obnam's Repository class needs to have a pair of hooks for modifying
data before it gets written to the repository, and after it has been
read. These modifications should be inverses of each other.
Apart from encryption, these hooks could be used for error correction
codes for data in the store, and perhaps other things. The repository
should just provide and call the hooks, and not otherwise concern
itself with encryption.

These hooks are not needed at the VFS layer, since it is not necessary
to decrypt live data, nor to encrypt data that is being restored.

The hooks correspond to `create_toplevel`, `toplevel_read_file`,
and `toplevel_write_file` above. However, to allow chains of callbacks
for the hooks, the encryption callback should return the encrypted
data instead of writing it out to the repository. The next
callback will get the encrypted data, and add, say, error correction
codes to it.
Finally, when all callbacks are done, the encrypted and
error-corrected blob gets written to the repository.

Thus, Repository should provide the following hooks:

* `repo-create-toplevel(name)`: called whenever the repository has
  created a new toplevel directory
* `repo-write(toplevel, filename, data)`: called by the repository to
  prepare data to be written to the repository
* `repo-read(toplevel, filename, data)`: called by the repository for
  data that has been read from the repository

The hook subsystem needs a way to order callbacks, and a way for
each callback to return a modified form of the data for the next
callback to process (instead of the next callback processing the
original data).

Callback ordering is important so that encryption always happens
before ECC encoding: there's no point in ECC if it happens before
encryption, because ECC must protect the bytes that are actually
stored, which are the encrypted ones.

Since the universe of likely Obnam plugins is small, and it can be
assumed that plugin authors co-operate, the simplest way to achieve
ordering is an optional integer _order_ argument to the hook
registration method:

    def add_callback(self, name, callback, order=None):
        ...

Callbacks without an explicit order are put at the end of the
callback chain; all others go at the beginning, sorted by the order
argument in increasing order.

We can then arrange for the callback registrations for encryption
and ECC to use appropriate ordering:

    hooks.add_callback('repo-write', encrypt, order=1000)
    hooks.add_callback('repo-write', ecc, order=2000)

(Reverse the ordering for `repo-read`, of course.)

Arranging for a hook to be able to modify the data is a bit
trickier. Ideally, a callback could just return the new data, but the
general-purpose nature of the hook subsystem means that it does
not know what the arguments for a hook are.

Thanks
------

Thank you to Richard Braakman, Peter Palfrader, Jaakko Niemi,
and Daniel Kahn Gillmor for feedback. Any problems that remain
are my fault.