Diffstat (limited to 'encryption.mdwn')
-rw-r--r-- encryption.mdwn 384
1 files changed, 0 insertions, 384 deletions
diff --git a/encryption.mdwn b/encryption.mdwn
deleted file mode 100644
index 7c8ba64..0000000
--- a/encryption.mdwn
+++ /dev/null
@@ -1,384 +0,0 @@
-[[!meta title="Encryption support in Obnam"]]
-
-Obnam needs to support encryption of the backup store. This document
-describes requirements for and design of how Obnam will be using
-encryption. It is currently a **DRAFT** and feedback is very much
-welcome.
-
-[[!toc ]]
-
-
-Goal
-----
-
-The goal of encryption in Obnam is to prevent unauthorized parties from
-gaining access to backed up data. If the backup store is on a server
-on the Internet, and someone breaks into the server, the data should
-be safe from prying eyes. Likewise, if the backup store is on a USB
-hard disk, and the disk gets stolen, the data should not leak.
-
-This document does not worry about the data as it is being transferred.
-The data is encrypted on the machine running obnam, which reads the
-live data and stores it in a backup store. Up to three machines may be
-involved: obnam may be running on one machine, the live data may be on
-another one, and the backup store on a third one. The links between
-them are assumed to be over sftp or another sensible protocol.
-
-For the purpose of this document, the server hosting the backup store
-is considered to be unauthorized to access the data, or do anything
-to the data except store it. This means that if the server needs to
-do things like remove unwanted backup generations, it needs to be
-explicitly authorized to do so, by giving it encryption keys. By
-default, it should not have that access, and will see only encrypted
-data.
-
-
-Requirements
-------------
-
-* All data in the backup store MUST be encrypted.
-  Filenames are not encrypted, but Obnam already uses random filenames
-  whenever it can.
-* The backup store does not need to store any backup keys. It MUST be
- enough for the keys to be stored in the clients only.
-* For disaster recovery, it may be necessary to be able to access the
- backup store only using a passphrase. This should be supported by the
- design, but use of this feature MUST be optional.
-* Clients can replace public keys used with the backup store at any time,
- without having to re-backup anything. After keys have been replaced, the
- old keys MUST NOT be able to access data in the backup store.
-* The design MUST support use of multiple keys. Each client MUST be able to
- use their own key, and disallow anyone else from accessing their data,
- even if the backup store is shared. It should be possible for the server
- to have its own key that it uses for forgetting old generations, running
- fsck on the entire store, and so on.
-
-The encryption keys for accessing the backup store are not used for
-authenticating access to the backup store, to allow flexibility in
-selecting store providers.
-
-
-Encryption methods
-------------------
-
-There are two relevant encryption methods for Obnam:
-
-* public key cryptography
-* symmetric cryptography
-
-With public keys, there is an optional passphrase for the secret key.
-With symmetric encryption, the passphrase is the key.
-The user needs to manage the secret key and its passphrase in a suitable
-manner.
-
-
-Design
-------
-
-All data in the backup store is stored either in B-tree nodes,
-or in a special directory tree for chunks of file data. Each B-tree
-is stored in its own directory tree, and each node is stored in its
-own file. We'll call each of these directory
-trees a _toplevel_. Thus the root of a backup store consists of some
-number of toplevel directories.
-
-The backup store is used by some set of clients, plus perhaps the
-backup server. We'll call each of these a user. Each user may have
-its own public/private key pair, or they may be shared between
-clients. This is up to the backup admins. The design assumes they
-all have unique key pairs.
-
-Some toplevels are private for a particular client. Others are
-shared.
-
-Every file in a toplevel is encrypted using symmetric encryption.
-The symmetric encryption key is encrypted with the public
-key of every user who needs access to the toplevel. In other words,
-to have access to contents of the toplevel, a user needs to be able
-to decrypt the symmetric encryption key using their own key.
-
-To stop a user from having access to a toplevel, the symmetric encryption key
-file gets re-encrypted with every key except the unwanted one.
-This does not stop an attacker who has previously stored a local copy
-of the decrypted symmetric encryption key: that copy still decrypts
-every file in the toplevel. To stop that, every file
-in the toplevel needs to be re-encrypted with a new symmetric
-encryption key.
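
The two revocation steps can be sketched as follows. This is only a model of the bookkeeping: `toy_cipher` is a placeholder and NOT real encryption, the per-user values stand in for GnuPG key pairs, and the dictionary stands in for the `key` file. All names are hypothetical.

```python
import hashlib
import secrets


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder, NOT real encryption: XOR with a SHA-256 keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap_for_users(symmetric_key: bytes, user_keys: dict) -> dict:
    """Model of the `key` file: the symmetric key wrapped once per user."""
    return {user: toy_cipher(symmetric_key, k) for user, k in user_keys.items()}


# Hypothetical users; each value stands in for a GnuPG key pair.
users = {"alice": secrets.token_bytes(32), "bob": secrets.token_bytes(32)}

old_key = secrets.token_bytes(32)
key_file = wrap_for_users(old_key, users)
stored = toy_cipher(b"chunk data", old_key)

# Bob reads the toplevel once and keeps the decrypted symmetric key.
bobs_copy = toy_cipher(key_file["bob"], users["bob"])

# Step 1: re-wrap the same symmetric key for everyone except bob.
remaining = {u: k for u, k in users.items() if u != "bob"}
key_file = wrap_for_users(old_key, remaining)
bob_in_key_file = "bob" in key_file
leaked = toy_cipher(stored, bobs_copy)  # bob's cached key still works

# Step 2: re-encrypt every file under a fresh symmetric key.
new_key = secrets.token_bytes(32)
stored = toy_cipher(toy_cipher(stored, old_key), new_key)
key_file = wrap_for_users(new_key, remaining)
after_rekey = toy_cipher(stored, bobs_copy)  # the cached key is now useless
```

After step 1 the cached key still recovers the chunk. Only step 2 locks the removed user out, which is why full re-encryption is needed when the old key may have been copied.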
-
-The encrypted symmetric encryption key is stored inside
-the toplevel, using a well-known name. This is the only file not
-encrypted with the symmetric encryption key.
-
-The public keys for users who should have access to the toplevel
-are also stored inside the toplevel, in a file encrypted with the
-symmetric encryption key.
-
-The symmetric encryption keys are generated randomly,
-and are of sufficient length that brute-forcing them will not be
-realistic. Perhaps something like 256 bits from /dev/random.
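
A minimal sketch of generating such a key in Python, assuming 256 bits is the chosen size. It uses the standard `secrets` module, which draws from the operating system's random source through `os.urandom` rather than reading `/dev/random` directly. The constant `KEY_BITS` is an assumption for illustration.

```python
import secrets

KEY_BITS = 256  # assumed size, following the "256 bits" suggestion above


def generate_symmetric_key(bits: int = KEY_BITS) -> bytes:
    """Return `bits` random bits from the OS random source, as bytes."""
    if bits % 8 != 0:
        raise ValueError("key size must be a whole number of bytes")
    return secrets.token_bytes(bits // 8)


key = generate_symmetric_key()
```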
-
-To give a user access to the repository, the repository admin needs
-to add the user's public key to the shared toplevels: client list,
-chunks, and chunk checksum data. A client may remove itself from
-the repository, or an admin may do that. "Admin" in this context
-is anyone with access to all the shared toplevels.
-
-
-Example
--------
-
-    chunks/         -- toplevel for file data chunks
-        key         -- symmetric encryption key
-        userkeys    -- public keys of all users of this toplevel
-        *           -- all other files
-
-To set up a toplevel for encryption:
-
-* generate a suitable amount of random data to use as symmetric encryption key
-* create list of public keys to have access to toplevel
-* encrypt public key list using symmetric encryption key
-* upload encrypted public key list
-* encrypt symmetric encryption key with every public key in list
-* upload encrypted symmetric encryption key
-
-To add a file to the toplevel:
-
-* download `key` file
-* decrypt `key` file using user's private key, get key text
-* encrypt file using symmetric encryption key
-* upload encrypted file
-
-To use a file in the toplevel:
-
-* download `key` file
-* decrypt `key` using user's private key
-* download desired file
-* decrypt file using symmetric encryption key
-
-
-Discussion
-----------
-
-The simplest approach would be to only use public key encryption,
-but this makes it difficult to change the keys. Changing the keys
-is necessary to handle scenarios like giving access to the shared
-toplevels to a new user with a new key pair. Otherwise the
-symmetric encryption key
-needs to be distributed to every user, and re-distributed
-if it ever changes, and this is cumbersome. It would also be possible
-to re-encrypt everything in the toplevel for every new user, but
-that is laughably inefficient. However, it would be acceptably
-simple to support the scenario of distributing the symmetric encryption key to
-every user, if the backup admin thinks storing it on the backup
-server even in encrypted form is too risky.
-
-I have removed [[data signing|signing]] from this spec, on the suggestion of
-Daniel Kahn Gillmor. Data signing will be dealt with separately.
-
-I am going to assume that any public keys being used are generated
-by the user, not by obnam.
-
-I am not an encryption expert. I will not be implementing my own
-encryption code, and do not even want to choose the specific
-algorithms or key formats. I will be using GnuPG for all encryption
-operations, because it is well-known and well-respected, and lets
-me outsource all thinking.
-
-
-Implementation outline
-----------------------
-
-General repository I/O operations (these correspond to the
-`mkdir`, `write_file`, and `cat` operations in the Obnam VFS layer):
-
-    def repo_mkdir(pathname):
-        """Create a new directory."""
-
-    def repo_write_file(pathname, contents):
-        """Write a file to the repository (pathname is relative to repo)."""
-
-    def repo_read_file(pathname):
-        """Read contents of a file in the repository (pathname is relative to repo)."""
-
-General encryption routines:
-
-    def generate_symmetric_key():
-        """Return N random bits to be used as a symmetric encryption key."""
-
-    def encrypt_with_symmetric_key(data, symmetric_key):
-        """Return data encrypted using symmetric encryption."""
-
-    def decrypt_with_symmetric_key(encrypted, symmetric_key):
-        """Return data after it has been decrypted using symmetric encryption."""
-
-    def encrypt_with_pubkeys(data, pubkeys):
-        """Return data after it has been encrypted for all of the given
-        public keys."""
-
-    def decrypt_with_secret_key(encrypted, secret_key):
-        """Decrypt encrypted data using a secret key.
-
-        This will fail unless the data was encrypted using the public key
-        corresponding to the secret key."""
-
-Keyring handling in memory:
-
-    def create_empty_keyring():
-        ...
-
-    def add_to_keyring(keyring, key):
-        ...
-
-    def keyring_contains(keyring, key):
-        ...
-
-    def remove_from_keyring(keyring, key):
-        ...
-
-    def encode_keyring(keyring):
-        """Return form of keyring that can be stored on disk."""
-
-    def decode_keyring(encoded):
-        """Inverse of encode_keyring."""
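
One way these keyring stubs could be filled in is sketched below. It assumes each key is an ASCII-armored public key held as a string, and it picks JSON as the on-disk form purely for illustration. A real implementation might delegate keyring storage to GnuPG instead.

```python
import json


def create_empty_keyring():
    # Assumption: a keyring is simply a set of armored public key strings.
    return set()


def add_to_keyring(keyring, key):
    keyring.add(key)


def keyring_contains(keyring, key):
    return key in keyring


def remove_from_keyring(keyring, key):
    # discard() does nothing if the key is absent.
    keyring.discard(key)


def encode_keyring(keyring):
    """Return a byte string that can be stored on disk."""
    # Sorting makes the encoding independent of insertion order.
    return json.dumps(sorted(keyring)).encode("utf-8")


def decode_keyring(encoded):
    """Inverse of encode_keyring."""
    return set(json.loads(encoded.decode("utf-8")))
```

The encoded bytes are what would then be passed through symmetric encryption before being written as `userkeys`.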
-
-Create a new toplevel:
-
-    def create_toplevel(name, pubkeys):
-        repo_mkdir(name)
-
-        symmetric_key = generate_symmetric_key()
-        encrypted = encrypt_with_pubkeys(symmetric_key, pubkeys)
-        repo_write_file(name + '/key', encrypted)
-
-        keyring = create_empty_keyring()
-        for pubkey in pubkeys:
-            add_to_keyring(keyring, pubkey)
-        encoded = encode_keyring(keyring)
-        encrypted = encrypt_with_symmetric_key(encoded, symmetric_key)
-        repo_write_file(name + '/userkeys', encrypted)
-
-Reading and writing files in a toplevel:
-
-    def get_symmetric_key(toplevel, secret_key):
-        encoded = repo_read_file(toplevel + '/key')
-        return decrypt_with_secret_key(encoded, secret_key)
-
-    def toplevel_read_file(toplevel, filename, secret_key):
-        symmetric_key = get_symmetric_key(toplevel, secret_key)
-        encoded = repo_read_file(toplevel + '/' + filename)
-        return decrypt_with_symmetric_key(encoded, symmetric_key)
-
-    def toplevel_write_file(toplevel, filename, cleartext, secret_key):
-        symmetric_key = get_symmetric_key(toplevel, secret_key)
-        encoded = encrypt_with_symmetric_key(cleartext, symmetric_key)
-        repo_write_file(toplevel + '/' + filename, encoded)
-
-Manage keys for a toplevel:
-
-    def read_keyring(toplevel, name, secret_key):
-        encoded = toplevel_read_file(toplevel, name, secret_key)
-        return decode_keyring(encoded)
-
-    def write_keyring(toplevel, name, keyring, secret_key):
-        encoded = encode_keyring(keyring)
-        toplevel_write_file(toplevel, name, encoded, secret_key)
-
-    def add_to_userkeys(toplevel, public_key, secret_key):
-        userkeys = read_keyring(toplevel, 'userkeys', secret_key)
-        if not keyring_contains(userkeys, public_key):
-            add_to_keyring(userkeys, public_key)
-            write_keyring(toplevel, 'userkeys', userkeys, secret_key)
-
-    def remove_from_userkeys(toplevel, public_key, secret_key):
-        userkeys = read_keyring(toplevel, 'userkeys', secret_key)
-        if keyring_contains(userkeys, public_key):
-            remove_from_keyring(userkeys, public_key)
-            write_keyring(toplevel, 'userkeys', userkeys, secret_key)
-
-Repository client management:
-
-    def add_client(client_public_key, admin_secret_key):
-        add_to_userkeys('metadata', client_public_key, admin_secret_key)
-        add_to_userkeys('clientlist', client_public_key, admin_secret_key)
-        add_to_userkeys('chunks', client_public_key, admin_secret_key)
-        add_to_userkeys('chunksums', client_public_key, admin_secret_key)
-        # client will add itself to the clientlist and create its own toplevel
-
-    def remove_client(client_public_key, admin_secret_key):
-        # client may remove itself, since it has access to the symmetric keys
-        # we assume the client-specific toplevel has already been removed
-        remove_from_userkeys('chunksums', client_public_key, admin_secret_key)
-        remove_from_userkeys('chunks', client_public_key, admin_secret_key)
-        remove_from_userkeys('clientlist', client_public_key, admin_secret_key)
-        remove_from_userkeys('metadata', client_public_key, admin_secret_key)
-
-
-Hooks in Obnam
---------------
-
-Obnam's Repository class needs to have a pair of hooks for modifying
-data before it gets written to the repository, and after it has been
-read. These modifications should be each other's inverse functions.
-Apart from encryption, these hooks could be used for error correction
-codes for data in the store, and perhaps other things. The repository
-should just provide and call the hooks, and not otherwise concern
-itself with encryption.
-
-These hooks are not needed at the VFS layer, since it is not necessary
-to decrypt live data, nor encrypt data that is being restored.
-
-The hooks correspond to `create_toplevel`, `toplevel_read_file`,
-and `toplevel_write_file` above. However, to allow chains of callbacks
-for the hooks, instead of the encryption callback writing the
-data out to the repository, it should return it instead. The next
-callback will get the encrypted data, and add, say, error correction
-codes to it. Finally, when all callbacks are done, the encrypted and
-error-corrected blob gets written to the repository.
-
-Thus, Repository should provide the following hooks:
-
-* `repo-create-toplevel(name)`: called whenever the repository has created
- a new toplevel directory
-* `repo-write(toplevel, filename, data)`: called by the repository
-  to prepare data to be written to the repository
-* `repo-read(toplevel, filename, data)`: called by the repository for
-  data that has been read from the repository
-
-The hook subsystem needs to have a way to order callbacks, and for
-each callback to return a modified form of the data for the next
-callback to process (instead of the next callback processing the
-original data).
-
-Callback ordering is important so that encryption always happens
-before ECC encoding: there's no point in ECC if it happens before
-encryption.
-
-Since the universe of likely Obnam plugins is small, and it can be
-assumed that plugin authors co-operate, we can achieve ordering
-most simply by having an optional integer _order_ argument to
-the hook registration method.
-
-    def add_callback(self, name, callback, order=None):
-
-Callbacks with an explicit order are put at the beginning of the
-callback chain, sorted by the order argument into increasing order.
-Any callbacks without an explicit order are put at the end.
-
-We can then arrange for the callback registrations for encryption
-and ECC to use appropriate ordering:
-
-    hooks.add_callback('repo-write', encrypt, order=1000)
-    hooks.add_callback('repo-write', ecc, order=2000)
-
-(Reverse the ordering for `repo-read`, of course.)
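
A minimal sketch of such a hook registry, under stated assumptions: `HookManager` and `call_filter` are hypothetical names, the last positional argument is treated as the data to transform, and the two transforms are placeholders (an XOR standing in for encryption, a CRC32 suffix standing in for ECC). It is not Obnam's actual hook code.

```python
import zlib


class HookManager:
    """Hypothetical hook registry with ordered, chained callbacks."""

    def __init__(self):
        self._callbacks = {}  # hook name -> list of (sort key, callback)
        self._seq = 0

    def add_callback(self, name, callback, order=None):
        # Unordered callbacks sort after all ordered ones; the sequence
        # number keeps registration order among equal entries.
        key = (order is None, 0 if order is None else order, self._seq)
        self._seq += 1
        chain = self._callbacks.setdefault(name, [])
        chain.append((key, callback))
        chain.sort(key=lambda item: item[0])

    def call_filter(self, name, *args):
        # Assumption: the last argument is the data; each callback gets
        # the value returned by the previous one.
        *context, data = args
        for _key, callback in self._callbacks.get(name, []):
            data = callback(*context, data)
        return data


def toy_encrypt(toplevel, filename, data):
    return bytes(b ^ 0x5A for b in data)  # placeholder, NOT real encryption


toy_decrypt = toy_encrypt  # XOR with a constant is its own inverse


def add_checksum(toplevel, filename, data):
    return data + zlib.crc32(data).to_bytes(4, "big")  # stand-in for ECC


def strip_checksum(toplevel, filename, data):
    body, stored = data[:-4], data[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != stored:
        raise ValueError("checksum mismatch")
    return body


hooks = HookManager()
hooks.add_callback("repo-write", toy_encrypt, order=1000)
hooks.add_callback("repo-write", add_checksum, order=2000)
hooks.add_callback("repo-read", strip_checksum, order=1000)
hooks.add_callback("repo-read", toy_decrypt, order=2000)

blob = hooks.call_filter("repo-write", "chunks", "0001", b"hello")
plain = hooks.call_filter("repo-read", "chunks", "0001", blob)
```

Sorting on the key tuple alone means the callbacks themselves are never compared, so any callable can be registered.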
-
-Arranging for a hook to be able to modify the data is a bit
-trickier. Ideally, it could just return the new data, but the
-general purpose nature of the hook subsystem means that it does
-not know what the arguments for a hook are.
-
-Thanks
-------
-
-Thank you to Richard Braakman, Peter Palfrader, Jaakko Niemi,
-and Daniel Kahn Gillmor for feedback. Any problems that remain
-are my fault.