To: obnam-dev@obnam.org
From: Wladimir Palant <gtiobnam@palant.de>
Message-ID: <2d0a8c01-9f58-1ee7-7e20-53fe65d96718@palant.de>
Date: Mon, 3 Jul 2017 00:14:44 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: [rfc] Passphrase-based encryption
Precedence: list
Sender: obnam-dev-bounces@obnam.org
Errors-To: obnam-dev-bounces@obnam.org

Hi,

As great as GPG is, I'd still prefer having the option to use a plain
passphrase and AES encryption with obnam. IMHO, this approach has two
advantages:

* Considerably simpler setup: you merely need to come up with a
high-entropy passphrase.
* Much easier to back up: you don't need to worry about losing the
passphrase due to a hard drive crash. If you are afraid of forgetting
it, writing it down and keeping it somewhere safe will do.

It's particularly that second point which is important to me: if the GPG
key or its passphrase is lost, my backup becomes completely useless.
With GPG I need to back up the encryption key separately, and doing so
securely tends to be rather complicated.

Sure, passphrases usually won't have the 256 bits of entropy necessary 
to take full advantage of AES-256. However, this doesn't matter much as 
long as they aren't easily guessable and a good (meaning slow) key 
derivation algorithm is used.
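
To put rough numbers on this argument, here is a minimal
back-of-the-envelope sketch in Python 3. The six-word passphrase, the
7776-word list and the attacker speed of 10^9 guesses per second are
assumptions chosen purely for illustration, and the function names are
hypothetical. The only figure taken from the plugin further down is the
PBKDF2 iteration count of 256 * 1024.

```python
import math

# PBKDF2 iteration count used by the plugin in this mail (2**18).
KDF_ITERATIONS = 256 * 1024


def passphrase_entropy_bits(num_words, wordlist_size):
    """Entropy of a passphrase built from words picked uniformly at random."""
    return num_words * math.log2(wordlist_size)


def years_to_exhaust(entropy_bits, guesses_per_second):
    """Years needed to try every candidate at the given guessing rate."""
    seconds = 2 ** entropy_bits / guesses_per_second
    return seconds / (365 * 24 * 3600)


# ASSUMPTION: six words from a 7776-word list.
bits = passphrase_entropy_bits(6, 7776)

# ASSUMPTION: the attacker manages 1e9 guesses per second against a single
# fast hash, and each PBKDF2 iteration costs about one such hash.
fast_hash_years = years_to_exhaust(bits, 1e9)
slow_kdf_years = years_to_exhaust(bits, 1e9 / KDF_ITERATIONS)

if __name__ == "__main__":
    print("entropy: %.1f bits" % bits)
    print("single hash: %.3g years" % fast_hash_years)
    print("PBKDF2 with %d iterations: %.3g years"
          % (KDF_ITERATIONS, slow_kdf_years))
```

Under these assumptions the passphrase has about 77.5 bits of entropy,
and the slow key derivation makes every guess 2^18 times as expensive,
which is roughly what 18 additional bits of entropy would buy.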

I'm currently trying out the following simple plugin to implement 
encryption via passphrase (please don't comment on code quality, this 
hasn't been polished):

> import hashlib
> import os
> 
> import obnamlib
> from Crypto.Cipher import AES
> 
> class EncryptionPlugin(obnamlib.ObnamPlugin):
>     def enable(self):
>         self.tag = "encaespw"
> 
>         # There doesn't appear to be any "canonical" way to derive an AES key
>         # from a passphrase. There is OpenSSL's enc tool but it uses a very
>         # weak key derivation function (details under
>         # https://security.stackexchange.com/a/29139/4778). So let's just use
>         # PBKDF2 with a high number of iterations.
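>         # Use .get() so an unset variable is rejected like an empty one
>         # instead of raising KeyError.
>         passphrase = os.environ.get('PASSPHRASE')
>         if not passphrase:
>             raise Exception('No encryption passphrase given')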
> 
>         self.key = hashlib.pbkdf2_hmac('sha256', passphrase, 'aes key',
>                                        256 * 1024, dklen=32)
>         self.app.hooks.add_callback('repository-data', self, obnamlib.Hook.LATE_PRIORITY)
> 
> 
>     def filter_read(self, encrypted, repo, toplevel):
>         iv = encrypted[0:16]
>         return AES.new(self.key, AES.MODE_CFB, iv).decrypt(encrypted[16:])
> 
> 
>     def filter_write(self, cleartext, repo, toplevel):
>         iv = os.urandom(16)
>         return iv + AES.new(self.key, AES.MODE_CFB, iv).encrypt(cleartext)

It works nicely and IMHO similar functionality could be added to the 
official distribution. Notes:

* The passphrase is passed in via an environment variable rather than a
command line parameter. While I am not a Linux expert, it's my
understanding that this is the more secure approach: the command line
can be seen by other users on the same computer, whereas the environment
of a process can AFAIK only be read by its owner and root.

* In my setup, the passphrase is mandatory (I don't want to create an
unencrypted backup by mistake). The official encryption plugin would
instead have a command line option like
--encryption-backend=passphrase to enable passphrase-based encryption.
Also, the key size doesn't have to be hardcoded at 32 bytes (meaning
AES-256); there can be an additional option like
--encryption-algo=aes-128 that allows other key sizes to be specified.

* I am currently using a hardcoded salt for PBKDF2. While not
particularly bad (it only matters if an unauthorized party gets access
to a large number of encrypted obnam backups), this isn't optimal
either. One solution would be a random salt for each file, but this
would require deriving an individual key for each file and degrade
performance. The other solution would be generating a unique random salt
for each repository. This would create a single point of failure,
however: if the file storing that random salt gets corrupted, the entire
backup becomes unusable.

* The current encryption plugin will use /dev/random rather than 
/dev/urandom by default. This precaution might be justified when 
generating encryption keys, yet I'm only calling os.urandom() to 
generate the initialization vector. With a new initialization vector 
being generated for each encrypted file, polling /dev/random might be 
too slow here. Also, randomness of initialization vectors isn't as 
critical and doesn't justify such measures IMHO.
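
The per-repository salt mentioned in the notes above could look roughly
like the following Python 3 sketch. It is not obnam code: the helper
names load_or_create_salt and derive_key, the 16-byte salt size and the
idea of keeping the salt in a plain file are all assumptions made for
illustration. Only the PBKDF2 parameters (SHA-256, 256 * 1024
iterations, 32-byte key) are taken from the plugin.

```python
import hashlib
import os

SALT_SIZE = 16  # bytes; ASSUMPTION, not something obnam defines


def load_or_create_salt(path):
    """Return the salt stored at `path`, creating a random one on first use.

    Hypothetical helper. A salt file of the wrong length is treated as
    corrupted and reported instead of being silently replaced, because a
    new salt would make all existing data undecryptable.
    """
    try:
        with open(path, 'rb') as f:
            salt = f.read()
    except FileNotFoundError:
        salt = os.urandom(SALT_SIZE)
        with open(path, 'wb') as f:
            f.write(salt)
        return salt
    if len(salt) != SALT_SIZE:
        raise ValueError('salt file %r looks corrupted' % path)
    return salt


def derive_key(passphrase, salt, iterations=256 * 1024, dklen=32):
    """Derive the AES key from the passphrase and the repository salt."""
    return hashlib.pbkdf2_hmac('sha256', passphrase.encode('utf-8'),
                               salt, iterations, dklen=dklen)


if __name__ == "__main__":
    import tempfile

    # Stand-in for a repository directory.
    with tempfile.TemporaryDirectory() as repo_dir:
        salt = load_or_create_salt(os.path.join(repo_dir, 'salt'))
        key = derive_key('correct horse battery staple', salt)
        print(len(salt), len(key))
```

The key is derived only once per run, so the cost stays the same as with
the hardcoded salt. Since a salt does not need to be secret, the single
point of failure could be softened by keeping extra copies of the salt
file, for example next to the written-down passphrase.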

Any comments? I can write a patch if the general direction is approved.

regards
Wladimir
