summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authordistix ticketing system <distix@pieni.net>2017-07-20 17:26:26 +0000
committerdistix ticketing system <distix@pieni.net>2017-07-20 17:26:26 +0000
commit1068df516756be76a9d0fc74a59b7c982ca5407a (patch)
tree1c3e27f67da3af52d2bd8d96ee42ffc05fd7abd4
parentaeedfba39b669df7d8c577f1f1fde7621259323e (diff)
downloadobnam-support-distix-1068df516756be76a9d0fc74a59b7c982ca5407a.tar.gz
imported mails
-rw-r--r--tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom257
1 files changed, 257 insertions, 0 deletions
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom
new file mode 100644
index 0000000..9d05d07
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom
@@ -0,0 +1,257 @@
+Return-Path: <obnam-support-bounces@obnam.org>
+X-Original-To: distix@pieni.net
+Delivered-To: distix@pieni.net
+Received: from yaffle.pepperfish.net (yaffle.pepperfish.net [88.99.213.221])
+ by pieni.net (Postfix) with ESMTPS id 3232A4378C
+ for <distix@pieni.net>; Thu, 20 Jul 2017 17:23:53 +0000 (UTC)
+Received: from platypus.pepperfish.net (unknown [10.112.101.20])
+ by yaffle.pepperfish.net (Postfix) with ESMTP id AFA25417A5;
+ Thu, 20 Jul 2017 18:23:52 +0100 (BST)
+Received: from ip6-localhost.nat ([::1] helo=platypus.pepperfish.net)
+ by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
+ id 1dYFB2-0003LW-MF; Thu, 20 Jul 2017 18:23:52 +0100
+Received: from [10.112.101.21] (helo=mx3.pepperfish.net)
+ by platypus.pepperfish.net with esmtps (Exim 4.80 #2 (Debian))
+ id 1dYFB1-0003LF-LY
+ for <obnam-support@obnam.org>; Thu, 20 Jul 2017 18:23:51 +0100
+Received: from barracuda.pco-inc.com ([71.4.36.131])
+ by mx3.pepperfish.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
+ (Exim 4.89) (envelope-from <lperkins@openeye.net>)
+ id 1dYFAz-0003VG-Fi
+ for obnam-support@obnam.org; Thu, 20 Jul 2017 18:23:51 +0100
+X-ASG-Debug-ID: 1500571418-0573a210922c6380001-phrF5L
+Received: from Loki.pcopen.net ([10.0.0.65]) by barracuda.pco-inc.com with
+ ESMTP id O2txcpxbBEzlTh7P (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384
+ bits=256 verify=NO); Thu, 20 Jul 2017 10:23:38 -0700 (PDT)
+X-Barracuda-Envelope-From: lperkins@openeye.net
+Received: from LOKI.pcopen.net ([fe80::39f5:aaff:14af:6002]) by
+ Loki.pcopen.net ([fe80::39f5:aaff:14af:6002%10]) with mapi id 14.03.0351.000;
+ Thu, 20 Jul 2017 10:23:37 -0700
+From: "Laurence Perkins (OE)" <lperkins@openeye.net>
+To: "liw@liw.fi" <liw@liw.fi>
+Thread-Topic: Variable Chunksize
+X-ASG-Orig-Subj: Re: Variable Chunksize
+Thread-Index: AQHTALO1js8PRLWMaEGpkNs8UOGlYqJb6R8AgAGEm4A=
+Date: Thu, 20 Jul 2017 17:23:37 +0000
+Message-ID: <1500571405.13826.8.camel@openeye.net>
+References: <1500484994.13826.5.camel@openeye.net>
+ <20170719181232.sdqihqdqldsgzmtd@liw.fi>
+In-Reply-To: <20170719181232.sdqihqdqldsgzmtd@liw.fi>
+Accept-Language: en-US
+Content-Language: en-US
+X-MS-Has-Attach: yes
+X-MS-TNEF-Correlator:
+x-originating-ip: [10.0.50.60]
+MIME-Version: 1.0
+X-Barracuda-Connect: UNKNOWN[10.0.0.65]
+X-Barracuda-Start-Time: 1500571418
+X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA384
+X-Barracuda-URL: https://10.0.0.6:443/cgi-mod/mark.cgi
+X-Barracuda-Scan-Msg-Size: 4950
+X-Virus-Scanned: by bsmtpd at pco-inc.com
+X-Barracuda-BRTS-Status: 1
+X-Barracuda-BRTS-Evidence: aea-obnam.org
+X-Barracuda-BRTS-Evidence: port-obnam.org
+X-Barracuda-Spam-Score: 0.00
+X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0
+ QUARANTINE_LEVEL=1000.0 KILL_LEVEL=5.0 tests=
+X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.41130
+ Rule breakdown below
+ pts rule name description
+ ---- ---------------------- --------------------------------------------------
+X-Pepperfish-Transaction: 2871-25ff-cc85-139e
+X-Spam-Score: -1.9
+X-Spam-Score-int: -18
+X-Spam-Bar: -
+X-Scanned-By: pepperfish.net, Thu, 20 Jul 2017 18:23:51 +0100
+X-Spam-Report: Content analysis details: (-1.9 points)
+ pts rule name description
+ ---- ---------------------- --------------------------------------------------
+ -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
+ [score: 0.0000]
+X-ACL-Warn: message may be spam
+X-Scan-Signature: ba526989c802b3f197cd2fc106aa2555
+Cc: "obnam-support@obnam.org" <obnam-support@obnam.org>
+Subject: Re: Variable Chunksize
+X-BeenThere: obnam-support@obnam.org
+X-Mailman-Version: 2.1.5
+Precedence: list
+List-Id: Obnam backup software discussion <obnam-support-obnam.org>
+List-Unsubscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
+ <mailto:obnam-support-request@obnam.org?subject=unsubscribe>
+List-Archive: <http://listmaster.pepperfish.net/pipermail/obnam-support-obnam.org>
+List-Post: <mailto:obnam-support@obnam.org>
+List-Help: <mailto:obnam-support-request@obnam.org?subject=help>
+List-Subscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
+ <mailto:obnam-support-request@obnam.org?subject=subscribe>
+Content-Type: multipart/mixed; boundary="===============0595421157618928757=="
+Mime-version: 1.0
+Sender: obnam-support-bounces@obnam.org
+Errors-To: obnam-support-bounces@obnam.org
+
+--===============0595421157618928757==
+Content-Language: en-US
+Content-Type: multipart/signed; micalg=pgp-sha512;
+ protocol="application/pgp-signature"; boundary="=-YIIC7W8/nGsEbIikuEZg"
+
+--=-YIIC7W8/nGsEbIikuEZg
+Content-Type: text/plain; charset="UTF-7"
+Content-Transfer-Encoding: quoted-printable
+
+Hmmm... it shouldn't actually increase the number of chunks on
+average. The ones that come out a little smaller will get balanced out
+by ones that come out a little bigger. I suppose a really unlucky
+dataset could end up with a whole pile of tiny chunks, but adding a
+minimum and maximum chunk size wouldn't change the algorithm that much,
+and would mitigate that problem.
+
+To illustrate, say we're storing a string of random digits 0-9.=20
+Statistically, if the set is large enough, splitting every ten
+characters should yield approximately the same number of chunks as
+splitting every time a 1 comes up. The bigger the repo, the more likely
+it is to come out the same.
+
+Since the data going in obviously won't be random, the key is to use a
+hash function where the output is (at least statistically).
+
+When I get some spare time I'll see if I can hack it into the chunking
+algorithm and test what happens with some real data sets. I suspect
+the real challenge will be finding a hash algorithm with sufficient
+entropy that can be computed that many times without a backup taking a
+month or having to resort to loading the calculations into the GPU...
+
+
+
+On Wed, 2017-07-19 at 21:12 +-0300, Lars Wirzenius wrote:
++AD4 This has, in fact, been discussed before, and it's my intention to
++AD4 look at implementing this once FORMAT GREEN ALBATROSS is finished.
++AD4 The
++AD4 issue is that variable-sized chunking may increase the number of
++AD4 chunks by a lot, and I'm not sure Obnam's data structures will deal
++AD4 with that sufficiently efficiently.
++AD4=20
++AD4 On Wed, Jul 19, 2017 at 05:23:16PM +-0000, Laurence Perkins (OE)
++AD4 wrote:
++AD4 +AD4 Stumbled across a couple of rsync/backup type programs that use
++AD4 +AD4 variable-sized chunks to improve backup performance.+AKAAoA-It mi=
+ght be
++AD4 +AD4 worth
++AD4 +AD4 considering adding as an option.
++AD4 +AD4=20
++AD4 +AD4 The concept is actually relatively simple.+AKAAoA-Instead of spli=
+tting
++AD4 +AD4 chunks
++AD4 +AD4 at fixed sizes, chunks are split when the data matches some
++AD4 +AD4 heuristic.+AKA
++AD4 +AD4 The common way to do it is to do a byte-by-byte hash of the strea=
+m
++AD4 +AD4 and
++AD4 +AD4 put in a chunk boundary wherever the hash meets some
++AD4 +AD4 criteria.+AKAAoA-With a
++AD4 +AD4 bit of statistics you can set what the average chunksize will be
++AD4 +AD4 just
++AD4 +AD4 by tweaking the criteria slightly.+AKAAoA(So, read say the first =
+1KB,
++AD4 +AD4 hash
++AD4 +AD4 it, then drop the first byte from the data to hash and add 1KB+-1=
+B
++AD4 +AD4 to
++AD4 +AD4 the end of the data to hash and repeat.+AKAAoA-You walk over the =
+data a
++AD4 +AD4 byte
++AD4 +AD4 at a time while still keeping a big enough amount of data going
++AD4 +AD4 into
++AD4 +AD4 the hash algorithm to produce interesting output.+AKAAoA-Split th=
+e chunk
++AD4 +AD4 when
++AD4 +AD4 the first X bytes of the hash are zero or whatever's convenient.)
++AD4 +AD4=20
++AD4 +AD4 Other than that, the rest of the chunking and indexing routines
++AD4 +AD4 work
++AD4 +AD4 the same way they do now.
++AD4 +AD4=20
++AD4 +AD4 The advantage is that adding a single byte to the start of a file
++AD4 +AD4 doesn't cause the entire file to be re-uploaded.+AKAAoA-It will j=
+ust
++AD4 +AD4 increase
++AD4 +AD4 the size of the first chunk by one byte and re-use the rest. (Or,
++AD4 +AD4 if
++AD4 +AD4 it's exactly in the chunk boundary, it will change the first two
++AD4 +AD4 chunks.+AKAAoA-But still, that's a lot less data to re-upload.)+A=
+KAAoA-It will
++AD4 +AD4 also
++AD4 +AD4 increase the amount of deduplication between nearly-identical
++AD4 +AD4 files.
++AD4 +AD4=20
++AD4 +AD4 The downside is the overhead involved in walking a hash algorithm
++AD4 +AD4 across the data, but it's not like you need a particularly CPU
++AD4 +AD4 intensive hash, or even any significant collision resistance, you
++AD4 +AD4 just
++AD4 +AD4 need one that produces enough entropy to get your chunk sizes in
++AD4 +AD4 the
++AD4 +AD4 realm you want.
++AD4 +AD4=20
++AD4 +AD4 This feature would significantly increase Obnam's deduplication
++AD4 +AD4 capabilities, without affecting the complexity of the datastore (=
+It
++AD4 +AD4 already supports multiple chunk sizes in the repo.)
++AD4 +AD4=20
++AD4 +AD4 I will take a look and see if I can figure out how to add this in=
+,
++AD4 +AD4 but
++AD4 +AD4 I'm not a particularly tidy coder on the best of days, so someone
++AD4 +AD4 with
++AD4 +AD4 a bit more Python design experience than I have might be quicker =
+at
++AD4 +AD4 it.
++AD4 +AD4 +AKA-Like...+AKAAoA-A lot quicker...
++AD4 +AD4=20
++AD4 +AD4 LMP
++AD4 +AD4 +AF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBf=
+AF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXwBfAF8AXw
++AD4 +AD4 obnam-support mailing list
++AD4 +AD4 obnam-support+AEA-obnam.org
++AD4 +AD4 http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-s=
+up
++AD4 +AD4 port-obnam.org
++AD4 +AD4=20
++AD4=20
++AD4=20
+--=-YIIC7W8/nGsEbIikuEZg
+Content-Type: application/pgp-signature; name="signature.asc"
+Content-Description: This is a digitally signed message part
+Content-Transfer-Encoding: 7bit
+
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCgAdFiEEFbYe3ereZkZxAoz7C4CSuysVUSAFAllw5w4ACgkQC4CSuysV
+USDBDQ//YKidqOuyaO2pXyZzGZCTs/kkKoFarI15mefwTe25GNqhFuhtZjCnmRcV
+hk8YbbymXhphQ9RbryWpGciTgWl9yomGxMWpnc5nyQKWUPYuR7pTFvT+8rb8/5Wl
+m0K6UAKryoHzsPzbpJ7Cq5OSx1ZJdBt/kplRRUCUgbw1H4Aw+87g0jwXGvvoJLVw
+rdeuny7dGL+IzyvRM3AXxkoqw5MBi3CJgzVLbjwA5wNVTn9mjQ+Wd1rdHBM1EhmF
+BIMrI7jxJRrA3Oe2Z6Z5rp0CReZ0yfDgOTsiGnQIDQqE4RjW3kSipW0lWbK7z0/d
+Ul/NVh2OzNhv8avm4pnGZpVv0mHrFmszKQxYpv5sxcvhEOscNDius4/lFbeCALro
+NURR3wu3ap5g2bRoEfn/ZSlvB+FaN2dis1lHY6wqCSOySC1sEJIFGEKketpEhk8D
+dfJuXhoUrBaARu94qtFdOHubLX51e4eor45NXpdFZEN8RSLlVa72mxK199D722Ng
+Jz8LzMSI0t8kevScY714sNH7Vj4oDtPhMO2Rb0IWu9vKTf98P0OCUjFUo36lmbmu
+Njx5VPYneVhsUM1mftG9a4+42qBaie2ApY6mYFYKGnQRPGe0BZQ6KZL+DzQxch67
+othfc0C0y3E6ZSUWm+o1RO2Wp043kMQuvaJKK28CWy0Rodu3vx8=
+=IZJ6
+-----END PGP SIGNATURE-----
+
+--=-YIIC7W8/nGsEbIikuEZg--
+
+
+--===============0595421157618928757==
+Content-Type: text/plain; charset="us-ascii"
+MIME-Version: 1.0
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline
+
+_______________________________________________
+obnam-support mailing list
+obnam-support@obnam.org
+http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org
+
+--===============0595421157618928757==--
+