From 1068df516756be76a9d0fc74a59b7c982ca5407a Mon Sep 17 00:00:00 2001 From: distix ticketing system Date: Thu, 20 Jul 2017 17:26:26 +0000 Subject: imported mails --- .../Maildir/new/1500571585.M471479P20604Q1.koom | 257 +++++++++++++++++++++ 1 file changed, 257 insertions(+) create mode 100644 tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom new file mode 100644 index 0000000..9d05d07 --- /dev/null +++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500571585.M471479P20604Q1.koom @@ -0,0 +1,257 @@ +Return-Path: +X-Original-To: distix@pieni.net +Delivered-To: distix@pieni.net +Received: from yaffle.pepperfish.net (yaffle.pepperfish.net [88.99.213.221]) + by pieni.net (Postfix) with ESMTPS id 3232A4378C + for ; Thu, 20 Jul 2017 17:23:53 +0000 (UTC) +Received: from platypus.pepperfish.net (unknown [10.112.101.20]) + by yaffle.pepperfish.net (Postfix) with ESMTP id AFA25417A5; + Thu, 20 Jul 2017 18:23:52 +0100 (BST) +Received: from ip6-localhost.nat ([::1] helo=platypus.pepperfish.net) + by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian)) + id 1dYFB2-0003LW-MF; Thu, 20 Jul 2017 18:23:52 +0100 +Received: from [10.112.101.21] (helo=mx3.pepperfish.net) + by platypus.pepperfish.net with esmtps (Exim 4.80 #2 (Debian)) + id 1dYFB1-0003LF-LY + for ; Thu, 20 Jul 2017 18:23:51 +0100 +Received: from barracuda.pco-inc.com ([71.4.36.131]) + by mx3.pepperfish.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) + (Exim 4.89) (envelope-from ) + id 1dYFAz-0003VG-Fi + for obnam-support@obnam.org; Thu, 20 Jul 2017 18:23:51 +0100 +X-ASG-Debug-ID: 1500571418-0573a210922c6380001-phrF5L +Received: from Loki.pcopen.net ([10.0.0.65]) by barracuda.pco-inc.com with + ESMTP id O2txcpxbBEzlTh7P (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 + bits=256 verify=NO); Thu, 20 Jul 2017 10:23:38 -0700 (PDT) +X-Barracuda-Envelope-From: lperkins@openeye.net +Received: from LOKI.pcopen.net ([fe80::39f5:aaff:14af:6002]) by + Loki.pcopen.net ([fe80::39f5:aaff:14af:6002%10]) with mapi id 14.03.0351.000; + Thu, 20 Jul 2017 10:23:37 -0700 +From: "Laurence Perkins (OE)" +To: "liw@liw.fi" +Thread-Topic: Variable Chunksize +X-ASG-Orig-Subj: Re: Variable Chunksize +Thread-Index: AQHTALO1js8PRLWMaEGpkNs8UOGlYqJb6R8AgAGEm4A= +Date: Thu, 20 Jul 2017 17:23:37 +0000 +Message-ID: <1500571405.13826.8.camel@openeye.net> +References: <1500484994.13826.5.camel@openeye.net> + <20170719181232.sdqihqdqldsgzmtd@liw.fi> +In-Reply-To: <20170719181232.sdqihqdqldsgzmtd@liw.fi> +Accept-Language: en-US +Content-Language: en-US +X-MS-Has-Attach: yes +X-MS-TNEF-Correlator: +x-originating-ip: [10.0.50.60] +MIME-Version: 1.0 +X-Barracuda-Connect: UNKNOWN[10.0.0.65] +X-Barracuda-Start-Time: 1500571418 +X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA384 +X-Barracuda-URL: https://10.0.0.6:443/cgi-mod/mark.cgi +X-Barracuda-Scan-Msg-Size: 4950 +X-Virus-Scanned: by bsmtpd at pco-inc.com +X-Barracuda-BRTS-Status: 1 +X-Barracuda-BRTS-Evidence: aea-obnam.org +X-Barracuda-BRTS-Evidence: port-obnam.org +X-Barracuda-Spam-Score: 0.00 +X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0 + QUARANTINE_LEVEL=1000.0 KILL_LEVEL=5.0 tests= +X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.41130 + Rule breakdown below + pts rule name description + ---- ---------------------- 
--------------------------------------------------
+X-Pepperfish-Transaction: 2871-25ff-cc85-139e
+X-Spam-Score: -1.9
+X-Spam-Score-int: -18
+X-Spam-Bar: -
+X-Scanned-By: pepperfish.net, Thu, 20 Jul 2017 18:23:51 +0100
+X-Spam-Report: Content analysis details:   (-1.9 points)
+ pts rule name              description
+ ---- ---------------------- --------------------------------------------------
+ -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
+      [score: 0.0000]
+X-ACL-Warn: message may be spam
+X-Scan-Signature: ba526989c802b3f197cd2fc106aa2555
+Cc: "obnam-support@obnam.org"
+Subject: Re: Variable Chunksize
+X-BeenThere: obnam-support@obnam.org
+X-Mailman-Version: 2.1.5
+Precedence: list
+List-Id: Obnam backup software discussion
+List-Unsubscribe: ,
+
+List-Archive:
+List-Post:
+List-Help:
+List-Subscribe: ,
+
+Content-Type: multipart/mixed; boundary="===============0595421157618928757=="
+Mime-version: 1.0
+Sender: obnam-support-bounces@obnam.org
+Errors-To: obnam-support-bounces@obnam.org
+
+--===============0595421157618928757==
+Content-Language: en-US
+Content-Type: multipart/signed; micalg=pgp-sha512;
+ protocol="application/pgp-signature"; boundary="=-YIIC7W8/nGsEbIikuEZg"
+
+--=-YIIC7W8/nGsEbIikuEZg
+Content-Type: text/plain; charset="UTF-7"
+Content-Transfer-Encoding: quoted-printable
+
+Hmmm... it shouldn't actually increase the number of chunks on
+average. The ones that come out a little smaller will get balanced out
+by ones that come out a little bigger. I suppose a really unlucky
+dataset could end up with a whole pile of tiny chunks, but adding a
+minimum and maximum chunk size wouldn't change the algorithm that much,
+and would mitigate that problem.
+
+To illustrate, say we're storing a string of random digits 0-9.
+Statistically, if the set is large enough, splitting every ten
+characters should yield approximately the same number of chunks as
+splitting every time a 1 comes up. The bigger the repo, the more likely
+it is to come out the same.
+
+Since the data going in obviously won't be random, the key is to use a
+hash function whose output is (at least statistically) random.
+
+When I get some spare time I'll see if I can hack it into the chunking
+algorithm and test what happens with some real data sets. I suspect
+the real challenge will be finding a hash algorithm with sufficient
+entropy that can be computed that many times without a backup taking a
+month or having to resort to loading the calculations into the GPU...
+
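+To make that concrete, a minimal Python sketch of the rolling-hash chunking
+idea discussed in this thread might look like the following (the quoted
+message below describes the mechanism in full). All names and parameters
+are invented for illustration and this is not Obnam's actual chunking code;
+it also tests the low bits of the hash rather than its leading bytes, which
+is the same kind of boundary criterion.
+
+# Illustrative content-defined chunker: a rolling hash over a small
+# window, a chunk boundary whenever the hash meets a fixed criterion,
+# and hard minimum/maximum sizes to rule out the unlucky tiny-chunk
+# (or giant-chunk) cases.
+
+MIN_CHUNK = 4 * 1024      # floor: an unlucky dataset can't make piles of tiny chunks
+MAX_CHUNK = 64 * 1024     # ceiling: the worst case stays predictable
+WINDOW = 48               # bytes feeding the rolling hash at any moment
+MASK = (1 << 13) - 1      # boundary when the low 13 bits are zero,
+                          # so chunks average roughly MIN_CHUNK + 8 KiB
+
+BASE = 257
+MOD = (1 << 61) - 1
+BASE_POW = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that slides out
+
+
+def chunk_boundaries(data):
+    """Yield (start, end) offsets of content-defined chunks of data."""
+    start = 0
+    h = 0
+    for i, byte in enumerate(data):
+        if i - start >= WINDOW:
+            # Window is full: remove the oldest byte's contribution
+            # before mixing in the new one.
+            h = (h - data[i - WINDOW] * BASE_POW) % MOD
+        h = (h * BASE + byte) % MOD
+
+        size = i - start + 1
+        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
+            yield start, i + 1
+            start = i + 1
+            h = 0
+    if start < len(data):
+        yield start, len(data)
+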
+On Wed, 2017-07-19 at 21:12 +0300, Lars Wirzenius wrote:
+> This has, in fact, been discussed before, and it's my intention to
+> look at implementing this once FORMAT GREEN ALBATROSS is finished.
+> The issue is that variable-sized chunking may increase the number of
+> chunks by a lot, and I'm not sure Obnam's data structures will deal
+> with that sufficiently efficiently.
+>
+> On Wed, Jul 19, 2017 at 05:23:16PM +0000, Laurence Perkins (OE) wrote:
+> > Stumbled across a couple of rsync/backup type programs that use
+> > variable-sized chunks to improve backup performance.  It might be
+> > worth considering adding as an option.
+> >
+> > The concept is actually relatively simple.  Instead of splitting
+> > chunks at fixed sizes, chunks are split when the data matches some
+> > heuristic.  The common way to do it is to do a byte-by-byte hash of
+> > the stream and put in a chunk boundary wherever the hash meets some
+> > criteria.  With a bit of statistics you can set what the average
+> > chunksize will be just by tweaking the criteria slightly.  (So,
+> > read say the first 1KB, hash it, then drop the first byte from the
+> > data to hash and add byte 1KB+1 to the end of the data to hash, and
+> > repeat.  You walk over the data a byte at a time while still
+> > keeping a big enough amount of data going into the hash algorithm
+> > to produce interesting output.  Split the chunk when the first X
+> > bytes of the hash are zero or whatever's convenient.)
+> >
+> > Other than that, the rest of the chunking and indexing routines
+> > work the same way they do now.
+> >
+> > The advantage is that adding a single byte to the start of a file
+> > doesn't cause the entire file to be re-uploaded.  It will just
+> > increase the size of the first chunk by one byte and re-use the
+> > rest.  (Or, if it's exactly on the chunk boundary, it will change
+> > the first two chunks.  But still, that's a lot less data to
+> > re-upload.)  It will also increase the amount of deduplication
+> > between nearly-identical files.
+> >
+> > The downside is the overhead involved in walking a hash algorithm
+> > across the data, but it's not like you need a particularly CPU
+> > intensive hash, or even any significant collision resistance; you
+> > just need one that produces enough entropy to get your chunk sizes
+> > into the realm you want.
+> >
+> > This feature would significantly increase Obnam's deduplication
+> > capabilities without affecting the complexity of the datastore (it
+> > already supports multiple chunk sizes in the repo).
+> >
+> > I will take a look and see if I can figure out how to add this in,
+> > but I'm not a particularly tidy coder on the best of days, so
+> > someone with a bit more Python design experience than I have might
+> > be quicker at it.  Like...  A lot quicker...
+> >
+> > LMP
+> > _______________________________________________
+> > obnam-support mailing list
+> > obnam-support@obnam.org
+> > http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org
+> >
+>
+>
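+
+As a rough sanity check of the "adding a single byte to the start" claim
+above, one could run the chunk_boundaries() sketch from earlier in this
+message over some throwaway data and count how many chunks survive
+unchanged (again, purely illustrative):
+
+import os
+
+data = os.urandom(1 << 20)       # 1 MiB of pseudo-random test data
+shifted = b"X" + data            # the same data with one byte prepended
+
+chunks = [data[a:b] for a, b in chunk_boundaries(data)]
+shifted_chunks = [shifted[a:b] for a, b in chunk_boundaries(shifted)]
+
+# Boundaries are chosen by content, so everything after the first boundary
+# realigns: all but the first chunk (or two) should be byte-for-byte
+# identical and would deduplicate against the previous backup.
+chunk_set = set(chunks)
+reused = sum(1 for c in shifted_chunks if c in chunk_set)
+print(len(chunks), len(shifted_chunks), reused)
+print("average chunk size:", len(data) // len(chunks))   # expect ~12 KiB
+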
+--=-YIIC7W8/nGsEbIikuEZg
+Content-Type: application/pgp-signature; name="signature.asc"
+Content-Description: This is a digitally signed message part
+Content-Transfer-Encoding: 7bit
+
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCgAdFiEEFbYe3ereZkZxAoz7C4CSuysVUSAFAllw5w4ACgkQC4CSuysV
+USDBDQ//YKidqOuyaO2pXyZzGZCTs/kkKoFarI15mefwTe25GNqhFuhtZjCnmRcV
+hk8YbbymXhphQ9RbryWpGciTgWl9yomGxMWpnc5nyQKWUPYuR7pTFvT+8rb8/5Wl
+m0K6UAKryoHzsPzbpJ7Cq5OSx1ZJdBt/kplRRUCUgbw1H4Aw+87g0jwXGvvoJLVw
+rdeuny7dGL+IzyvRM3AXxkoqw5MBi3CJgzVLbjwA5wNVTn9mjQ+Wd1rdHBM1EhmF
+BIMrI7jxJRrA3Oe2Z6Z5rp0CReZ0yfDgOTsiGnQIDQqE4RjW3kSipW0lWbK7z0/d
+Ul/NVh2OzNhv8avm4pnGZpVv0mHrFmszKQxYpv5sxcvhEOscNDius4/lFbeCALro
+NURR3wu3ap5g2bRoEfn/ZSlvB+FaN2dis1lHY6wqCSOySC1sEJIFGEKketpEhk8D
+dfJuXhoUrBaARu94qtFdOHubLX51e4eor45NXpdFZEN8RSLlVa72mxK199D722Ng
+Jz8LzMSI0t8kevScY714sNH7Vj4oDtPhMO2Rb0IWu9vKTf98P0OCUjFUo36lmbmu
+Njx5VPYneVhsUM1mftG9a4+42qBaie2ApY6mYFYKGnQRPGe0BZQ6KZL+DzQxch67
+othfc0C0y3E6ZSUWm+o1RO2Wp043kMQuvaJKK28CWy0Rodu3vx8=
+=IZJ6
+-----END PGP SIGNATURE-----
+
+--=-YIIC7W8/nGsEbIikuEZg--
+
+
+--===============0595421157618928757==
+Content-Type: text/plain; charset="us-ascii"
+MIME-Version: 1.0
+Content-Transfer-Encoding: 7bit
+Content-Disposition: inline
+
+_______________________________________________
+obnam-support mailing list
+obnam-support@obnam.org
+http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org
+
+--===============0595421157618928757==--
+
-- cgit v1.2.1