author | distix ticketing system <distix@pieni.net> | 2017-07-19 17:25:35 +0000 |
---|---|---|
committer | distix ticketing system <distix@pieni.net> | 2017-07-19 17:25:35 +0000 |
commit | 1f1e64259f7217d4ebf29daf42c9db3ae5b21cb1 (patch) | |
tree | 81613daeae74c5bcd971af1a9a29b53486b049d8 | |
parent | eaa804fc2bc75fbad4952a7faba69b499619d060 (diff) | |
download | obnam-support-distix-1f1e64259f7217d4ebf29daf42c9db3ae5b21cb1.tar.gz |
imported mails
5 files changed, 140 insertions, 0 deletions
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/cur/.this-dir-not-empty/.empty/empty-file b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/cur/.this-dir-not-empty/.empty/empty-file
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/cur/.this-dir-not-empty/.empty/empty-file
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/.this-dir-not-empty/.empty/empty-file b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/.this-dir-not-empty/.empty/empty-file
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/.this-dir-not-empty/.empty/empty-file
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500485134.M964867P15301Q1.koom b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500485134.M964867P15301Q1.koom
new file mode 100644
index 0000000..eb2f6f6
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/new/1500485134.M964867P15301Q1.koom
@@ -0,0 +1,134 @@
+Return-Path: <obnam-support-bounces@obnam.org>
+X-Original-To: distix@pieni.net
+Delivered-To: distix@pieni.net
+Received: from yaffle.pepperfish.net (yaffle.pepperfish.net [88.99.213.221])
+ by pieni.net (Postfix) with ESMTPS id 73275415D4
+ for <distix@pieni.net>; Wed, 19 Jul 2017 17:23:33 +0000 (UTC)
+Received: from platypus.pepperfish.net (unknown [10.112.101.20])
+ by yaffle.pepperfish.net (Postfix) with ESMTP id D841D417C2;
+ Wed, 19 Jul 2017 18:23:32 +0100 (BST)
+Received: from ip6-localhost.nat ([::1] helo=platypus.pepperfish.net)
+ by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
+ id 1dXshA-0006SX-Re; Wed, 19 Jul 2017 18:23:32 +0100
+Received: from [10.112.101.21] (helo=mx3.pepperfish.net)
+ by platypus.pepperfish.net with esmtps (Exim 4.80 #2 (Debian))
+ id 1dXsh9-0006SI-Hw
+ for <obnam-support@obnam.org>; Wed, 19 Jul 2017 18:23:31 +0100
+Received: from barracuda.pco-inc.com ([71.4.36.131])
+ by mx3.pepperfish.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
+ (Exim 4.89) (envelope-from <lperkins@openeye.net>)
+ id 1dXsh7-00020c-Hj
+ for obnam-support@obnam.org; Wed, 19 Jul 2017 18:23:31 +0100
+X-ASG-Debug-ID: 1500484997-0573a21092297810001-phrF5L
+Received: from Loki.pcopen.net ([10.0.0.65]) by barracuda.pco-inc.com with
+ ESMTP id 2sQxAElHl0y85NJX (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384
+ bits=256 verify=NO) for <obnam-support@obnam.org>;
+ Wed, 19 Jul 2017 10:23:17 -0700 (PDT)
+X-Barracuda-Envelope-From: lperkins@openeye.net
+Received: from LOKI.pcopen.net ([fe80::39f5:aaff:14af:6002]) by
+ Loki.pcopen.net ([fe80::39f5:aaff:14af:6002%10]) with mapi id 14.03.0351.000;
+ Wed, 19 Jul 2017 10:23:17 -0700
+From: "Laurence Perkins (OE)" <lperkins@openeye.net>
+To: "obnam-support@obnam.org" <obnam-support@obnam.org>
+Thread-Topic: Variable Chunksize
+X-ASG-Orig-Subj: Variable Chunksize
+Thread-Index: AQHTALO1js8PRLWMaEGpkNs8UOGlYg==
+Date: Wed, 19 Jul 2017 17:23:16 +0000
+Message-ID: <1500484994.13826.5.camel@openeye.net>
+Accept-Language: en-US
+Content-Language: en-US
+X-MS-Has-Attach:
+X-MS-TNEF-Correlator:
+x-originating-ip: [10.0.50.60]
+Content-Type: text/plain; charset="utf-7"
+Content-ID: <B00B7502D33022438E10B4F4FB24AB99@pco-inc.com>
+Content-Transfer-Encoding: quoted-printable
+MIME-Version: 1.0
+X-Barracuda-Connect: UNKNOWN[10.0.0.65]
+X-Barracuda-Start-Time: 1500484997
+X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA384
+X-Barracuda-URL: https://10.0.0.6:443/cgi-mod/mark.cgi
+X-Barracuda-Scan-Msg-Size: 2212
+X-Virus-Scanned: by bsmtpd at pco-inc.com
+X-Barracuda-BRTS-Status: 1
+X-Barracuda-Spam-Score: 0.00
+X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0
+ QUARANTINE_LEVEL=1000.0 KILL_LEVEL=5.0 tests=
+X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.41101
+ Rule breakdown below
+ pts rule name description
+ ---- ---------------------- --------------------------------------------------
+X-Pepperfish-Transaction: 8b73-1403-7255-48a4
+X-Spam-Score: -1.9
+X-Spam-Score-int: -18
+X-Spam-Bar: -
+X-Scanned-By: pepperfish.net, Wed, 19 Jul 2017 18:23:31 +0100
+X-Spam-Report: Content analysis details: (-1.9 points)
+ pts rule name description
+ ---- ---------------------- --------------------------------------------------
+ -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
+ [score: 0.0000]
+X-ACL-Warn: message may be spam
+X-Scan-Signature: 2c50e0cfd6ac8342e87940d191065d4f
+Subject: Variable Chunksize
+X-BeenThere: obnam-support@obnam.org
+X-Mailman-Version: 2.1.5
+Precedence: list
+List-Id: Obnam backup software discussion <obnam-support-obnam.org>
+List-Unsubscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
+ <mailto:obnam-support-request@obnam.org?subject=unsubscribe>
+List-Archive: <http://listmaster.pepperfish.net/pipermail/obnam-support-obnam.org>
+List-Post: <mailto:obnam-support@obnam.org>
+List-Help: <mailto:obnam-support-request@obnam.org?subject=help>
+List-Subscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
+ <mailto:obnam-support-request@obnam.org?subject=subscribe>
+Sender: obnam-support-bounces@obnam.org
+Errors-To: obnam-support-bounces@obnam.org
+
+Stumbled across a couple of rsync/backup type programs that use
+variable-sized chunks to improve backup performance. It might be worth
+considering adding as an option.
+
+The concept is actually relatively simple. Instead of splitting chunks
+at fixed sizes, chunks are split when the data matches some heuristic.=20
+The common way to do it is to do a byte-by-byte hash of the stream and
+put in a chunk boundary wherever the hash meets some criteria. With a
+bit of statistics you can set what the average chunksize will be just
+by tweaking the criteria slightly. (So, read say the first 1KB, hash
+it, then drop the first byte from the data to hash and add 1KB+-1B to
+the end of the data to hash and repeat. You walk over the data a byte
+at a time while still keeping a big enough amount of data going into
+the hash algorithm to produce interesting output. Split the chunk when
+the first X bytes of the hash are zero or whatever's convenient.)
+
+Other than that, the rest of the chunking and indexing routines work
+the same way they do now.
+
+The advantage is that adding a single byte to the start of a file
+doesn't cause the entire file to be re-uploaded. It will just increase
+the size of the first chunk by one byte and re-use the rest. (Or, if
+it's exactly in the chunk boundary, it will change the first two
+chunks. But still, that's a lot less data to re-upload.) It will also
+increase the amount of deduplication between nearly-identical files.
+
+The downside is the overhead involved in walking a hash algorithm
+across the data, but it's not like you need a particularly CPU
+intensive hash, or even any significant collision resistance, you just
+need one that produces enough entropy to get your chunk sizes in the
+realm you want.
+
+This feature would significantly increase Obnam's deduplication
+capabilities, without affecting the complexity of the datastore (It
+already supports multiple chunk sizes in the repo.)
+
+I will take a look and see if I can figure out how to add this in, but
+I'm not a particularly tidy coder on the best of days, so someone with
+a bit more Python design experience than I have might be quicker at it.
+ Like... A lot quicker...
+
+LMP=
+
+_______________________________________________
+obnam-support mailing list
+obnam-support@obnam.org
+http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/tmp/.this-dir-not-empty/.empty/empty-file b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/tmp/.this-dir-not-empty/.empty/empty-file
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/Maildir/tmp/.this-dir-not-empty/.empty/empty-file
diff --git a/tickets/11c32688f6ae4c039e5fef65b5007a88/ticket.yaml b/tickets/11c32688f6ae4c039e5fef65b5007a88/ticket.yaml
new file mode 100644
index 0000000..f19c05e
--- /dev/null
+++ b/tickets/11c32688f6ae4c039e5fef65b5007a88/ticket.yaml
@@ -0,0 +1,6 @@
+status:
+- ''
+ticket-id:
+- 11c32688f6ae4c039e5fef65b5007a88
+title:
+- Variable Chunksize
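
Note on the imported mail: the variable-chunk-size scheme it describes (often called content-defined chunking) might look roughly like the Python sketch below. This is not Obnam code and not the author's implementation. The function name `split_chunks`, the constants `BASE` and `MOD`, and the parameters `window`, `mask_bits`, and `max_size` are all hypothetical, and a simple polynomial rolling hash stands in for whatever hash a real implementation would choose. A boundary is placed wherever the low `mask_bits` bits of the hash of the last `window` bytes are zero, which gives an average chunk size of about `2**mask_bits` bytes.

```python
# Minimal sketch of content-defined chunking with a rolling hash.
# All names and default values are illustrative assumptions, not Obnam's API.

BASE = 257                # multiplier of the polynomial rolling hash (assumed)
MOD = (1 << 61) - 1       # large prime modulus (assumed)


def split_chunks(data, window=16, mask_bits=6, max_size=4096):
    """Split `data` (bytes) into variable-sized chunks.

    The hash always covers only the most recent `window` bytes, so a
    boundary depends on local content rather than on the byte offset.
    `max_size` is a safety cap for data that never matches the criterion.
    """
    mask = (1 << mask_bits) - 1
    # Weight of the oldest byte in the window, used when it slides out.
    out_factor = pow(BASE, window - 1, MOD)

    chunks = []
    start = 0   # offset where the current chunk begins
    h = 0       # rolling hash of the last `window` bytes
    for i, byte in enumerate(data):
        if i >= window:
            # Drop the byte that just left the window.
            h = (h - data[i - window] * out_factor) % MOD
        # Shift the window and add the new byte.
        h = (h * BASE + byte) % MOD

        window_full = i + 1 >= window
        too_big = i + 1 - start >= max_size
        if (window_full and (h & mask) == 0) or too_big:
            chunks.append(data[start:i + 1])
            start = i + 1

    if start < len(data):
        chunks.append(data[start:])   # trailing partial chunk
    return chunks


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    shifted = b"X" + original   # one byte added at the start of the file

    before = split_chunks(original)
    after = split_chunks(shifted)
    reused = sum(1 for chunk in after if chunk in set(before))
    print(len(before), "chunks before;", reused, "reused after the insert")
```

With fixed-size chunks, the one-byte insert in the demo would shift every chunk and nothing could be reused. Here only the chunk or chunks around the insert change, as the mail argues. Raising `mask_bits` makes chunks larger on average at the cost of coarser deduplication.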