Return-Path: <obnam-support-bounces@obnam.org>
X-Original-To: distix@pieni.net
Delivered-To: distix@pieni.net
Received: from yaffle.pepperfish.net (yaffle.pepperfish.net [88.99.213.221])
	by pieni.net (Postfix) with ESMTPS id C4EC54020D
	for <distix@pieni.net>; Mon, 24 Jul 2017 17:12:30 +0000 (UTC)
Received: from platypus.pepperfish.net (unknown [10.112.101.20])
	by yaffle.pepperfish.net (Postfix) with ESMTP id 5DEFE418A5;
	Mon, 24 Jul 2017 18:12:30 +0100 (BST)
Received: from ip6-localhost.nat ([::1] helo=platypus.pepperfish.net)
	by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
	id 1dZguE-0006Iq-BZ; Mon, 24 Jul 2017 18:12:30 +0100
Received: from [10.112.101.21] (helo=mx3.pepperfish.net)
 by platypus.pepperfish.net with esmtps (Exim 4.80 #2 (Debian))
 id 1dZguC-0006Ia-Kb
 for <obnam-support@obnam.org>; Mon, 24 Jul 2017 18:12:28 +0100
Received: from barracuda.pco-inc.com ([71.4.36.131])
 by mx3.pepperfish.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.89) (envelope-from <lperkins@openeye.net>)
 id 1dZguA-0004Dz-DY
 for obnam-support@obnam.org; Mon, 24 Jul 2017 18:12:28 +0100
X-ASG-Debug-ID: 1500916335-0573a2109233ef90001-phrF5L
Received: from Loki.pcopen.net ([10.0.0.65]) by barracuda.pco-inc.com with
 ESMTP id A6ZiJJZYZunuo1DJ (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384
 bits=256 verify=NO); Mon, 24 Jul 2017 10:12:15 -0700 (PDT)
X-Barracuda-Envelope-From: lperkins@openeye.net
Received: from LOKI.pcopen.net ([fe80::39f5:aaff:14af:6002]) by
 Loki.pcopen.net ([fe80::39f5:aaff:14af:6002%10]) with mapi id 14.03.0351.000; 
 Mon, 24 Jul 2017 10:12:16 -0700
From: "Laurence Perkins (OE)" <lperkins@openeye.net>
To: "liw@liw.fi" <liw@liw.fi>
Thread-Topic: Variable Chunksize
X-ASG-Orig-Subj: Re: Variable Chunksize
Thread-Index: AQHTALO1js8PRLWMaEGpkNs8UOGlYqJb6R8AgAGEm4CAAvDXAIADVVeA
Date: Mon, 24 Jul 2017 17:12:15 +0000
Message-ID: <1500916329.13826.13.camel@openeye.net>
References: <1500484994.13826.5.camel@openeye.net>
 <20170719181232.sdqihqdqldsgzmtd@liw.fi>
 <1500571405.13826.8.camel@openeye.net>
 <20170722141756.yzxatuvogrdsh4jv@liw.fi>
In-Reply-To: <20170722141756.yzxatuvogrdsh4jv@liw.fi>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
x-originating-ip: [10.0.50.60]
MIME-Version: 1.0
X-Barracuda-Connect: UNKNOWN[10.0.0.65]
X-Barracuda-Start-Time: 1500916335
X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA384
X-Barracuda-URL: https://10.0.0.6:443/cgi-mod/mark.cgi
X-Barracuda-Scan-Msg-Size: 1553
X-Virus-Scanned: by bsmtpd at pco-inc.com
X-Barracuda-BRTS-Status: 1
X-Barracuda-Spam-Score: 0.00
X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0
 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=5.0 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.41251
 Rule breakdown below
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
X-Pepperfish-Transaction: 7114-4a85-c704-6016
X-Spam-Score: -1.9
X-Spam-Score-int: -18
X-Spam-Bar: -
X-Scanned-By: pepperfish.net, Mon, 24 Jul 2017 18:12:28 +0100
X-Spam-Report: Content analysis details: (-1.9 points)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
 [score: 0.0000]
X-ACL-Warn: message may be spam
X-Scan-Signature: efe41bf02ea65bb76851af65aec5a2d2
Cc: "obnam-support@obnam.org" <obnam-support@obnam.org>
Subject: Re: Variable Chunksize
X-BeenThere: obnam-support@obnam.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Obnam backup software discussion <obnam-support-obnam.org>
List-Unsubscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
 <mailto:obnam-support-request@obnam.org?subject=unsubscribe>
List-Archive: <http://listmaster.pepperfish.net/pipermail/obnam-support-obnam.org>
List-Post: <mailto:obnam-support@obnam.org>
List-Help: <mailto:obnam-support-request@obnam.org?subject=help>
List-Subscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
 <mailto:obnam-support-request@obnam.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============0226912654442020829=="
Mime-version: 1.0
Sender: obnam-support-bounces@obnam.org
Errors-To: obnam-support-bounces@obnam.org

--===============0226912654442020829==
Content-Language: en-US
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="=-mkTgCnw/w4FzwjwkBM2+"

--=-mkTgCnw/w4FzwjwkBM2+
Content-Type: text/plain; charset="UTF-7"
Content-Transfer-Encoding: quoted-printable



On Sat, 2017-07-22 at 17:17 +0300, Lars Wirzenius wrote:
> For more precise deduplication, it's almost certainly necessary to use
> smaller chunks than the megabyte sized ones Obnam currently uses by
> default. Smaller chunks mean more chunks.
>
A smaller chunk size makes deduplication more precise regardless of the
type of splitting, but variable-size chunking should generate some
pretty big savings on similar data even without reducing the chunk size,
because it is better at finding identical chunks of data: it doesn't
rely on them being at fixed offsets.

To illustrate, consider the following two datasets:

aaaaabbbbb

caaaaabbbbb

With a fixed chunk size of 5, the first dataset makes two chunks and
the second makes three with no chunks shared.

With a variable chunk algorithm that splits when it sees "ab", they
both make two chunks, and they share one.
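
The two cases can be checked with a few lines of Python. This is only
a sketch of the toy example, not anything Obnam does: the function
names fixed_chunks and marker_chunks are made up for illustration, and
I'm assuming the split falls between the "a" and the "b" of the marker.

```python
def fixed_chunks(data, size):
    """Split data into consecutive pieces of `size` characters (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def marker_chunks(data, marker):
    """Split data between the two characters of every `marker` occurrence.

    Assumes a two-character marker. The cut position depends only on the
    content around it, not on its offset from the start of the data.
    """
    chunks = []
    start = 0
    pos = data.find(marker)
    while pos != -1:
        cut = pos + 1                      # cut between marker[0] and marker[1]
        chunks.append(data[start:cut])
        start = cut
        pos = data.find(marker, cut)
    chunks.append(data[start:])            # whatever follows the last cut
    return chunks


old, new = "aaaaabbbbb", "caaaaabbbbb"

print(fixed_chunks(old, 5))       # ['aaaaa', 'bbbbb']
print(fixed_chunks(new, 5))       # ['caaaa', 'abbbb', 'b']
print(marker_chunks(old, "ab"))   # ['aaaaa', 'bbbbb']
print(marker_chunks(new, "ab"))   # ['caaaaa', 'bbbbb']
```

The fixed-size lists have nothing in common, while both marker-based
lists contain 'bbbbb'.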

As with all deduplication, the chunk size must be tuned to the type of
data you're working with, but even with the current 1MB chunk size,
files larger than that will benefit any time they get a value inserted
into the middle, which is common for things like sparse VM images.
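
To put rough numbers on the insertion case, here is a small Python
experiment. It is a sketch under loud assumptions: the boundary rule (a
plain sum over the last WINDOW bytes) is a stand-in I picked for
brevity, not a recommendation and not Obnam code, and WINDOW, DIVISOR
and all function names are hypothetical. The data is 4096 pseudo-random
bytes with one byte inserted in the middle.

```python
import random

WINDOW = 16    # hypothetical: bytes of context that decide a boundary
DIVISOR = 64   # hypothetical: very roughly one cut per 64 bytes on random data


def fixed_chunks(data, size):
    """Split data into consecutive pieces of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def rolling_chunks(data, window=WINDOW, divisor=DIVISOR):
    """Cut after any byte where the sum of the last `window` bytes is a multiple of `divisor`."""
    chunks = []
    start = 0
    total = 0
    for i, byte in enumerate(data):
        total += byte
        if i >= window:
            total -= data[i - window]      # drop the byte that left the window
        if total % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])        # trailing bytes after the last cut
    return chunks


def changed(new_chunks, old_chunks):
    """Count distinct chunks of the new version that the old version lacks."""
    return len(set(new_chunks) - set(old_chunks))


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = old[:2048] + b"\x00" + old[2048:]    # one byte inserted in the middle

print("fixed:  ", changed(fixed_chunks(new, 64), fixed_chunks(old, 64)))
print("rolling:", changed(rolling_chunks(new), rolling_chunks(old)))
```

With fixed chunks, everything after the insertion point shifts by one
byte and has to be stored again. With the content-based rule, a cut
depends only on the previous WINDOW bytes, so the cuts line up again
shortly after the inserted byte and at most WINDOW + 1 chunks are new.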

Which just made me realise what kind of space savings I could get on
one of my archives...  I guess this just moved up the priority list a
ways...  Any suggestions for a hash that's fast enough to crawl over
large files without hurting performance too much while still giving
reasonable control over the average chunk size?

LMP
--=-mkTgCnw/w4FzwjwkBM2+
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEFbYe3ereZkZxAoz7C4CSuysVUSAFAll2KmkACgkQC4CSuysV
USCyJxAAoHCoSWxXpswfJCx2bNm4YoERGJEPYE+a1G3zbj1xnolpDZOzji6vPesB
HqIMYDzohU2fH1dqPZLDzQ8kRLrnHckrevEgdPHCRIX8ZSAS2O+tvXi9gq2CkDZ6
LrR8pbIJZ+1zWeoh6uFWmA3vN/502gyAy2aacXhJyyT7kl1u+wGpkgsyHKRSP0XJ
PU4Ecxp5nWiYsCEq7R2prowLTBGO0Of+uSIw4xlT8M4iAwq/CiBp1dLK0Tg6P1aE
97+GOxcv73fMSTERRxErfl+/Ti3KfORvp8kjYKE29TjFtsBFEw5tfedLfLk81Ae1
0aorAJM+Reg/6T8D7CYhMwcFmDG08vRQwy1OlKVaJCAn/CyOkUyaCMpEFZXZOewy
a380AvAPY6h9xYebl3Ot5MwQCLUHYVVVhKRrkOgr0cULrIuGZobUVFPHREUDwbvR
fTDFEQuoLv11NJhyA/CwGbUKLtjtiByG9EKNEUo24vj+sTNcFL87pdpQxREwgCul
Fehbaq2MRs7M6iMwDFBLPMlswMu/m66bnTQ+4e2zABiXtmlwGkWejJxY9EUgLG/Z
FOwY5qXuDF315rwHkmHUXtdVXI1zn9Ke+Rf+rB2XxpGUzv8JRHbBQbGGjVCObFxV
il40F4A6VPbiO2E7lhZ5RzytL1Smchois+43fvkahCJhiH07HsE=
=ZJgW
-----END PGP SIGNATURE-----

--=-mkTgCnw/w4FzwjwkBM2+--


--===============0226912654442020829==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
obnam-support mailing list
obnam-support@obnam.org
http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org

--===============0226912654442020829==--