Return-Path: <obnam-support-bounces@obnam.org>
X-Original-To: distix@pieni.net
Delivered-To: distix@pieni.net
Received: from yaffle.pepperfish.net (yaffle.pepperfish.net [88.99.213.221])
	by pieni.net (Postfix) with ESMTPS id E1379417FB
	for <distix@pieni.net>; Mon, 24 Jul 2017 18:15:59 +0000 (UTC)
Received: from platypus.pepperfish.net (unknown [10.112.101.20])
	by yaffle.pepperfish.net (Postfix) with ESMTP id 83373418A5;
	Mon, 24 Jul 2017 19:15:59 +0100 (BST)
Received: from ip6-localhost.nat ([::1] helo=platypus.pepperfish.net)
	by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
	id 1dZhtf-00054g-FS; Mon, 24 Jul 2017 19:15:59 +0100
Received: from [10.112.101.21] (helo=mx3.pepperfish.net)
 by platypus.pepperfish.net with esmtps (Exim 4.80 #2 (Debian))
 id 1dZhtf-00054V-2J
 for <obnam-support@obnam.org>; Mon, 24 Jul 2017 19:15:59 +0100
Received: from barracuda.pco-inc.com ([71.4.36.131])
 by mx3.pepperfish.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.89) (envelope-from <lperkins@openeye.net>)
 id 1dZhtd-0005OM-4C
 for obnam-support@obnam.org; Mon, 24 Jul 2017 19:15:59 +0100
X-ASG-Debug-ID: 1500920148-0573a21092342010001-phrF5L
Received: from Loki.pcopen.net ([10.0.0.65]) by barracuda.pco-inc.com with
 ESMTP id bZPUn48p0LOZ2zPl (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384
 bits=256 verify=NO); Mon, 24 Jul 2017 11:15:48 -0700 (PDT)
X-Barracuda-Envelope-From: lperkins@openeye.net
Received: from LOKI.pcopen.net ([fe80::39f5:aaff:14af:6002]) by
 Loki.pcopen.net ([fe80::39f5:aaff:14af:6002%10]) with mapi id 14.03.0351.000; 
 Mon, 24 Jul 2017 11:15:49 -0700
From: "Laurence Perkins (OE)" <lperkins@openeye.net>
To: "liw@liw.fi" <liw@liw.fi>
Thread-Topic: Variable Chunksize
X-ASG-Orig-Subj: Re: Variable Chunksize
Thread-Index: AQHTALO1js8PRLWMaEGpkNs8UOGlYqJb6R8AgAGEm4CAAvDXAIADVVeAgAACZYCAAA9cAA==
Date: Mon, 24 Jul 2017 18:15:48 +0000
Message-ID: <1500920142.13826.15.camel@openeye.net>
References: <1500484994.13826.5.camel@openeye.net>
 <20170719181232.sdqihqdqldsgzmtd@liw.fi>
 <1500571405.13826.8.camel@openeye.net>
 <20170722141756.yzxatuvogrdsh4jv@liw.fi>
 <1500916329.13826.13.camel@openeye.net>
 <20170724172043.s2ykrfwcusyzdcgd@liw.fi>
In-Reply-To: <20170724172043.s2ykrfwcusyzdcgd@liw.fi>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: yes
X-MS-TNEF-Correlator: 
x-originating-ip: [10.0.50.60]
MIME-Version: 1.0
X-Barracuda-Connect: UNKNOWN[10.0.0.65]
X-Barracuda-Start-Time: 1500920148
X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA384
X-Barracuda-URL: https://10.0.0.6:443/cgi-mod/mark.cgi
X-Barracuda-Scan-Msg-Size: 2537
X-Virus-Scanned: by bsmtpd at pco-inc.com
X-Barracuda-BRTS-Status: 1
X-Barracuda-Spam-Score: 0.00
X-Barracuda-Spam-Status: No, SCORE=0.00 using global scores of TAG_LEVEL=1000.0
 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=5.0 tests=
X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.41254
 Rule breakdown below
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
X-Pepperfish-Transaction: 05d4-030d-7dc3-9bf6
X-Spam-Score: -1.9
X-Spam-Score-int: -18
X-Spam-Bar: -
X-Scanned-By: pepperfish.net, Mon, 24 Jul 2017 19:15:59 +0100
X-Spam-Report: Content analysis details: (-1.9 points)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
 [score: 0.0000]
X-ACL-Warn: message may be spam
X-Scan-Signature: 7caf9b1c1dd65fb728edbedec5ffc17f
Cc: "obnam-support@obnam.org" <obnam-support@obnam.org>
Subject: Re: Variable Chunksize
X-BeenThere: obnam-support@obnam.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Obnam backup software discussion <obnam-support-obnam.org>
List-Unsubscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
 <mailto:obnam-support-request@obnam.org?subject=unsubscribe>
List-Archive: <http://listmaster.pepperfish.net/pipermail/obnam-support-obnam.org>
List-Post: <mailto:obnam-support@obnam.org>
List-Help: <mailto:obnam-support-request@obnam.org?subject=help>
List-Subscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org>,
 <mailto:obnam-support-request@obnam.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============4633542379024154335=="
Mime-version: 1.0
Sender: obnam-support-bounces@obnam.org
Errors-To: obnam-support-bounces@obnam.org

--===============4633542379024154335==
Content-Language: en-US
Content-Type: multipart/signed; micalg=pgp-sha512;
 protocol="application/pgp-signature"; boundary="=-o1kvqKOW1q8lnQliKXtT"

--=-o1kvqKOW1q8lnQliKXtT
Content-Type: text/plain; charset="UTF-7"
Content-Transfer-Encoding: quoted-printable



On Mon, 2017-07-24 at 20:20 +0300, Lars Wirzenius wrote:
> On Mon, Jul 24, 2017 at 05:12:15PM +0000, Laurence Perkins (OE) wrote:
> > Smaller chunk size makes deduplication more precise regardless of the
> > type of splitting, but it should generate some pretty big savings on
> > similar data without reducing the chunk size because it will be better
> > at finding identical chunks of data since it's not relying on them
> > being at fixed offsets.
>
> If you have actual measurements of this, please report them. Some
> years ago when this idea first came up in the Obnam context, using
> the proposed type of chunking without reducing chunk size
> significantly didn't help de-duplication much. Only when the
> average chunk size became much smaller did de-duplication get a lot
> better, but then the number of chunks became a problem.
>
> Guessing isn't helpful here. Even if it were, now is not a good time
> for me to spend any time on this, and I don't want to even consider a
> patch for this until green albatross is in shape.

Mathematically it's going to depend a lot on your dataset. You will
never see any benefit on files smaller than two chunks, nor on files
where new data is simply appended to the end. This is probably the
kind of data most users have at this point (along with data that is
never modified), so it's definitely not worth diverting your attention
away from completing green albatross.
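
To make the append-only case concrete, here is a minimal Python sketch.
It is not Obnam's code: the 4096-byte chunk size and the helper names
fixed_chunks and chunk_id are made up for illustration. It shows that
fixed-offset chunking already reuses every old chunk when data is only
appended, so there is nothing left for smarter splitting to gain.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # illustrative value only, not Obnam's default


def fixed_chunks(data, size=CHUNK_SIZE):
    """Cut data at fixed offsets; only the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_id(chunk):
    """Identify a chunk by the SHA-256 of its content."""
    return hashlib.sha256(chunk).hexdigest()


rng = random.Random(0)  # fixed seed so runs are repeatable
old = bytes(rng.getrandbits(8) for _ in range(10 * CHUNK_SIZE))
# Same file on the next backup run, with half a chunk appended.
new = old + bytes(rng.getrandbits(8) for _ in range(CHUNK_SIZE // 2))

old_ids = {chunk_id(c) for c in fixed_chunks(old)}
new_chunks = fixed_chunks(new)
reused = sum(1 for c in new_chunks if chunk_id(c) in old_ids)

print(f"{reused} of {len(new_chunks)} chunks already in the repository")
```

Only the short trailing chunk is new here, because none of the earlier
chunk boundaries moved.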

However, when backing up sparse disk images, cloned VMs, chroot
tarballs, certain kinds of database files, or anything else that spans
multiple chunks and is routinely subjected to random insertions, I
commonly see tools that use fixed-size chunking (including Obnam) go
quickly until they hit the first inserted bit of data, and then
re-transfer the entire rest of the file because every later chunk
boundary has shifted. For one of my machines, this often results in
hundreds of gigabytes of unnecessary data transfer per backup run. This
is probably not representative of the typical Obnam user, but a few
people have asked on the mailing list for advice on optimizing such
cases, so I doubt I'm the only one.
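
The insertion problem can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not a proposed patch: the
polynomial rolling hash, the 48-byte window, the 0x3FF mask and the
names fixed_chunks, cdc_chunks and count_reused are all hypothetical
choices, and a real implementation would also enforce minimum and
maximum chunk sizes, which this omits. It inserts one byte near the
start of a 64 KiB buffer and counts how many chunks each splitting
method can reuse.

```python
import hashlib
import random


def fixed_chunks(data, size=4096):
    """Cut data at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data, window=48, mask=0x3FF):
    """Cut wherever a rolling hash of the last `window` bytes has its
    low bits all zero, so boundaries follow content, not offsets."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                 # take in the newest byte
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever is left is the final chunk
    return chunks


def count_reused(old_chunks, new_chunks):
    """Count new chunks whose content already exists among the old ones."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(1 for c in new_chunks if hashlib.sha256(c).digest() in known)


rng = random.Random(0)  # fixed seed so runs are repeatable
old_data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
new_data = old_data[:1000] + b"X" + old_data[1000:]  # one inserted byte

fixed_reused = count_reused(fixed_chunks(old_data), fixed_chunks(new_data))
cdc_old, cdc_new = cdc_chunks(old_data), cdc_chunks(new_data)
cdc_reused = count_reused(cdc_old, cdc_new)

print(f"fixed offsets:   {fixed_reused} of {len(fixed_chunks(new_data))} reused")
print(f"content-defined: {cdc_reused} of {len(cdc_new)} reused")
```

With fixed offsets every chunk after the insertion is shifted by one
byte, so none of them match. The content-defined boundaries depend only
on the preceding 48 bytes, so they should line up again shortly after
the inserted byte, at the price of one hash update per input byte.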

I strongly suspect that all the extra hashing is going to crater
performance regardless, but I'll give it a shot. Since the repo
formats are chunk-size-agnostic, the impact on other sections of the
program should be virtually nil.
--=-o1kvqKOW1q8lnQliKXtT
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEFbYe3ereZkZxAoz7C4CSuysVUSAFAll2OU4ACgkQC4CSuysV
USDS5Q/5AZrLSzkyGO2pgYsYul+uxC4GfAOWWOSQjBcMjyRPWG+NY8lytJtHPm+F
nTsKiCPVSh9NMhW3C/Mpho+LcYBLF4Vy8sMhnQ5MPI41sBSOXnKr4+b1Ed5hLLeD
LgfuMZECoDvy4AhESaafp4AC1YmfGJuCAnG0R65IpcjkncKo2u/wiG/4x3HWTFI9
6CpwK0Z2U8TzntMayLsWkZ5CQC6DnzScwCXAupqgZIoERJLMXiZGl4sjXNtGAva9
NtcDFiJXtOfIutyyAjmfX28e3LvN1EaRo706UBXSNQns0ZWkhqj2geRr/g+1JF7u
HQpstObdryM94Y0QpRf70dNEDh+nlRm/vm+iEZwP/zgTBSK489ieUFghn+UhLSuC
1Y2h69gmLUXq2p0wxrG/Sdo/6t1UbhOLq3PnfVbEdzfzZA4/00vKGkFi29mJZkGY
NOcIaozz41lkI9SHl/FrCmlS92muEYoyYLyvjzsWD0Ee/bFj5GXnQ0i6JTTDSHZ+
i5EiyvAHc5mjgucqqIBt6LfdGVj9/o9t59qvwUkhvVCN8PbInkYNRy1rmIYUJaEu
xBPL7/BVWl6ec056rHACUuCKwwI9GwrJZJwx/hjXtILLwjL+JB5bd+wzaBZ19Wyy
l9Q1worhQ/hd+aPuh2vVznyO7ZMDaEFp0YaVwOhLhkgVb/bZfaU=
=GD/e
-----END PGP SIGNATURE-----

--=-o1kvqKOW1q8lnQliKXtT--


--===============4633542379024154335==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
obnam-support mailing list
obnam-support@obnam.org
http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-support-obnam.org

--===============4633542379024154335==--