Return-Path: <obnam-dev-bounces@obnam.org>
X-Original-To: distix@pieni.net
Delivered-To: distix@pieni.net
Received: from bagpuss.pepperfish.net (bagpuss.pepperfish.net [148.251.8.16])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by pieni.net (Postfix) with ESMTPS id B3A5F2139A
	for <distix@pieni.net>; Sun, 13 Dec 2015 22:33:53 +0100 (CET)
Received: from platypus.pepperfish.net (unknown [10.112.100.20])
	by bagpuss.pepperfish.net (Postfix) with ESMTP id 55F552DB;
	Sun, 13 Dec 2015 21:33:53 +0000 (GMT)
Received: from ip6-localhost ([::1] helo=platypus.pepperfish.net)
	by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
	id 1a8EHB-0006Dz-7q; Sun, 13 Dec 2015 21:33:53 +0000
Received: from inmail0 ([10.112.100.10] helo=mx0.pepperfish.net)
 by platypus.pepperfish.net with esmtp (Exim 4.80 #2 (Debian))
 id 1a8EH9-0006Dt-OB
 for <obnam-dev@obnam.org>; Sun, 13 Dec 2015 21:33:51 +0000
Received: from pieni.net ([95.142.166.37] ident=postfix)
 by mx0.pepperfish.net with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
 (Exim 4.80) (envelope-from <liw@liw.fi>) id 1a8EH7-0004tw-DW
 for obnam-dev@obnam.org; Sun, 13 Dec 2015 21:33:51 +0000
Received: from exolobe1.liw.fi (unknown [82.129.76.156])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by pieni.net (Postfix) with ESMTPSA id 8C4352139A
 for <obnam-dev@obnam.org>; Sun, 13 Dec 2015 22:33:42 +0100 (CET)
Received: from exolobe1.liw.fi (localhost [127.0.0.1])
 by exolobe1.liw.fi (Postfix) with ESMTPS id 1584A4037D
 for <obnam-dev@obnam.org>; Sun, 13 Dec 2015 22:33:42 +0100 (CET)
Date: Sun, 13 Dec 2015 22:33:41 +0100
From: Lars Wirzenius <liw@liw.fi>
To: Obnam development <obnam-dev@obnam.org>
Message-ID: <20151213213341.GT2459@exolobe1.liw.fi>
MIME-Version: 1.0
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Score: -3.4
X-Spam-Score-int: -33
X-Spam-Bar: ---
X-Scanned-By: pepperfish.net, Sun, 13 Dec 2015 21:33:51 +0000
X-Spam-Report: Content analysis details: (-3.4 points)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 PPF_USER_AGENT_MUTT    User-Agent: contains Mutt (Mutt isn't a spam
                             tool)
 -0.5 PPF_USER_AGENT         User-Agent: exists
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
 [score: 0.0000]
X-ACL-Warn: message may be spam
X-Scan-Signature: 2ed25ed186b4a4f6da12919af8e199bc
Subject: Obnam development status update
X-BeenThere: obnam-dev@obnam.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Obnam development discussions <obnam-dev-obnam.org>
List-Unsubscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-dev-obnam.org>,
 <mailto:obnam-dev-request@obnam.org?subject=unsubscribe>
List-Archive: <http://listmaster.pepperfish.net/pipermail/obnam-dev-obnam.org>
List-Post: <mailto:obnam-dev@obnam.org>
List-Help: <mailto:obnam-dev-request@obnam.org?subject=help>
List-Subscribe: <http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-dev-obnam.org>,
 <mailto:obnam-dev-request@obnam.org?subject=subscribe>
Content-Type: multipart/mixed; boundary="===============7182395716470202758=="
Mime-version: 1.0
Sender: obnam-dev-bounces@obnam.org
Errors-To: obnam-dev-bounces@obnam.org


--===============7182395716470202758==
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="Nx8xdmI2KD3LNVVP"
Content-Disposition: inline


--Nx8xdmI2KD3LNVVP
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello, people interested in Obnam development.

A brief status update on Obnam development. There's a request for help
at the end.

For a long time now, I've mainly concentrated my Obnam development
time on FORMAT GREEN ALBATROSS[0], with some bug fixing for other
parts of the code base. The green-albatross code is the new repository
format, which is intended to make Obnam much faster than it currently
is with the "FORMAT 6" repository format. For precise details, see
the NEWS file and the git commit log.

[0] Yes, that's a Charles Stross Laundry Files reference.

Format 6 is based on copy-on-write B-trees using the Larch library.
Green-albatross is based on DIR objects that contain all the metadata
of a directory and its contents, but not its sub-directories.
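
As a rough illustration of the "one object per directory" idea, here
is a minimal Python sketch. It is not Obnam's actual code or on-disk
format: the function names and the dictionary layout are hypothetical,
and it assumes that a sub-directory appears as a named entry in its
parent while its own contents live in a separate object.

```python
import os
import stat


def build_dir_object(path):
    """Describe `path` and its immediate entries only (hypothetical layout)."""
    entries = {}
    for name in sorted(os.listdir(path)):
        st = os.lstat(os.path.join(path, name))
        entries[name] = {
            "mode": st.st_mode,
            "size": st.st_size,
            "mtime": int(st.st_mtime),
            "is_dir": stat.S_ISDIR(st.st_mode),
        }
    own = os.lstat(path)
    return {
        "dir": {"mode": own.st_mode, "mtime": int(own.st_mtime)},
        "entries": entries,
    }


def build_all(root):
    """Map every directory under `root` to its own DIR-like object."""
    return {
        dirpath: build_dir_object(dirpath)
        for dirpath, _dirnames, _filenames in os.walk(root)
    }
```

Calling build_all("/some/tree") returns one small object per
directory, so a change deep in the tree only affects the object for
the directory that changed.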

Would it be useful and welcome if I wrote about my Obnam development
thoughts more often on obnam-dev, or would that flood the list
with too much traffic? I guess I could think out loud in my blog
instead.

In September I got a grant from the FUUG Foundation, see [1], to buy a
computer for Obnam development. I bought a pile of parts, and
assembled it. It's the first computer I've assembled since 2009. I'll
be writing about the details of this in my blog, later on. Until this
computer, almost all Obnam development has been done on my personal
laptop, which has been somewhat limiting. For example, running
benchmarks meant that I couldn't usefully do much more at the same
time, so I tended to run benchmarks overnight, and that limited the
feedback, making progress slower. Also, I do not have terabytes of
free disk space on my laptop.

[1] http://blog.liw.fi/posts/fuug-grant/

Over the northern hemisphere summer this year, I used some virtual
machines in the Bytemark BigV cloud, sponsored by Bytemark, which also
worked fine for benchmarking, but a dedicated machine works better.

The new computer has sped up green albatross development quite nicely.
Results can be viewed at [2]. I've been running two simple benchmarks:
one with a million files, each 1 random byte, and one with a single
file of 10 GiB. The numbers have gone from about 2200 and 1400
seconds, respectively, to about 1500 and 400 seconds, respectively.
The corresponding numbers for format 6 are about 9300 and 3600
seconds.

[2] http://benchmark.obnam.org/e2obbench-v2/html/
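
To put the approximate timings above into ratios, a small Python
sketch. The dictionary keys and the speedup() helper are just names
chosen for this illustration, and the inputs are the rounded figures
quoted above, not fresh measurements.

```python
# Approximate wall-clock seconds quoted in the text above.
TIMINGS = {
    "1M files": {"before": 2200, "after": 1500, "format6": 9300},
    "10 GiB file": {"before": 1400, "after": 400, "format6": 3600},
}


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is compared with the old one."""
    return old_seconds / new_seconds


for name, t in TIMINGS.items():
    print(
        name,
        "vs earlier green-albatross: %.1fx" % speedup(t["before"], t["after"]),
        "vs format 6: %.1fx" % speedup(t["format6"], t["after"]),
    )
```

With these rounded inputs, green-albatross comes out at roughly 6
times faster than format 6 for the million-file case and 9 times
faster for the single 10 GiB file.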

I'd like green-albatross to get a lot faster still, but this is now
fast enough that I've started running a secondary set of backups using
the green albatross format.

There's still a lot to do, before the new repository format is ready
for production use. I know of the following:

* The chunk index data structures, necessary for de-duplication and
  for removing chunks that are no longer needed, are currently very
  simplistic. They are encoded as one big blob, and this needs to
  be split up into smaller pieces of data that can be re-used across
  updates, in a copy-on-write manner. Otherwise, each backup will
  upload tens or hundreds of megabytes of mostly duplicate data.

* Obnam currently doesn't do copy-on-write updates of the
  green-albatross DIR objects. This means that a new DIR object is
  created for each directory in live data, even if there have been no
  changes since the previous backup. This causes further unnecessary
  data to be uploaded for every backup.

* The "obnam forget" operation for green-albatross doesn't actually
  remove now-unnecessary chunks. Oops.

* The FUSE filesystem doesn't do seeks very efficiently, as it needs
  to read from the beginning of a file every time. This can be quite
  slow. The fix is fairly easy: store the size of a chunk with the
  reference to the chunk, and seeking can suddenly be done much
  faster.

* I'd like to add some more filesystem metadata about files, of kinds
  that format 6 doesn't support.

* It would be nice for people to be able to convert existing backups to
  a new repository format. I'm sure green-albatross is not the last
  repository format for Obnam, either. I'm thinking that something
  conceptually similar to git's "fast-export" format would be good. In
  other words, an interchange format to represent a backup repository
  in a way that's independent of the repository format, and possibly
  even independent of the backup program, in which case it could be
  used to convert from, say, Obnam to Attic, Attic to bup, or
  duplicity to rdiff-backup.

* There is no "obnam fsck" for green-albatross yet. Someone should
  write that.

* A bunch more benchmarking is needed: different scenarios, and more
  types of measurements. For example, I don't currently know how much
  memory Obnam is using with green-albatross. Hopefully there are no
  unpleasant surprises lurking there.
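
For the FUSE seek item in the list above, the idea of storing each
chunk's size next to its reference can be sketched as follows. This
is a sketch under assumptions, not Obnam's FUSE code: the ChunkedFile
class, the (chunk_id, size) pairs and the fetch_chunk callback are all
hypothetical names for this example. With the sizes known, the chunk
that holds a given offset is found by binary search over cumulative
end offsets, instead of reading every chunk from the start of the
file.

```python
import bisect
import itertools


class ChunkedFile:
    """A file stored as a list of (chunk_id, size) references."""

    def __init__(self, chunk_refs, fetch_chunk):
        self._ids = [chunk_id for chunk_id, _size in chunk_refs]
        # _ends[i] is the file offset just past the end of chunk i.
        self._ends = list(itertools.accumulate(size for _id, size in chunk_refs))
        self._fetch = fetch_chunk  # hypothetical: chunk_id -> bytes

    def read(self, offset, length):
        """Read up to `length` bytes at `offset`, fetching only needed chunks."""
        total = self._ends[-1] if self._ends else 0
        end = min(offset + length, total)
        pieces = []
        # Index of the first chunk whose end lies beyond `offset`.
        i = bisect.bisect_right(self._ends, offset)
        while offset < end:
            chunk_start = self._ends[i - 1] if i > 0 else 0
            data = self._fetch(self._ids[i])
            piece = data[offset - chunk_start:end - chunk_start]
            pieces.append(piece)
            offset += len(piece)
            i += 1
        return b"".join(pieces)
```

A read near the end of a large file then fetches only the last chunk
or two; the cost of locating the chunk grows with the logarithm of
the number of chunks rather than with the file size.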

So that's what's missing, as far as I currently know. There are a lot
of details and I'm probably not aware of everything that's still needed.

Which brings me to a request for help: If you have the spare capacity
(CPU, memory, disk space, bandwidth, time), I'd really quite
appreciate it if you could try the green-albatross repository format.
However, and I can't stress this enough, it is not stable, it is NOT
READY for production use, and the on-disk repository format WILL
CHANGE in ways that will force you to wipe a repository and back up
everything all over again. Also, see the bulleted list above for
things that aren't done yet. If you don't mind that, then create a new
backup using "--repository-format=green-albatross". Everything else
should work as before, except the things that don't work at all yet.
Or the things that are buggy.

If you do try, please report to obnam-dev, especially if you see any
problems, but also if you don't.

If you want to do some Obnam hacking, the repository format conversion
and the fsck are the areas where I'd most urgently appreciate help.

If you are not a developer, or can't take on such large projects,
another way to help is to be on the obnam-support mailing list and
answer questions from people. There aren't that many, but even so they
take time from me that I might otherwise use for writing code for
Obnam.

Thank you, and happy backups for everyone.

--=20
Schrödinger's backup hypothesis: the condition of any backup is
undefined until a restore is attempted. -- andrewsh

--Nx8xdmI2KD3LNVVP
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJWbeQ0AAoJEGwvphbseiAxcj8QAIv4jxqlfVLKVpT80Cc49t/g
5ot4VTj0MBzxwy8D7efApvciEu0U7W5JKTxo4tYuSIjqhjQR4ShvVg2OYqOlaeck
6W8w2vrXCdYEDySCSIA64RC7g9kjXewjWq/2+EUWzkMmB72mg7p2gNBvQBWem0Ea
6v+M1W7R28hUV2oCFgEKgR5u/BQ24aWWvtMSarFK9qsxfA5HAl/Cj2W3lUNCH8J9
yIrHv9sVzHAI2X/mpIaFjH+4QK5lkCXrBCflRkX1tG1GRpfHShNrfNEb9ASqTNMJ
LMtE1A8/LJnG5ZY6NfN1dT2u97KptCcRTnQJj+m/D9AAQPFy1cgzJE1kG0n2mIdF
Ko8vLlOiC1fEt37KcBRjPUmkEysrDtsshpAN+EfOVJH/8jV8UX3SPpBEgWA6aodk
07ZHsnOjEdZVDCt0CDFbTNbDvoXn3b77pYkDrRBz8QzZYJudcA7ainBIRV94tXfg
FfTnna6A1ieoB/F8re2asXwpiyzUxj4bB170ZHhDWr8U1LXRDKjygv0NCVmLStBI
EH/Iqd3SdrQqCO+QdLPPxNni5vlqBvwAp8QJG0Aql0k5td07XrGJBFH71fOGCvVP
/EgM9EYL1BIj/8YQ3Bh6KIh3g+qZZxopI3g+D3wj0TcE19C+Jxz2Jpi6aT2Qxhkn
jXdngj+wVXCSDLYiUwi5
=JgOm
-----END PGP SIGNATURE-----

--Nx8xdmI2KD3LNVVP--


--===============7182395716470202758==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
obnam-dev mailing list
obnam-dev@obnam.org
http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-dev-obnam.org

--===============7182395716470202758==--