summaryrefslogtreecommitdiff
path: root/NEWS
blob: 055fa1ae32f021174bccd573c7db19f28966972b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
NEWS for larch
==============

These are the release notes for larch, a Python implementation of a
copy-on-write B-tree, designed by Ohad Rodeh.

Version 1.20151025
------------------

* Test suite improvement: fsck tests by Antoine Brenner.
* Debian packaging improvements.

Version 1.20131130
------------------

* Serious bug fixed: the "KeyError" crash for reference counts. This
  was false memory use optimisation, which triggered a rare bug in
  related code. Repeatable test case by Rob Kendrick, and helpful
  analysis by Itamar Turing-Trauring.

* Serious bug fixed: another "node missing" bug. This crash was
  caused by a bug that overwrote on-disk reference count groups
  with zeroes. Repeatable test case by Rob Kendrick.

* Fixes to fsck from Antoine Brenner.

Version 1.20130808
------------------

* Bug fix in how Larch handles partly-comitted B-tree journals
  in read-only mode. Previously, this would result in a crash
  if, say, a node had been removed, but the B-tree metadata
  hadn't been committed. Recipe for reproducing bug found by
  Damien Couroussé.

Version 1.20130316
------------------

* Fsck now check for missing nodes, and optionally fixes them by
  deleting references to them.
* Node numbers are now reported in hexadecimal, to make it easier to find
  on disk. Patch by Damien Couroussé.
* The `speed-test` script now works again. The initialiser API for
  `NodeStore` and `NodeStoreMemory` now have a new first argument,
  `allow_writes`, to make them compatible with the `NodeStoreDisk`
  initialiser. Patch by Antoine Brenner.
* Improved error messages for nodes that seem to be missing.

Version 1.20121216
------------------

* Make fsck progress reporting be a bit more fine grained.

Version 1.20121006
------------------

* Critical bug fix: an indentation problem in the Python code was fixed.
  A line was intended wrong, resulting it to not be included in the right
  block, and therefore not having access to the variable created in that
  block.
* Bug fix: The Debian packaging was missing a build dependency on cmdtest.

Version 1.20120527, released 2012-05-27
---------------------------------------

* New version scheme. Thank you, Joey Hess.
* The on-disk data structures and file formats are now declared frozen.
  An automatic test has been added to verify that things do not break.

Version 0.31, released 2012-05-08
---------------------------------

* Optimize journal use to have fewer round-trips. This matters when the
  journal is stored on a high-latency storage, such as an SFTP server.
* Now handles better missing key or node sizes in the metadata.
* All exceptions thrown by larch are now subclasses of `larch.Error`.
* The on-disk format is now frozen.

Version 0.30, released 2012-04-29
---------------------------------

* `NodeStoreDisk` is now explicitly in read-only or read-write mode.
  In read-only mode it does not replay or rollback to the journal, or
  care about any changes made there.

Version 0.29, released 2012-04-15
---------------------------------

* Larch now uses the `larch.FormatProblem´ exception when an on-disk 
  node store is missing a format version, or has the wrong version.
  This helps Obnam print a better error message.

Version 0.28, released 2012-03-25
---------------------------------

* Changes to on-disk storage are now done via a journal data structure
  to protect against corruption from crashing. Previously, if a program
  using Larch crashed during an unfortunate moment, the on-disk state
  would not be consistent: some nodes might have been removed, for
  example, but other nodes or the list of root nodes still referred
  to them. This is now fixed.
  
  The on-disk data format version has not changed: the on-disk format
  is compatible with old versions of larch, as long as there are no
  uncommitted changes from a new version.
  
  The API has changed, however, and it is now necessary for Larch
  users to call the `Forest.commit` method to force changes to be
  stored on disk. Failure to do so will cause changes to be lost,
  and many annoyed bug reports.

Version 0.27, released 2012-02-18
---------------------------------

* Merged in some fsck `WorkItem` changes from Obnam, so that Obnam can
  share the code.

Version 0.26, released 2011-12-18
---------------------------------

* `open_forest` now works even if the requested node size is different
  from the node size used with an existing tree. The requested size is
  just ignored, rather than causing an error. This is useful for, say,
  Obnam, when the user decides to change the node size setting, or
  when Obnam's default node size for a tree grows. This way, things
  work.

Version 0.25, released 2011-10-02
---------------------------------

* `fsck` can now optionally attempt to fix B-trees with missing nodes.
  The index nodes referring to the missing nodes get adjusted to drop
  the reference.

Version 0.24, released 2011-09-17
---------------------------------

* Debian packaging install the fsck-larch manual page.

Version 0.23, released 2011-08-19
---------------------------------

* The default size of the LRU cache used by NodeStoreDisk is not 500
  instead of 100. This provides much better performance with large
  trees: 37% vs 99% cache hit rates with speed-test for 100k keys.
* The `BTree.lookup_range` method now returns a list, not a generator.
  It turned out to be very surprising to return a generator, especially
  with the documentation saying a list was returned. (Thanks, Jo Shields,
  for the bug report.)

Version 0.22, released 2011-08-03
---------------------------------

* The library now declares which on-disk format it supports, so that B-trees
  stored with an incompatible format can be detected.
* `fsck-larch` now has a `--trace` option, and the library has a bit more 
  tracing statements.

Version 0.21, released 2011-08-02
---------------------------------

* Better error messages from `fsck-larch`.
* Bug fix: `fsck-larch` no longer reports as missing nodes that aren't
  in use by all B-trees in a forest.
* `fsck-larch` now has a manual page.
* More `tracing.trace` statements to make it easier to track when nodes
  are created and destroyed.
* B-tree nodes are now stored in a more efficient way on disk (several
  levels of directories, not just one). This is a compatibility breaker:
  old B-trees aren't readable with new larch, and B-trees created with
  the new larch aren't readable by old larch.
* A couple of memory leaks fixed: both were caches that grew effectively
  without bounds.
* Least-Recently-Used caches now log their hit/miss statistics
  explicitly. Previous this was done in the `__del__` method, but those
  did not get called deterministically, so the logging did not always
  happen.

Version 0.20, released 2011-07-20
---------------------------------

* `pydoc larch` should now work better.
* Changes to larch benchmarking scripts (to make them use cliapp).
* `fsck-larch` improvements:
  - now uses cliapp, for better usability
  - now automatically detects a forest's key and node sizes,
    so the user no longer needs to give them manually.
  - some more checks
  - installed as part of the Debian package
* API documentation with Sphinx. As part of that, the API was cleaned up
  a bit with regards to public and private methods (the latter being
  prefixed by underscores now).
* The separate LRU cache implementation is now included in larch, to
  avoid yet another dependency, and to avoid polluting PyPI.
* Various speedups.
* `BTree.count_range` method added, for speed.
* Library version number is now `larch.__version__` instead of
  `larch.version`, to follow Python conventions.

Version 0.19, released 2011-03-21
---------------------------------

* The `NodeStoreDisk` class now uses a separate VFS class for I/O, rather
  than methods in `NodeStoreDisk` itself, and requiring subclassing with
  method overrides. A separate VFS class is clearer and simpler to use.
  As a bonus, the API is now compatible with the Obnam VFS API.
* Forest metadata now includes key and node sizes, and there's a factory
  function that checks that the sizes given to it match the existing ones.
  This should reduce potential errors.
* Renamed from btree to larch, to avoid name clashes with other projects.

Version 0.18, released 2011-02-18
---------------------------------

* Fix memory leak.

Version 0.17, released 2011-02-13
---------------------------------

* Use the python-tracing library to add logging of node creation and
  removal and other operations. The library makes it possible for the
  user to enable and disable logging for specific code modules, and
  defaults to being disabled, so that the logging will not affect
  normal execution speed.
* `codec-speed` now reports MiB/s, instead of seconds, giving an easy
  way to compare encoding and decoding speeds with, say, hard disk
  or network speeds.
* B-tree now again modifies nodes, and does so by first removing
  them from the node store's upload queue. This returns the library
  back to useful speeds.

Version 0.16.2, released 2011-01-07
---------------------------------

* Really fix problems with nodes growing too big while in the upload
  queue. The previous fixes meant well, but didn't really do the trick.
  Now we stop modifying nodes at all while in the upload queue, which
  is good for several reasons. Performance in this release degrades
  a lot, until there is time to optimize the code. However, correctness
  is more important than performance.

Version 0.16.1, released 2011-01-07
---------------------------------

* Bug fix: Remove temporary node used while splitting a leaf node that has
  grown too large.
* Bug fix: Since we do, in fact, modify nodes in-place when they are not
  shared between trees, it was possible for a node to be put into the
  node store upload queue, and later modified. This is not a problem as
  such, but when inserting a value into a leaf node, we modify it in place
  making it too large, and then create one or two new nodes. If the 
  too-large node was at the beginning of the upload queue, creating the
  new node might push it out, resulting in a bug. We fix this by moving
  the too-large node to the end of the queue, ensuring it will not be
  pushed out. A cleaner solution would be to not modify the node if it
  will grow too big, but that change will need to wait for a later date.
* BTree now checks that nodes are not too big when they are put into the
  node store. The node store already did the checking, but only at the
  point where it was ready to write the node to disk, and that meant the
  problem was caught at a time that was hard to debug.
* BTree now sets an explicit maximum size of the values inserted into the
  tree: slightly less than half the size of a node. This is necessary for
  leaf node splitting to work correctly without requiring more than more
  than two nodes to hold everything.

Version 0.16, released 2011-01-07
---------------------------------

* The code to split over-full leaf nodes into two is fixed. Before version 
  0.14 we had a problem where one of the two halves might still be too big. 
  The code in 0.14 fixed that, but introduced a new problem where one of
  the new nodes would be very small. Now they should be just right.
  No, I have not read Goldilocks recently, why do you ask?

Version 0.15, released 2011-01-02
---------------------------------

* This version replaces all use of my custom `bsearch` function with the
  `bisect` module in the Python standard library. This speeds up all
  operations, some more than others. In-memory benchmarks with ´speed-test`
  sped up from about 20% for `remove_range` up to 240% for `lookup`.
  No other changes, but I felt this warranted a release on its own.


Version 0.14, released 2010-12-29
---------------------------------

This version seems to work well enough for Obnam to do backups of real 
systems. It is, however, not as fast as one would wish.

Bug fixes:

* When a tree was made shallower after a remove operation, the code used
  to assume the direct children of the root node would not be shared
  with other trees. This is obviously a wrong assumptions. I must have
  been distracted by XKCD when writing the code.
* A bug in cloning (shadowing) nodes when doing copy-on-write was fixed.
  The code now increment the reference counts of an index node's children
  correctly.
* The cached encoded size of nodes now gets cleared by `remove_index_range`.
* Leaf nodes are now split based on size, not key count. Key counts are OK
  for index nodes, whose values are all of the same size. However, leaf
  node values may vary wildly. Sometimes it happens that after splitting,
  one of the halves is still too large.
* `Forest.remove_tree` now actually removes the unshared nodes of the 
  tree that is removed.
* `BTree.remove_range` was quite fast, but not always correct. The code
  was just tricky enough that I was unable to find the actual fault, so
  I rewrote the method in a simplistic, but slow way. A speed improvement
  for this needs to happen in a future version.
  
Speed improvements:

* When a node is cloned, its previously computed size is now remembered.
  Since the clone is identical to the original node, except for the id,
  the size will be the same.
  
New features and stuff:

* fsck-btree: a rudimentary checker of the B-tree data structures.
  This will undoubtedly be improved in the future, but even the simple
  checking it does now has already helped when debugging things.
* Some parts of the code that used to be excluded from test coverage
  now has tests. Now 19 statements remain that are excluded from coverage.
* Some other code prettification has happened, including some docstring
  improvements.
* `BTRee.remove_range` and `lookup_range` now check that their arguments 
  are of the correct size of keys for that tree.
* `Node` got a new method, `find_pairs`.
* `BTree.dump`, which is useful for debugging, is now nicer to use.
* `NodeStoreDisk` no longer forces the use of `fsync` when it writes
  files. It is not btree's responsibility to decide that on behalf of
  all users. Those who want it can subclass and override the method.
* `RefcountStore` and `UploadQueue` are their own modules, and have
  much better test coverage now. `UploadQueue` got rewritten in terms
  of `LRUCache`.
* New `BTree.range_is_empty` method, for those (few) cases where one just
  needs to know if there are any keys, and where getting all keys with
  `lookup_range` would be slow.
* `BTree.lookup_range` is now a generator, which should reduce memory
  consumption and thus speed things up in cases where a very large number
  of keys are about to be returned.
  
Removed stuff:

* The NodeStore API no longer wants a `listdir` method. It has been
  removed from NodeStoreDisk.
* `RefcountStore` no longer logs statistics. They did not seem to be
  useful.
* `IndexNode` no longer explicitly checks the types of its arguments.
  This was wasting CPU cycles, and it did not once find an actual bug.


Version 0.13, released 2010-07-13
---------------------------------

* Speed-related improvement: The size of NodeStoreDisk's LRU cache for
  nodes is now user-settable.


Version 0.12, released 2010-07-11
---------------------------------

* Some speed optimizations.
* The BTree objects no longer have a root_id attribute. Instead, use
  BTree.root.id (if BTree.root is not None).


Version 0.11, released 2010-07-05
---------------------------------

* Now includes a small example program, written for the Wellington Python
  User Group meeting where the first talk about this library was given.
* `NodeStoreDisk` stores nodes in a simple directory hierarchy, instead of
  a flat one, to better handle very large trees.
* `NodeStoreDisk` now has a much larger default upload queue size (now 1024,
  was 64), to better handle large trees.
* `speed-test` now also tests `remove` and `remove_range` methods. Further,
  it reports both CPU and wall clock timings, and has been refactored a bit.
  Results for `lookup_range` are no longer comparable with old versions,
  since the measured scenario has changed.
* `remove_range` has been rewritten and is now much faster.


Version 0.10, released 2010-06-29
---------------------------------

* Storage of node reference counts is now more efficient. 
* NodeStoreDisk stores reference counts and nodes in files in a subdirectory 
  of the directory root, which is an incompatible change, so trees made by 
  earlier versions can't be used anymore. (That's what you get for using 
  alpha level code. Obviously, once btree is declared to be ready for
  production, the on-disk data structures will be supported in future
  versions.)
* Nodes that exist in only one tree are modified in place, rather than
  via copy-on-write. This is impure, but brings a big speed boost.
* New test script: insert-remove-test. "make check" automatically runs it.
* Many optimizations to NodeCodec and elsewhere, by Richard Braakman.


Version 0.9, released 2010-05-26
--------------------------------

* Several speed optimizations.
* One of this requires a Least-Recently-Used cache, which is provided by
  my python-lru package. That is now a dependency.
* NodeStoreDisk now has an upload queue. This is so that when a node gets
  immediately overwritten, it can be removed from the queue, i.e.,
  removed before it gets encoded and written at all. This provides quite
  significant speedups.
* A bug fix for reference counting of index node is fixed. This allows
  the speedups from the upload queue to actually occur.
* On-disk node encodings have changed. Anything written by previous
  versions is now unusable.