summaryrefslogtreecommitdiff
path: root/README
blob: 1a0e58173764aae2f7342d5d26ca05d64435424f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
Larch, a Python B-tree library
==============================

This is an implementation of particular kind of B-tree, based on research by 
Ohad Rodeh. See "B-trees, Shadowing, and Clones" (link below) for details on 
the data structure. This is the same data structure that btrfs uses. Note 
that my implementation probably differs from what the paper describes or 
what btrfs implements.

The distinctive feature of this B-tree implementation is that a node is 
never modified. Instead, all updates are done by copy-on-write. Among other 
things, this makes it easy to clone a tree, and modify only the clone, while 
other processes access the original tree. This is utterly wonderful for my 
backup application, and that's the reason I wrote larch in the first place.

(The previous paragraph contains a small lie: larch does modify nodes in
place, if they are not shared between trees. This is necessary for
performance.)

I have tried to keep the implementation generic and flexibile, so that you 
may use it in a variety of situations. For example, the tree itself does not 
decide where its nodes are stored: you provide a class that does that for it. 
I have two implementations of the NodeStore class, one for in-memory and one 
for on-disk storage.

Documentation is sparse. Docstrings and reading the code are your best hope.
There is also an example program. The speed-test benchmark program may
also be useful. (Help and feedback and prodding welcome!)

See the file example.py for an example.

* Homepage: <http://liw.fi/larch/>
* Version control: `bzr get http://code.liw.fi/larch/bzr/trunk/`
* Rodeh paper: <http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf>


Build and install
-----------------

setup.py should work as usual for distutils. But it might not: I have
only used enough to make the Debian package build.

You can also use the Debian packaging, if on Debian or a derivative.
Just run "debuild -us -uc" to build the package. Or you can get packages
from my site (see <http://liw.fi/code/> for details).


Hacking
-------

The actual tree code is in the larch directory, laid out as a normal
Python package.

* `tree.py` is the actual tree implementation
* `nodes.py` has tree nodes
* `nodestore.py` defines the interface for storing nodes on disk or
  wherever; `nodestore_disk.py` and `nodestore_memory.py` are two 
  implementations of the interface
* `codec.py` handles encoding of nodes for on-disk storage, and decoding too
* `forest.py` handles creation of new trees and committing things to a node
  store

Run `make check` to run the test suite. You will need my CoverageTestRunner
and LRUCache, and extrautils packages:

* <http://liw.fi/lru/>
* <http://liw.fi/coverage-test-runner/>
* <http://liw.fi/extrautils/>
* <http://liw.fi/tracing/>

The unit test modules are paired with their corresponding code modules:
for code module `foo.py` there exists a unit test module `foo_tests.py`.
CoverageTestRunner makes sure each code module's unit test achieve
100% test coverage for that module, not counting explicitly excluded
parts of the code. You can also use `nosetests` to run the tests.

I have two scripts to run simplistic benchmarks:

* `codec-speed 1000 no` tests speed of the NodeCodec class, 
  for encoding tree nodes for on-disk storage.
* `speed-test` tests insert and lookup speeds in trees. Use a `--keys`
  settings that is reasonably large, e.g., a million or ten.

You may want to run either or both before and after making any changes,
to see if you're making things better or worse.

If you have any patches, please send them to me (<mailto:liw@liw.fi>).
Bzr bundles preferred, but plain diffs are always OK. Or set up a bzr
branch of your own and send me a URL, and I'll merge from that.


Bugs and other things to hack
-----------------------------

There is an [SD](http://syncwith.us/sd/) bug repository for larch.
I try to keep it up to date with regards to bugs and wishlist items,
etc.

* Bugs in SD: <http://code.liw.fi/larch/bugs/>

If you're interested in hacking the larch code, speed improvements
are always interesting. See the bug tracker for things known to be
slow, or run `speed-test` on large numbers of keys, and use profiling 
to see where time is wasted.