Larch, a Python B-tree library ============================== This is an implementation of particular kind of B-tree, based on research by Ohad Rodeh. See "B-trees, Shadowing, and Clones" (link below) for details on the data structure. This is the same data structure that btrfs uses. Note that my implementation probably differs from what the paper describes or what btrfs implements. The distinctive feature of this B-tree implementation is that a node is never modified. Instead, all updates are done by copy-on-write. Among other things, this makes it easy to clone a tree, and modify only the clone, while other processes access the original tree. This is utterly wonderful for my backup application, and that's the reason I wrote larch in the first place. (The previous paragraph contains a small lie: larch does modify nodes in place, if they are not shared between trees. This is necessary for performance.) I have tried to keep the implementation generic and flexibile, so that you may use it in a variety of situations. For example, the tree itself does not decide where its nodes are stored: you provide a class that does that for it. I have two implementations of the NodeStore class, one for in-memory and one for on-disk storage. Documentation is sparse. Docstrings and reading the code are your best hope. There is also an example program. The speed-test benchmark program may also be useful. (Help and feedback and prodding welcome!) See the file example.py for an example. * Homepage: * Version control: `git clone git://git.liw.fi/larch` * Rodeh paper: Stability --------- The larch on-disk file format and data structures are frozen as of version 0.31. That means any B-trees stored on disk with that version, or any later version, will be readable by future versions of larch. This prompts a new version numbering scheme for Larch. In the future, the version number is of the form `FORMAT.DATE`, where `FORMAT` is the on-disk format version for Larch (currently 1), and the `DATE` is the date of the release. Build and install ----------------- setup.py should work as usual for distutils. But it might not: I have only used enough to make the Debian package build. You can also use the Debian packaging, if on Debian or a derivative. Just run "debuild -us -uc" to build the package. Or you can get packages from my site (see for details). Hacking ------- The actual tree code is in the larch directory, laid out as a normal Python package. * `tree.py` is the actual tree implementation * `nodes.py` has tree nodes * `nodestore.py` defines the interface for storing nodes on disk or wherever; `nodestore_disk.py` and `nodestore_memory.py` are two implementations of the interface * `codec.py` handles encoding of nodes for on-disk storage, and decoding too * `forest.py` handles creation of new trees and committing things to a node store Run `make check` to run the test suite. You will need my CoverageTestRunner and LRUCache, and extrautils packages: * * * * The unit test modules are paired with their corresponding code modules: for code module `foo.py` there exists a unit test module `foo_tests.py`. CoverageTestRunner makes sure each code module's unit test achieve 100% test coverage for that module, not counting explicitly excluded parts of the code. You can also use `nosetests` to run the tests. I have two scripts to run simplistic benchmarks: * `codec-speed 1000 no` tests speed of the NodeCodec class, for encoding tree nodes for on-disk storage. * `speed-test` tests insert and lookup speeds in trees. Use a `--keys` settings that is reasonably large, e.g., a million or ten. You may want to run either or both before and after making any changes, to see if you're making things better or worse. If you have any patches, please send them to me (). I prefer to merge from git repositories, but `git format-patch` or just plain diffs are always OK. Bugs and other things to hack ----------------------------- See for the bug tracker. If you're interested in hacking the larch code, speed improvements are always interesting. See the bug tracker for things known to be slow, or run `speed-test` on large numbers of keys, and use profiling to see where time is wasted. Legalese -------- Copyright 2010, 2011, 2012, 2013 Lars Wirzenius This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see .