summaryrefslogtreecommitdiff
path: root/README
diff options
context:
space:
mode:
authorLars Wirzenius <liw@liw.fi>2010-06-05 12:08:18 +1200
committerLars Wirzenius <liw@liw.fi>2010-06-05 12:08:18 +1200
commit8bbc79cb91a85ecae0d50a21168ba739fba27a34 (patch)
tree2b62a82bed101b7e720f48fae18c0ea514377df1 /README
parent1779c5d96d9143b29233ca8aed891d69a3aded09 (diff)
downloadlarch-8bbc79cb91a85ecae0d50a21168ba739fba27a34.tar.gz
Add a README.
Diffstat (limited to 'README')
-rw-r--r--README80
1 files changed, 80 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..661f3ba
--- /dev/null
+++ b/README
@@ -0,0 +1,80 @@
+A Python B-tree library
+=======================
+
+This is an implementation of particular kind of B-tree, based on research by
+Ohad Rodeh. See "B-trees, Shadowing, and Clones" (link below) for details on
+the data structure. This is the same data structure that btrfs uses. Note
+that my implementation probably differs from what the paper describes or
+what btrfs implements.
+
+The distinctive feature of this B-tree implementation is that a node is
+never modified. Instead, all updates are done by copy-on-write. Among other
+things, this makes it easy to clone a tree, and modify only the clone, while
+other processes access the original tree. This is utterly wonderful for my
+backup application, and that's the reason I wrote btree in the first place.
+
+I have tried to keep the implementation generic and flexibile, so that you
+may use it in a variety of situations. For example, the tree itself does not
+decide where its nodes are stored: you provide a class that does that for it.
+I have two implementations of the NodeStore class, one for in-memory and one
+for on-disk storage.
+
+Documentation is sparse. Docstrings and reading the code are your best hope.
+(Help and feedback welcome!)
+
+See the file speed-test for an example.
+
+* Homepage: http://liw.fi/btree/
+* Version control: bzr get http://code.liw.fi/btree/bzr/trunk/
+* Rodeh paper: http://www.cs.tau.ac.il/~ohadrode/papers/btree_TOS.pdf
+
+
+Build and install
+-----------------
+
+setup.py should work as usual for distutils. But it might not: I have
+only used enough to make the Debian package build.
+
+You can also use the Debian packaging, if on Debian or a derivative.
+(Just run "debuild -us -uc" to build the package.)
+
+
+Hacking
+-------
+
+The actual tree code is in the btree directory, laid out as a normal
+Python package.
+
+* tree.py is the actual tree implementation
+* nodes.py has tree nodes
+* nodestore.py defines the interface for storing nodes on disk or
+ wherever; nodestore_disk.py and nodestore_memory.py are two implementations
+ of the interface
+* codec.py handles encoding of nodes for on-disk storage, and decoding too
+* forest.py handles creation of new trees and committing things to a node
+ store
+
+Run "make check" to run the test suite. You will need my CoverageTestRunner
+and LRUCache:
+
+* http://liw.fi/lru/
+* http://liw.fi/coverage-test-runner/
+
+The unit test modules are paired with their corresponding code modules:
+for code module foo.py there exists a unit test module foo_tests.py.
+CoverageTestRunner makes sure each code module's unit test achieve
+100% test coverage for that module, not counting explicitly excluded
+parts of the code.
+
+I have two scripts to run simplistic benchmarks:
+
+* codec-speed tests speed of the NodeCodec class, for encoding tree nodes
+ for on-disk storage.
+* speed-test tests insert and lookup speeds in trees.
+
+You may want to run either or both before and after making any changes,
+to see if you're making things better or worse.
+
+If you have any patches, please send them to me (mailto:liw@liw.fi).
+Bzr bundles preferred, but plain diffs are always OK. Or set up a bzr
+branch of your own and send me a URL, and I'll merge from that.