1 files changed, 0 insertions, 257 deletions
diff --git a/format-green-albatross.mdwn b/format-green-albatross.mdwn
deleted file mode 100644
index b2ef215..0000000
--- a/format-green-albatross.mdwn
+++ /dev/null
@@ -1,257 +0,0 @@
-[[!meta title="Repository format 'Green Albatross'"]]
-
-This page describes the repository format called 'Green Albatross'. It
-is a preliminary format, intended to improve Obnam performance over
-[[format-6]]. Current development status is PONDERING;
-implementatation has started, but everything on this
-page may and probably will change. Until declared stable, the on-disk
-data structured WILL change without warning. Only use this format to
-help with its development.
-
-
-Introduction
-============
-
-For background and the big picture, please read the [[ondisk]] page.
-This page only discusses details specific to this format.
-
-This repository format takes the following approach:
-
-* Don't use the [Larch](http://liw.fi/larch/) B-tree implementation.
-  It's intricate code and insufficiently fast. Implement any data
-  structures directly.
-* With few, specific exceptions, repository files are never
-  updated. Everything is done copy-on-write. This enables caching. The
-  exceptions are root nodes of DAGs, so that it's easy to know where
-  the DAG starts.
-* Data is stored in objects of various types. Objects may be small,
-  and to avoid having a file per object, objects are collected into
-  bags. This includes chunks. Each bag is stored in its own file.
-  Objects are identified by the bag identifier, plus an index into the
-  bag.
-* Objects are stored in a suitable serialisation encoding.
-    - This is not Python pickles, since Obnam can't assume those are
-      stable over the lifetime of a backup repository.
-    - JSON is not used, since JSON is not suitable for storing binary
-      data, such as filenames, without adding an encoding layer on top
-      of JSON.
-    - Previously this was specified as YAML, but YAML is not fast to
-      parse. So it's not YAML.
-    - Instead, a simple, custom binary encoding will be used. This
-      will encode ints, booleans, binary strings, or lists or dicts
-      (dict keys being lists). A quick prototype shows this to be easy
-      (worked the first time), and fairly fast even without
-      optimisation.
-* Encryption is done exactly like in [[format-6]].
-
-
-Bags
-====
-
-A bag is used to store a number of blobs. Bags are identified by a
-random 64-bit integer. This is used as the filename of the bag. The
-bags are stored in a 3-level directory structure, using the top three
-octets of the bag id as the directory names. Thus, a bag whose id in
-hexadecimal is 0x1234567890abcdef would be stored as
-`12/34/56/1234567890abcdef.bag`.
-
-A bag is implemented as a Python `dict` object:
-
-    {
-        'bag-id': 'cafef00d',
-        'blobs': [...],
-    }
-
-The `items` field contains the blobs. Each blob may be an arbitrary
-byte string (for chunks), or an encoded Python object.
-
-
-Object identifiers
-==================
-
-Object identifiers are a pair consisting of the bag id and an index
-into the bag. Since the identifiers are used frequently, it is
-practical to store them as a unit rather than as a pair. Further, they
-will be visible to the user (and, especially, the developer), so the
-following syntax is used:
-
-    BAGID.INDEX
-
-For example, the first and third objects stored in the bag with id
-0x1234567890abcdef would be:
-
-    1234567890abcdef.0
-    1234567890abcdef.2
-
-Note the use of hexadecimal for the bag id (so all bag identifiers are
-of the same length), and indexing in decimal, starting from zero.
-
-We will keep bags effectively immutable so that an object id does not
-need to change. This means that a bag may contain unused objects. If
-it turns out that that's wasting too much data, we can "pack" bags by
-replacing the unused blobs with empty values (Python's None) to save
-space. This mutates a bag, but only in ways that (correct) users won't
-notice.
-
-
-Client list
-===========
-
-The client list is stored as `client-list/data.bag` in the repository,
-and each item in the bag has the following structure:
-
-    {
-        'client-name': 'foo',
-        'client-id': 123,
-        'encryption-key': None,
-    }
-
-
-Chunks
-======
-
-Chunks are stored in bags. The chunk data is just a binary blob.
-
-
-Chunk indexes
-=============
-
-Chunk indexes map a checksum (using the user's chosen algorithm) to a
-list of chunk ids, and a chunk id to a list of client ids. The root
-object of the indexes is:
-
-    {
-        'checksums': {
-            'algorithm': 'SHA-1',
-            'root-id': '1234567890abcdef.1',
-        },
-        'used-by-tree': '1234567890abcdef.0,,
-    }
-
-
-Checksum to chunk ids
----------------------
-
-The mapping from a checksum value to a list of chunk ids is done using
-a lookup tree that is vaguely similar to a B-tree. The tree contains
-index nodes and leaf nodes. Leaf nodes store the actual mappings:
-
-    {
-        'e242ed3bffccdf271b7fbaf34ed72d089537b42f': [
-            '1234567890abcdef.3',
-            '1234567890abcdef.4',
-        ],
-        'f1d2d2f924e986ac86fdf7b36c94bcdf32beec15': [
-            '1234567890abcdef.5',
-        ],
-    }
-
-In other words, a leaf node is a `dict` mapping a checksum to a list
-of chunk ids whose content has that checksum. Note that the list may
-be longer than one item.
-
-An index node looks like this:
-
-    [
-        {
-            'first-checksum': 'e242ed3bffccdf271b7fbaf34ed72d089537b42f',
-            'last-checksum': 'f1d2d2f924e986ac86fdf7b36c94bcdf32beec15',
-            'leaf-id': '1234567890abcdef.2',
-            'index-id': None,
-        },
-    ]
-
-In other words:
-
-* The index node is a list of mappings, where each mapping corresponds
-  to an object on the next level in the lookup tree.
-* `first-checksum` is the smallest checksum in the sub-tree being
-  referenced.
-* `last-checksum` is the largest checksum.
-* `leaf-id` is the object id of the leaf node, assuming the next level
-  is leaf nodes.
-* `index-id` is the object id of the index node, assuming the next
-  level is index nodes.
-
-The lookup tree is created in a copy-on-write manner. No node is ever
-overwritten, but it may be deleted after it is no longer referenced.
-The tree is not kept in balance, to keep the code maintaining as
-simple as possible.
-
-When a new checksum is inserted into the lookup tree, it is added to
-an existing leaf node by creating a new leaf node that is a copy of
-the old one, and adding the new checksum to the new leaf. A big leaf
-node is split in half. Any index nodes on the path to an updated leaf
-node get updated.
-
-
-Chunk id to client ids
-----------------------
-
-This tree is similar in structure as the checksum tree, but index nodes
-look like this:
-
-    [
-        {
-            'first-client-id': '1234567890abcdef.50',
-            'last-client-id': '1234567890abcdef.51',
-            'leaf-id': '1234567890abcdef.49',
-        },
-    ]
-
-Leaf nodes look like this:
-
-    {
-        '1234567890abcdef.32': [ 123, 456 ],
-    }
-
-Leaf nodes map a chunk id to a list of ids of clients that use the
-chunk.
-
-
-
-Per-client data
-===============
-
-The data stored for each client separately starts with a CLIENT
-object:
-
-    {
-        'client-name': 'foo',
-        'generations': ['1234567890abcdef.77'],
-    }
-
-Each generation is a GEN object:
-
-    {
-        'started': '2015-04-06T17.35:41',
-        'ended': '2015-04-06T17.35:42',
-        'checkpoint': False,
-        'root-dir': '1234567890abcdef.77',
-    }
-
-Each directory in the live data is stored in a DIR object. The object
-stores the metadata for the directory, plus the basenames and metadata
-for each file in the directory (anything thing that isn't a
-subdirectory), plus the basename and DIR object id of each
-subdirectory.
-
-    {
-        'metadata': {
-            'st_mode': 0777,
-            'st_uid': 0,
-            'username': 'root',
-        },
-        'files':
-            'README':
-                'metadata':
-                    'st_mode': 0644,
-                    'st_uid': 0,
-                    'username': 'root'
-        'subdirs':
-            '.git': '1234567890abcdef.127',
-    }
-
-Again, nothing is updated in-place, everything is updated
-copy-on-write. When a node is updated, its parent all the way to the
-root directory also get updated.