Diffstat (limited to 'bugs/salsa-tins.mdwn')
-rw-r--r-- | bugs/salsa-tins.mdwn | 39 |
1 files changed, 0 insertions, 39 deletions
diff --git a/bugs/salsa-tins.mdwn b/bugs/salsa-tins.mdwn
deleted file mode 100644
index 4525dd4..0000000
--- a/bugs/salsa-tins.mdwn
+++ /dev/null
@@ -1,39 +0,0 @@
-[[!tag obnam-performance]]
-
-Problem: If chunk size is reasonably large (say, a megabyte), then
-most files will be smaller, and the repository ends up with a large
-number of identical files.
-
-Idea: collect chunks into groups, called "salsa tins".
-
-- salsa tin = list of chunks
-- salsa tin has an id
-- chunk id = salsa tin id + suitable number of extra bits for
-  index into list
-- chunk id may be 64 bits total, or 64+32, or whatever seems convenient
-- no chunk gets stored alone, only in salsa tins
-
-This lets a client put things into the repository at will, without
-synchronisation or locking beyond what the filesystem provides
-(exclusive creation of files).
-
-
----
-
-Having multiple chunks in a single file complicates the logic for
-managing files in the repository, and deleting unused chunks.
-
-Therefore, an alternative idea: instead of shoving multiple chunks
-into one file, allow files to use parts of chunks. Currently a
-file's metadata lists the chunks that have its contents. Change
-this to be a list of (chunk id, offset, length) triplets, where
-offset and length specify a part of a chunk. This way, a client can
-create one chunk that contains the data of many small files, and
-they can all just use the relevant part of the chunk. Managing
-removal of those files is easy: it is the current code without
-modification.
-
---liw
-
-
-This is implemented in git for FORMAT GREEN ALBATROSS. [[done]] --liw
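
The two ideas in the removed note can be sketched in a few lines. This is only an illustration under stated assumptions, not Obnam's actual code: the names `make_chunk_id`, `split_chunk_id`, `ChunkPart`, `pack_small_files` and `read_file` are hypothetical, the 32-bit index width is just one of the sizes the note mentions as possible, and chunks are modelled as an in-memory dict rather than repository files.

```python
from typing import Dict, List, NamedTuple, Tuple

INDEX_BITS = 32  # assumption: the "64+32" layout; the note leaves this open


def make_chunk_id(tin_id: int, index: int, index_bits: int = INDEX_BITS) -> int:
    """Salsa-tin idea: chunk id = tin id followed by index bits."""
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in the index bits")
    return (tin_id << index_bits) | index


def split_chunk_id(chunk_id: int, index_bits: int = INDEX_BITS) -> Tuple[int, int]:
    """Recover (tin id, index into the tin's chunk list) from a chunk id."""
    return chunk_id >> index_bits, chunk_id & ((1 << index_bits) - 1)


class ChunkPart(NamedTuple):
    """Alternative idea: one (chunk id, offset, length) triplet."""
    chunk_id: int
    offset: int
    length: int


def pack_small_files(
    files: Dict[str, bytes], chunk_id: int
) -> Tuple[bytes, Dict[str, List[ChunkPart]]]:
    """Concatenate many small files into one chunk.

    Returns the chunk contents and, per file, the list of triplets
    that its metadata would hold instead of a plain chunk id list.
    """
    buf = bytearray()
    metadata: Dict[str, List[ChunkPart]] = {}
    for name, data in files.items():
        metadata[name] = [ChunkPart(chunk_id, len(buf), len(data))]
        buf.extend(data)
    return bytes(buf), metadata


def read_file(parts: List[ChunkPart], chunks: Dict[int, bytes]) -> bytes:
    """Rebuild a file by slicing the relevant part out of each chunk."""
    return b"".join(
        chunks[p.chunk_id][p.offset:p.offset + p.length] for p in parts
    )
```

With the triplet form, each chunk is still stored on its own, so chunk removal needs no changes, which is the advantage the note gives for preferring it over salsa tins.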