Diffstat (limited to 'bugs/bgproc.mdwn')
-rw-r--r--  bugs/bgproc.mdwn  103
1 files changed, 0 insertions, 103 deletions
diff --git a/bugs/bgproc.mdwn b/bugs/bgproc.mdwn
deleted file mode 100644
index be834fd..0000000
--- a/bugs/bgproc.mdwn
+++ /dev/null
@@ -1,103 +0,0 @@
-[[!tag obnam-wishlist]]
-
-Obnam should do some processing in the background, for example uploading
-of data to the backup repository. This would allow better use of the
-bottleneck resource (network). Below is a journal entry with my thoughts
-on how to implement that. It may be out of date by now, but we'll see.
-I have a Python module to simplify the use of `multiprocessing` to do
-jobs in the background (which avoids the Python global interpreter lock,
-in case that matters). --liw
-
----
-
-Here's a design for Obnam concurrency that came to me the other
-day while walking.
-
-The core of Obnam (and larch) is quite synchronous: read data from
-file, read B-tree nodes, push chunks and B-tree nodes into repository.
-Some of that can be parallelized, but not easily: it's already tricky
-code, and making it even more tricky is going to require very strong
-justification.
-
-Things like encrypting and decrypting files need to be done in parallel
-with other things, for speed. These things are not really in the
-core, and indeed are provided by plugins.
-
-So here's a way to do them in parallel:
-
-* the core code stays synchronous, the way it is now
-* whenever larch code needs to read a B-tree node,
- it blocks until it gets it
-* the node is read, synchronously, from wherever, and put into
- a background processing queue (using Python `multiprocessing`)
-* the code that waits for the node to be processed polls the queue,
- and handles any other background jobs that happen to finish while
- it waits, and returns the desired node when it gets it
-* when larch writes a node (after it gets pushed out
- of the upload queue inside larch), it is put into a background
- processing queue
-* at the same time, if there were any finished background jobs, they're
- handled (written to repo)
-* at the end of the run, the main loop makes sure any pending background
- jobs finish and are handled
-
-There's a complication that the B-tree code may need a node that is
-not yet written to the repository, since it is still going through
-a background processing queue.
-
-I'm going to need to restructure how hooks process files that
-are written to or read from the repository. Writing should happen
-asynchronously: files are put in a queue and processed in the
-background, and then written to the actual repository when background
-processing is finished. Reading needs to happen synchronously, since
-there's a B-tree call waiting for the data, but to handle the case of
-needing a node that is still being processed in the background, we
-need to keep track of which nodes those are, and wait for them to be
-done before reading them.
-
-Reading would thus be something like this, implemented in the
-`Repository` class:
-
- while wanted file is in write queue:
- process a write queue result
-
- read file from repository
- process file data through hooks
- return file
-
-The write queue is more complicated (again handled somehow in the
-`Repository` class):
-
-* a `multiprocessing.Queue` instance for holding pending jobs
- - a job is a (pathname, file contents) pair
-* another `Queue` instance for holding unhandled results
- - (pathname, file contents) pair, where the contents may have changed
-* a `set` for holding file identifiers (paths) that have been put into
- the pending jobs queue, but not yet processed from the results queue
-
-Each plugin can provide one or more Unix commands (filters) through
-which the file contents gets piped. The background processes run each
-filter in turn, giving the output of the previous one as input to the
-next one.
-
-To handle a result from a background job, the following needs to be done:
-
-* remove the pathname from the `set`
-* write the filtered file contents into the repository
-
-To implement this, I'll do the following:
-
-* All changes should be in `HookedFS`
-* `write_file` and `overwrite_file` put things into the pending jobs queue,
- and also call a new method `handle_background_results`
-* `cat` gets changed to wait for files in the write queue, calling
- `handle_background_results`
-* `handle_background_results` will do what is needed
-
-This design isn't optimal, since writing things to the repository
-isn't being done in parallel with other things, but I'll tackle that
-problem later.
-
-
-[[done]] this clearly isn't happening, so closing the old wishlist
-bug --liw