Diffstat (limited to 'bugs/bgproc.mdwn')
-rw-r--r-- | bugs/bgproc.mdwn | 103
1 files changed, 0 insertions, 103 deletions
diff --git a/bugs/bgproc.mdwn b/bugs/bgproc.mdwn
deleted file mode 100644
index be834fd..0000000

[[!tag obnam-wishlist]]

Obnam should do some processing in the background, for example uploading
data to the backup repository. This would allow better use of the
bottleneck resource (the network). Below is a journal entry with my thoughts
on how to implement that. It may be out of date by now, but we'll see.
I have a Python module that simplifies the use of `multiprocessing` to do
jobs in the background (which avoids the Python global interpreter lock,
in case that matters). --liw

---

Here's a design for Obnam concurrency that came to me the other
day while walking.

The core of Obnam (and larch) is quite synchronous: read data from
a file, read B-tree nodes, push chunks and B-tree nodes into the
repository. Some of that can be parallelized, but not easily: it's
already tricky code, and making it even trickier is going to require
very strong justification.

Things like encrypting and decrypting files need to be done in parallel
with other things, for speed. These things are not really in the
core, and are indeed provided by plugins.
So here's a way to do them in parallel:

* the core code stays synchronous, the way it is now
* whenever larch code needs to read a B-tree node,
  it blocks until it gets it
* the node is read, synchronously, from wherever, and put into
  a background processing queue (using Python `multiprocessing`)
* the code that waits for the node to be processed polls the queue,
  handles any other background jobs that happen to finish while
  it waits, and returns the desired node when it gets it
* when larch writes a node (after it gets pushed out
  of the upload queue inside larch), it is put into a background
  processing queue
* at the same time, if there were any finished background jobs, they're
  handled (written to the repo)
* at the end of the run, the main loop makes sure any pending background
  jobs finish and are handled

There's a complication: the B-tree code may need a node that is
not yet written to the repository, since it is still going through
a background processing queue.

I'm going to need to restructure how hooks process files that
are written to or read from the repository. Writing should happen
asynchronously: files are put in a queue and processed in the
background, and then written to the actual repository when background
processing is finished. Reading needs to happen synchronously, since
there's a B-tree call waiting for the result, but to handle the case
of needing a node that is still being processed in the background, we
need to keep track of which nodes are in the background, and wait for
them to be done before reading them.
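The loop above can be sketched as code. This is a minimal, hypothetical
sketch, not Obnam's actual API: the class and method names
(`BackgroundQueue`, `write_file`, `cat`, `handle_background_results`)
are invented for illustration, and threads with `queue.Queue` stand in
for `multiprocessing` so the example stays self-contained — the real
design would use processes precisely to avoid the GIL.

```python
import queue
import threading

class BackgroundQueue:
    """Run (pathname, data) jobs through a filter in the background.

    Threads stand in for multiprocessing here; the design in the text
    uses multiprocessing so background work avoids the GIL.
    """

    def __init__(self, filter_func, num_workers=2):
        self.jobs = queue.Queue()     # pending (pathname, data) jobs
        self.results = queue.Queue()  # finished (pathname, data) results
        self.pending = set()          # pathnames submitted, not yet handled
        self.filter_func = filter_func
        self.repo = {}                # stands in for the real repository
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Background worker: filter each job, put the result on the queue.
        while True:
            pathname, data = self.jobs.get()
            self.results.put((pathname, self.filter_func(data)))

    def write_file(self, pathname, data):
        # Asynchronous write: queue the job, then drain finished results.
        self.pending.add(pathname)
        self.jobs.put((pathname, data))
        self.handle_background_results(block=False)

    def handle_background_results(self, block):
        # Write finished jobs to the repository and drop them from the
        # pending set; optionally block for the first result.
        while self.pending:
            try:
                pathname, data = self.results.get(block=block)
            except queue.Empty:
                return
            self.repo[pathname] = data
            self.pending.discard(pathname)
            block = False  # after one blocking get, drain without blocking

    def cat(self, pathname):
        # Synchronous read: wait while the wanted file is still in the
        # write queue, handling results as they finish, then read it.
        while pathname in self.pending:
            self.handle_background_results(block=True)
        return self.repo[pathname]
```

A read of a file that is still in the write queue simply blocks in
`cat` until the background job for it has been handled, which is the
"wait for them to be done before reading them" rule above.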
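The plugin filters described below — one or more Unix commands through
which file contents are piped, each one's output feeding the next —
could be sketched like this. `run_filters` is a hypothetical helper,
assuming each filter is given as an argv list:

```python
import subprocess

def run_filters(data, filters):
    """Pipe data (bytes) through a list of Unix commands, in order.

    Each filter is an argv list; the output of one command becomes
    the input of the next.
    """
    for argv in filters:
        proc = subprocess.run(argv, input=data,
                              stdout=subprocess.PIPE, check=True)
        data = proc.stdout
    return data

# For example, a compression plugin might contribute ["gzip", "-c"],
# and restore would run the filters in reverse with ["gzip", "-dc"].
```

The background processes would call something like this on each job's
file contents before putting the result on the results queue.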
Reading would thus be something like this, implemented in the
`Repository` class:

    while wanted file is in write queue:
        process a write queue result

    read file from repository
    process file data through hooks
    return file

The write queue is more complicated (again handled somehow in the
`Repository` class):

* a `multiprocessing.Queue` instance for holding pending jobs
  - a job is a (pathname, file contents) pair
* another `Queue` instance for holding unhandled results
  - a (pathname, file contents) pair, where the contents may have changed
* a `set` for holding file identifiers (paths) that have been put into
  the pending jobs queue, but not yet processed from the results queue

Each plugin can provide one or more Unix commands (filters) through
which the file contents get piped. The background processes run each
filter in turn, giving the output of the previous one as input to the
next one.

To handle a result from a background job, the following needs to be done:

* remove the pathname from the `set`
* write the filtered file contents into the repository

To implement this, I'll do the following:

* all changes should be in `HookedFS`
* `write_file` and `overwrite_file` put things into the pending jobs queue,
  and also call a new method `handle_background_results`
* `cat` gets changed to wait for files in the write queue, calling
  `handle_background_results`
* `handle_background_results` will do what is needed

This design isn't optimal, since writing things to the repository
isn't being done in parallel with other things, but I'll tackle that
problem later.


[[done]] this clearly isn't happening, so closing the old wishlist
bug --liw