From 0cecea8e2bcf421de715ccc200504dd2c1df9d53 Mon Sep 17 00:00:00 2001
From: Lars Wirzenius
Date: Sun, 20 Dec 2015 17:03:42 +0100
Subject: Add explanation of when Obnam de-dup works badly

---
 manual/en/060-backing-up.mdwn | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

(limited to 'manual/en')

diff --git a/manual/en/060-backing-up.mdwn b/manual/en/060-backing-up.mdwn
index 16ccc5d2..1213918d 100644
--- a/manual/en/060-backing-up.mdwn
+++ b/manual/en/060-backing-up.mdwn
@@ -356,6 +356,29 @@ duplicate data is quite coarse (see the `--chunk-size` setting), and so
 Obnam often doesn't find duplication when it exists, when the changes
 are small.
 
+De-duplication isn't useful in the following scenarios:
+
+* A file changes such that things move around within the file. The
+  (current) Obnam de-duplication is based on non-overlapping chunks
+  from the beginning of a file. If some data is inserted, Obnam won't
+  notice that the chunks have shifted around. This can happen, for
+  example, for disk or ISO images.
+
+* Files with duplicate data that is not on a chunk boundary. For
+  example, emails with large attachments. Each email recipient gets
+  different `Received` headers, which shifts the body and attachments
+  by different amounts. As a result, Obnam won't notice the
+  duplication.
+
+* Data in compressed files, such as `.zip` or `.tar.xz` files. Obnam
+  doesn't know about the file compression, and only sees the
+  compressed version of the data. Thus, Obnam won't de-duplicate it.
+
+A future version of Obnam will hopefully improve the de-duplication
+algorithms. If you see this optimistic paragraph in a version of Obnam
+released in 2017 or later, please notify the maintainers. Thank you.
+
+
 De-duplication and safety against checksum collisions
 -----------------------------------------------------
 
-- 
cgit v1.2.1
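
The first scenario in the patch (inserted data shifting every later chunk) can be illustrated with a small sketch. This is not Obnam's code: the function names, the use of SHA-256, and the 16-byte chunk size are assumptions chosen only to make the effect visible. It splits data into non-overlapping chunks from offset 0, as the manual text describes, and counts how many chunks of a changed file match chunks of the original.

```python
import hashlib


def chunk_hashes(data, chunk_size):
    """Hash non-overlapping chunks, always starting at offset 0 (hypothetical helper)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def shared_chunks(old, new, chunk_size):
    """Count chunks of `new` whose hash also occurs among the chunks of `old`."""
    seen = set(chunk_hashes(old, chunk_size))
    return sum(1 for h in chunk_hashes(new, chunk_size) if h in seen)


CHUNK_SIZE = 16  # deliberately tiny; an assumption for illustration only

original = bytes(range(256))   # 16 chunks, all with distinct contents
appended = original + b"tail"  # data added at the end: earlier chunks stay aligned
inserted = b"X" + original     # one byte added at the start: every chunk shifts

print(shared_chunks(original, appended, CHUNK_SIZE))
print(shared_chunks(original, inserted, CHUNK_SIZE))
```

Appending leaves all 16 original chunks reusable, while a single inserted byte at the front leaves none, even though 256 of the 257 bytes are unchanged. The same misalignment explains the email-attachment case: identical attachments preceded by headers of different lengths fall on different chunk boundaries.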