From 0cecea8e2bcf421de715ccc200504dd2c1df9d53 Mon Sep 17 00:00:00 2001
From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 20 Dec 2015 17:03:42 +0100
Subject: Add explanation of when Obnam de-dup works badly

---
 manual/en/060-backing-up.mdwn | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

(limited to 'manual/en')
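
Note for reviewers: the first scenario described in the patch text (data
inserted into a file, so every later chunk shifts) can be reproduced with a
small sketch. This is not Obnam's code. The function names, the tiny chunk
size, and the use of SHA-256 are assumptions made purely for illustration;
the only thing taken from the manual text is the idea of non-overlapping
chunks counted from the beginning of a file.

```python
import hashlib

# Illustrative only: far smaller than any realistic --chunk-size value.
CHUNK_SIZE = 4


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Split data into non-overlapping chunks from the start; hash each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def shared_chunks(old, new):
    """Count chunks of `new` whose digest already occurs among chunks of `old`."""
    known = set(chunk_digests(old))
    return sum(1 for digest in chunk_digests(new) if digest in known)


original = b"abcdefghijklmnopqrstuvwx"  # 24 bytes, i.e. 6 full chunks
appended = original + b"yz12"           # new data added at the end
inserted = b"!" + original              # a single byte inserted at the start

print(shared_chunks(original, appended))  # 6: every old chunk is found again
print(shared_chunks(original, inserted))  # 0: every chunk boundary has shifted
```

The same effect applies to the email scenario: headers of different lengths
shift an otherwise identical attachment by different amounts, so fixed
chunk boundaries fall in different places in each copy.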

diff --git a/manual/en/060-backing-up.mdwn b/manual/en/060-backing-up.mdwn
index 16ccc5d2..1213918d 100644
--- a/manual/en/060-backing-up.mdwn
+++ b/manual/en/060-backing-up.mdwn
@@ -356,6 +356,29 @@ duplicate data is quite coarse (see the  `--chunk-size` setting), and
 so Obnam often doesn't find duplication when it exists, when the
 changes are small.
 
+De-duplication works badly in the following scenarios:
+
+* A file changes such that things move around within the file. The
+  (current) Obnam de-duplication is based on non-overlapping chunks
+  from the beginning of a file. If some data is inserted, Obnam won't
+  notice that the chunks have shifted around. This can happen, for
+  example, for disk or ISO images.
+
+* Files with duplicate data that is not on a chunk boundary. For
+  example, emails with large attachments. Each email recipient gets
+  different `Received` headers, which shifts the body and attachments
+  by different amounts. As a result, Obnam won't notice the
+  duplication.
+
+* Data in compressed files, such as `.zip` or `.tar.xz` files. Obnam
+  doesn't know about the file compression, and only sees the
+  compressed version of the data. Thus, Obnam won't de-duplicate it.
+
+A future version of Obnam will hopefully improve the de-duplication
+algorithms. If you see this optimistic paragraph in a version of Obnam
+released in 2017 or later, please notify the maintainers. Thank you.
+
+
 De-duplication and safety against checksum collisions
 -----------------------------------------------------
 
-- 
cgit v1.2.1