Diffstat (limited to 'faq/checksum-safety.mdwn')
-rw-r--r-- faq/checksum-safety.mdwn | 35
1 files changed, 0 insertions, 35 deletions
diff --git a/faq/checksum-safety.mdwn b/faq/checksum-safety.mdwn
deleted file mode 100644
index cab4802..0000000
--- a/faq/checksum-safety.mdwn
+++ /dev/null
@@ -1,35 +0,0 @@
-[[!meta title="Checksum collisions and safety"]]
-
-Obnam uses the MD5 checksum algorithm to recognise duplicate data
-chunks. MD5 has a reputation for being unsafe: people have
-constructed files that are different, but result in the same MD5
-checksum. This is true.
-
-Every checksum algorithm can have collisions. Changing Obnam to, say,
-SHA1, SHA2, or the as yet unreleased SHA3 would not remove the chance
-of collisions. It would reduce the chance of accidental collisions,
-but the chance of those is already so small with MD5 that it can be
-disregarded. Put another way, if you care about the chance of
-accidental MD5 collisions, you should be caring about accidental SHA1,
-SHA2, or SHA3 collisions as well.
-
-Apart from accidental collisions, there are two cases where you should
-worry about checksum collisions (regardless of algorithm).
-
-First, if you're into researching checksum collisions, you're likely
-to have files that cause checksum collisions, and in that case, if you
-restore after a catastrophe, you probably want to get the files back
-intact, rather than having Obnam confuse one with the other.
-
-Second, if you have an enemy who wishes to corrupt your backed-up
-data, they may replace some of the backed-up data with other data that
-has the same checksum. This way, when you restore, your data is
-corrupted without Obnam noticing.
-
-For both of these cases, you can instruct Obnam to **verify** that
-chunks of data with the same checksum actually are the same data,
-instead of relying on the checksum alone. This is as safe as it can
-be, but it has a big performance impact: Obnam has to read back from
-the repository (possibly downloading from your backup server) all the
-data you are backing up. You'll still benefit from the
-de-duplication, however, so your repository size will be smaller.
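
The difference between trusting the checksum and verifying the data, as described in the removed text, can be sketched roughly as follows. This is a toy in-memory illustration, not Obnam's actual code: the `ChunkStore` class, its `verify` flag, the `checksum_fn` parameter and the `reads` counter are all hypothetical names introduced for this example.

```python
import hashlib


def md5_hex(data):
    """Return the MD5 checksum of a chunk as a hex string."""
    return hashlib.md5(data).hexdigest()


class ChunkStore:
    """Toy de-duplicating chunk store (hypothetical, not Obnam's implementation)."""

    def __init__(self, verify=False, checksum_fn=md5_hex):
        self.verify = verify            # compare actual data, not just checksums
        self.checksum_fn = checksum_fn  # replaceable so a collision can be simulated
        self.chunks = {}                # chunk id -> bytes
        self.by_checksum = {}           # checksum -> list of chunk ids
        self.reads = 0                  # how often stored data had to be read back

    def _read(self, chunk_id):
        # In a real repository this may mean a download from the backup server.
        self.reads += 1
        return self.chunks[chunk_id]

    def put(self, data):
        """Store a chunk and return its id, reusing an existing chunk if possible."""
        checksum = self.checksum_fn(data)
        for chunk_id in self.by_checksum.get(checksum, []):
            # Without verification, a matching checksum is taken as proof of
            # identical data. With verification, the stored chunk is read back
            # and compared byte for byte.
            if not self.verify or self._read(chunk_id) == data:
                return chunk_id
        chunk_id = len(self.chunks)
        self.chunks[chunk_id] = data
        self.by_checksum.setdefault(checksum, []).append(chunk_id)
        return chunk_id
```

Passing a deliberately bad `checksum_fn`, such as `lambda data: "same"`, simulates a collision. With `verify=False` two different chunks are then treated as one, so a restore would return the wrong data for the second; with `verify=True` both are kept, at the cost of one read of stored data per candidate match.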