[[!tag obnam-wishlist]]

From Joey Hess:

> My take on this is that, by choosing to use a tool that uses hashes, I
> am giving up (near-)absolute certainty for speed, or space, or whatever.
> So it's important that the hash type be good at collision resistance (for
> example, no two likely filenames should hash the same; "/etc/passwd"
> should only tend to collide with blobs that are very unlike a filename).
> It's also important that the tool be upfront about using hashes, and
> about what hash it uses. And if it's not designed to allow swapping the
> hash out when it gets broken, I will trust it less (hello git).

Ah, the replacement of hash functions is an interesting problem.

For pathnames, the choice of hash is not at all important, I think, except
perhaps for performance, since pathnames will be compared byte-by-byte
instead of by hashes.

For file data, replacing the hash is easy if one is willing to back up
everything from scratch. Supporting several hashes in the same backup store
is a little more work, but not a whole lot: instead of having just one
tree for mapping checksums to chunk identifiers, one would have one tree
per checksum algorithm.

--liw
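
The one-tree-per-algorithm idea above might look roughly like the sketch below. It is only an illustration, not Obnam's actual code: `ChunkIndex`, `add`, and `find` are hypothetical names, and plain in-memory dicts stand in for whatever on-disk trees the backup store really uses.

```python
import hashlib
from collections import defaultdict


class ChunkIndex:
    """Hypothetical index: one checksum -> chunk-id mapping per algorithm."""

    def __init__(self, algorithms):
        self.algorithms = list(algorithms)
        # One separate "tree" (here just a dict) per checksum algorithm.
        self.trees = {name: defaultdict(set) for name in self.algorithms}

    def add(self, chunk_id, data):
        """Record chunk_id under the checksum of data in every tree."""
        for name in self.algorithms:
            digest = hashlib.new(name, data).hexdigest()
            self.trees[name][digest].add(chunk_id)

    def find(self, data, algorithm):
        """Return the ids of chunks whose checksum matches data."""
        digest = hashlib.new(algorithm, data).hexdigest()
        return set(self.trees[algorithm].get(digest, ()))


index = ChunkIndex(["sha1", "sha256"])
index.add(1, b"some chunk data")
print(index.find(b"some chunk data", "sha256"))
```

With this layout, retiring a broken algorithm would amount to dropping its tree, and adding a new one would mean building a fresh tree from the existing chunks. A checksum match is still only a candidate: a careful store would compare the chunk contents before treating two chunks as identical.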