Date: Mon, 24 Jul 2017 13:19:42 -0400
From: Ben Boeckel
To: "Laurence Perkins (OE)"
Cc: "obnam-support@obnam.org", "liw@liw.fi"
Subject: Re: Variable Chunksize

On Mon, Jul 24, 2017 at 17:12:15 +0000, Laurence Perkins (OE) wrote:
> Which just made me realise what kind of space savings I could get on
> one of my archives... I guess this just moved up the priority list a
> ways... Any suggestions for a hash that's fast enough to crawl over
> large files without hurting performance too much while still giving
> reasonable control over the average chunk size?

What about the hashing used by casync in its chunking? From its
announcement[1]:

    The "chunking" algorithm is based on the buzhash rolling hash
    function. SHA256 is used as strong hash function to generate digests
    of the chunks. xz is used to compress the individual chunks.

It sounds like you use a rolling hash to find the chunks and then store
them according to their SHA256 hash.
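
To make that two-step idea concrete, here is a rough sketch in Python.
It is not casync's or Obnam's code. The window size, the number of mask
bits, the minimum and maximum chunk sizes, the random lookup table, and
the names chunk_boundaries and store_chunks are all assumptions chosen
for illustration.

```python
import hashlib
import random

# All of these parameters are illustrative assumptions, not casync's values.
WINDOW = 48                  # bytes covered by the rolling hash
AVG_BITS = 12                # cut when the low 12 bits are zero (~4 KiB apart)
MASK = (1 << AVG_BITS) - 1
MIN_CHUNK = 1 << 10          # never cut before 1 KiB
MAX_CHUNK = 1 << 16          # always cut at 64 KiB

# Buzhash needs a fixed table of 256 pseudo-random 32-bit words.
_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks in `data`."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        length = i - start + 1
        # Roll the new byte in.
        h = rotl32(h, 1) ^ TABLE[byte]
        # Once the window is full, roll the oldest byte out.
        if length > WINDOW:
            h ^= rotl32(TABLE[data[i - WINDOW]], WINDOW)
        at_cut_point = length >= MIN_CHUNK and (h & MASK) == 0
        if at_cut_point or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)


def store_chunks(data, store):
    """Split `data`, file each chunk in `store` under its SHA256 digest,
    and return the list of digests needed to rebuild `data`."""
    digests = []
    for begin, end in chunk_boundaries(data):
        chunk = data[begin:end]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # identical chunks are stored once
        digests.append(digest)
    return digests
```

Because a cut point depends only on the last WINDOW bytes, inserting a
few bytes near the start of a file should normally change only the
chunks around the insertion, and the later chunks keep their digests.
In this sketch, raising or lowering AVG_BITS is the knob for the average
chunk size.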

--Ben

[1] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html