author    Lars Wirzenius <liw@liw.fi>  2014-03-02 20:20:16 +0000
committer Lars Wirzenius <liw@liw.fi>  2014-03-02 20:20:16 +0000
commit    c463c8bd575dbd1ec64ee028467e5e638e762e88 (patch)
tree      b3dcf32ba9bf34d1156d8dd40132d9a618fa31f5
parent    d7f4b168eb3148d3943eea3fa4d759d3d42a22b0 (diff)
parent    5fbefbf041a6a8222ff64763ca5aabbfd5449396 (diff)
download  obnam-c463c8bd575dbd1ec64ee028467e5e638e762e88.tar.gz
Merge branch 'liw/benchmarks'
-rw-r--r--  NEWS                        5
-rw-r--r--  README.benchmarks          79
-rwxr-xr-x  obnam-benchmark           527
-rwxr-xr-x  obnam-benchmark-summary   136
-rw-r--r--  obnam-benchmark.1.in      133
-rw-r--r--  setup.py                    2
6 files changed, 565 insertions, 317 deletions
diff --git a/NEWS b/NEWS
index 2a0d17d8..f5c85a30 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,11 @@ Version 1.7, released UNRELEASED
future releases. The error codes are meant to be easy to search for,
and will allow error messages to be translated in the future.
+* The `obnam-benchmark` program got rewritten so that it'll do
+ something useful, but at the same time, it is no longer useful as a
+ general tool. It is now expected to be run from the Obnam source
+ tree (a cloned git repository), and isn't installed anymore.
+
Bug fixes:
* Obnam now creates a `trustdb.gpg` in the temporary GNUPGHOME it uses
diff --git a/README.benchmarks b/README.benchmarks
new file mode 100644
index 00000000..30241aa9
--- /dev/null
+++ b/README.benchmarks
@@ -0,0 +1,79 @@
+README for Obnam benchmarks
+===========================
+
+I've tried a number of approaches to benchmarks with Obnam over the
+years, but no approach has prevailed. This README describes my current
+approach in the hope that it will evolve into something useful.
+
+Ideally I would optimise Obnam for real-world use, but for now, I will
+be content with the simple synthetic benchmarks described here.
+
+Lars Wirzenius
+
+Overview
+--------
+
+I do not want a large number of different benchmarks, at least for
+now. I want a small set that I can and will run systematically, at
+least for each release. Too much data can be just as bad as too little
+data: if it takes too much effort to analyse the data, then that eats
+into development time. That said, hard numbers are better than
+guesses.
+
+I've decided on the following data sets:
+
+* 10^6 empty files, spread over 1000 directories with 1000 files
+ each. Obnam has (at least with repository format 6) a high overhead
+ per file, regardless of the contents of the file, and this is a
+ pessimal situation for that.
+
+ The interesting numbers here are: number of files backed up per
+ second, and size of backup repository.
+
+* A single directory with a single file, 2^40 bytes (1 TiB) long,
+  with little repetition in the data. This benchmarks the opposite end
+  of the spectrum of number of files vs size of data.
+
+ The interesting numbers here are number of bytes of actual file data
+ backed up per second and size of backup repository.
+
+Later, I may add more data sets. An intriguing idea would be to
+generate data from [Summain] manifests, where everything except the
+actual file data is duplicated from anonymised manifests captured from
+real systems.
+
+[Summain]: http://liw.fi/summain/
+
+For each data set, I will run the following operations:
+
+* An initial backup.
+* A no-op second generation backup.
+* A restore of the second generation, with `obnam restore`.
+* A restore of the second generation, with `obnam mount`.
+
+I will measure the following about each operation:
+
+* Total wall-clock time.
+* Maximum VmRSS memory, as logged by Obnam itself.
+
+I will additionally capture Python profiler output of each operation,
+to allow easier analysis of where time is going.
+
+I will run the benchmarks without compression or encryption, at least
+for now, and in general use the default settings built into Obnam for
+everything, unless there's a need to tweak them to make the benchmark
+work at all.
+
+Benchmark results
+-----------------
+
+A benchmark run will produce the following:
+
+* A JSON file with the measurements given above.
+* A Python profiling file for each operation for each dataset.
+ (Two datasets times four operations gives eight profiles.)
+
+I will run the benchmark for each release of Obnam, starting with
+Obnam 1.6.1. I will not care about Larch versions at this time: I will
+use the installed version. I will store the resulting data sets in a
+separate git repository for reference.
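The empty-files dataset described above (many directories, a fixed number of files per directory) can be sketched as follows. This is an illustrative Python 3 sketch, not part of the merge; the helper name and the reduced file count are assumptions chosen so the example runs quickly:

```python
import os
import tempfile

def create_empty_files(root, num_files, files_per_dir=1000):
    # Create num_files empty files, files_per_dir per subdirectory,
    # mirroring the 10^6-empty-files dataset layout described above.
    for i in range(num_files):
        subdir = os.path.join(root, 'dir-%d' % (i // files_per_dir))
        if i % files_per_dir == 0:
            os.mkdir(subdir)
        with open(os.path.join(subdir, 'file-%d' % i), 'w'):
            pass

root = tempfile.mkdtemp()
create_empty_files(root, 2500, files_per_dir=1000)
# 2500 files land in three subdirectories: dir-0, dir-1, dir-2.
print(sorted(os.listdir(root)))
```

At the benchmark's real scale (10^6 files) the same loop simply runs a thousand directories deep; the per-file cost of open/close is exactly the overhead the dataset is designed to stress.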
diff --git a/obnam-benchmark b/obnam-benchmark
index d5dc3dc1..6d3ccd16 100755
--- a/obnam-benchmark
+++ b/obnam-benchmark
@@ -1,6 +1,6 @@
#!/usr/bin/env python
#
-# Copyright 2010, 2011 Lars Wirzenius
+# Copyright 2014 Lars Wirzenius
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -16,203 +16,364 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-import cliapp
-import ConfigParser
-import glob
-import logging
+import json
import os
+import platform
import shutil
-import socket
-import subprocess
+import stat
+import sys
import tempfile
+import time
+import cliapp
+import Crypto.Cipher.ARC4
+import larch
+import ttystatus
-class ObnamBenchmark(cliapp.Application):
- default_sizes = ['1g/100m']
- keyid = '3B1802F81B321347'
- opers = ('backup', 'restore', 'list_files', 'forget')
+class BinaryJunkGenerator(object):
- def add_settings(self):
- self.settings.string(['results'], 'put results under DIR (%default)',
- metavar='DIR', default='../benchmarks')
- self.settings.string(['obnam-branch'],
- 'use DIR as the obnam branch to benchmark '
- '(default: %default)',
- metavar='DIR',
- default='.')
- self.settings.string(['larch-branch'],
- 'use DIR as the larch branch (default: %default)',
- metavar='DIR',
- )
- self.settings.string(['seivot-branch'],
- 'use DIR as the seivot branch '
- '(default: installed seivot)',
- metavar='DIR')
- self.settings.boolean(['with-encryption'],
- 'run benchmark using encryption')
-
- self.settings.string(['profile-name'],
- 'short name for benchmark scenario',
- default='unknown')
- self.settings.string_list(['size'],
- 'add PAIR to list of sizes to '
- 'benchmark (e.g., 10g/1m)',
- metavar='PAIR')
- self.settings.bytesize(['file-size'], 'how big should files be?',
- default=4096)
- self.settings.integer(['generations'],
- 'benchmark N generations (default: %default)',
- metavar='N',
- default=5)
- self.settings.boolean(['use-sftp-repository'],
- 'access the repository over SFTP '
- '(requires ssh to localhost to work)')
- self.settings.boolean(['use-sftp-root'],
- 'access the live data over SFTP '
- '(requires ssh to localhost to work)')
- self.settings.integer(['sftp-delay'],
- 'add artifical delay to sftp transfers '
- '(in milliseconds)')
- self.settings.string(['description'], 'describe benchmark')
- self.settings.boolean(['drop-caches'], 'drop kernel buffer caches')
- self.settings.string(['seivot-log'], 'seivot log setting')
-
- self.settings.boolean(['verify'], 'verify restores')
+ key = b'obnam-benchmark'
+ data = b'fake live data' * 1024
- def process_args(self, args):
- self.require_tmpdir()
-
- obnam_revno = self.bzr_revno(self.settings['obnam-branch'])
- if self.settings['larch-branch']:
- larch_revno = self.bzr_revno(self.settings['larch-branch'])
- else:
- larch_revno = None
-
- results = self.results_dir(obnam_revno, larch_revno)
-
- obnam_branch = self.settings['obnam-branch']
- if self.settings['seivot-branch']:
- seivot = os.path.join(self.settings['seivot-branch'], 'seivot')
- else:
- seivot = 'seivot'
-
- generations = self.settings['generations']
-
- tempdir = tempfile.mkdtemp()
- env = self.setup_gnupghome(tempdir)
-
- sizes = self.settings['size'] or self.default_sizes
- logging.debug('sizes: %s' % repr(sizes))
-
- file_size = self.settings['file-size']
- profile_name = self.settings['profile-name']
-
- for pair in sizes:
- initial, inc = self.parse_size_pair(pair)
-
- msg = 'Profile %s, size %s inc %s' % (profile_name, initial, inc)
- print
- print msg
- print '-' * len(msg)
- print
-
- obnam_profile = os.path.join(results,
- 'obnam--%(op)s-%(gen)s.prof')
- output = os.path.join(results, 'obnam.seivot')
- if os.path.exists(output):
- print ('%s already exists, not re-running benchmark' %
- output)
- else:
- argv = [seivot,
- '--obnam-branch', obnam_branch,
- '--incremental-data', inc,
- '--file-size', str(file_size),
- '--obnam-profile', obnam_profile,
- '--generations', str(generations),
- '--profile-name', profile_name,
- '--sftp-delay', str(self.settings['sftp-delay']),
- '--initial-data', initial,
- '--output', output]
- if self.settings['larch-branch']:
- argv.extend(['--larch-branch', self.settings['larch-branch']])
- if self.settings['seivot-log']:
- argv.extend(['--log', self.settings['seivot-log']])
- if self.settings['drop-caches']:
- argv.append('--drop-caches')
- if self.settings['use-sftp-repository']:
- argv.append('--use-sftp-repository')
- if self.settings['use-sftp-root']:
- argv.append('--use-sftp-root')
- if self.settings['with-encryption']:
- argv.extend(['--encrypt-with', self.keyid])
- if self.settings['description']:
- argv.extend(['--description',
- self.settings['description']])
- if self.settings['verify']:
- argv.append('--verify')
- self.runcmd(argv, env=env)
-
- shutil.rmtree(tempdir)
-
- def require_tmpdir(self):
- if 'TMPDIR' not in os.environ:
- raise cliapp.AppException('TMPDIR is not set. '
- 'You would probably run out of space '
- 'on /tmp.')
- if not os.path.exists(os.environ['TMPDIR']):
- raise cliapp.AppException('TMPDIR points at a non-existent '
- 'directory %s' % os.environ['TMPDIR'])
- logging.debug('TMPDIR=%s' % repr(os.environ['TMPDIR']))
+ def __init__(self):
+ self.cipher = Crypto.Cipher.ARC4.new(self.key)
+ self.buffer = ''
+
+ def get(self, num_bytes):
+ n = 0
+ result = []
+ while n < num_bytes:
+ if not self.buffer:
+ self.buffer = self.cipher.encrypt(self.data)
+
+ part = self.buffer[:num_bytes - n]
+ result.append(part)
+ n += len(part)
+ self.buffer = self.buffer[len(part):]
+
+ return ''.join(result)
+
+
+class StepInfo(object):
+
+ def __init__(self, label):
+ self.label = label
+ self.info = {
+ 'step': label,
+ }
+
+ def add_info(self, key, value):
+ self.info[key] = value
+
+ def stop_timer(self):
+ self.end = time.time()
+
+ def __enter__(self):
+ self.start = time.time()
+ self.end = None
+ return self
+
+ def __exit__(self, exc_type, exc_val, exc_tb):
+ if exc_type is None:
+ if self.end is None:
+ self.end = time.time()
+ self.info['duration'] = self.end - self.start
+ return False
- @property
- def hostname(self):
- return socket.gethostname()
+
+class ObnamBenchmark(object):
+
+ def __init__(self, settings, results_dir, srctree, junk_generator):
+ self.settings = settings
+ self.results_dir = results_dir
+ self.srctree = srctree
+ self.junk_generator = junk_generator
+
+ @classmethod
+ def add_settings(self, settings):
+ pass
@property
- def obnam_branch_name(self):
- obnam_branch = os.path.abspath(self.settings['obnam-branch'])
- return os.path.basename(obnam_branch)
-
- def results_dir(self, obnam_revno, larch_revno):
- parent = self.settings['results']
- parts = [self.hostname, self.obnam_branch_name, str(obnam_revno)]
- if larch_revno:
- parts.append(str(larch_revno))
- prefix = os.path.join(parent, "-".join(parts))
-
- get_path = lambda counter: "%s-%d" % (prefix, counter)
-
- counter = 0
- dirname = get_path(counter)
- while os.path.exists(dirname):
- counter += 1
- dirname = get_path(counter)
- os.makedirs(dirname)
- return dirname
-
- def setup_gnupghome(self, tempdir):
- gnupghome = os.path.join(tempdir, 'gnupghome')
- shutil.copytree('test-gpghome', gnupghome)
+ def benchmark_name(self):
+ s = self.__class__.__name__
+ if s.endswith('Benchmark'):
+ s = s[:-len('Benchmark')]
+ return s
+
+ def result_filename(self, label, suffix):
+ return os.path.join(
+ self.results_dir,
+ '%s-%s%s' % (self.benchmark_name, label, suffix))
+
+ def run(self):
+ self.tempdir = tempfile.mkdtemp()
+ self.live_data = self.create_live_data_dir()
+ self.repo = self.create_repo()
+ step_infos = []
+
+ steps = [
+ ('create-live-data', self.create_live_data),
+ ('initial-backup', self.backup),
+ ('no-op-backup', self.backup),
+ ('obnam-verify', self.obnam_verify),
+ ('obnam-mount', self.obnam_mount),
+ ('cleanup',
+ lambda si:
+ self.cleanup(si) if self.settings['cleanup'] else None),
+ ]
+
+ for label, method in steps:
+ print ' %s' % label
+ with StepInfo(label) as step_info:
+ method(step_info)
+ step_infos.append(step_info)
+
+ return {
+ 'steps': [step_info.info for step_info in step_infos],
+ }
+
+ def create_live_data_dir(self):
+ live_data = os.path.join(self.tempdir, 'live-data')
+ os.mkdir(live_data)
+ return live_data
+
+ def create_repo(self):
+ repo = os.path.join(self.tempdir, 'repo')
+ os.mkdir(repo)
+ return repo
+
+ def create_live_data(self, step_info):
+ # Subclasses MUST override this.
+ raise NotImplementedError()
+
+ def backup(self, step_info):
+ self.run_obnam(
+ ['backup', '-r', self.repo, self.live_data], step_info.label)
+ step_info.stop_timer()
+ step_info.add_info('repo-size', self.sum_of_file_sizes(self.repo))
+ step_info.add_info(
+ 'live-data-size', self.sum_of_file_sizes(self.live_data))
+
+ def obnam_verify(self, step_info):
+ self.run_obnam(
+ ['verify', '-r', self.repo],
+ step_info.label)
+
+ def obnam_mount(self, step_info):
+ mount = os.path.join(self.tempdir, 'mount')
+ os.mkdir(mount)
+
+ self.run_obnam(
+ ['mount', '-r', self.repo, '--to', mount],
+ step_info.label)
+
+ cliapp.runcmd(['tar', '-cf', '/dev/null', mount + '/.'])
+ time.sleep(1)
+
+ try:
+ cliapp.runcmd(['fusermount', '-u', mount])
+ except cliapp.AppException as e:
+ print 'ERROR from fusermount: %s' % str(e)
+
+ def cleanup(self, step_info):
+ shutil.rmtree(self.tempdir)
+
+ def run_obnam(self, args, label):
+ base_command = [
+ self.settings['obnam-cmd'],
+ '--no-default-config',
+ '--log', self.result_filename(label, '.log'),
+ '--log-level', 'debug',
+ ]
env = dict(os.environ)
- env['GNUPGHOME'] = gnupghome
- return env
+ env['OBNAM_PROFILE'] = self.result_filename(label, '.prof')
+ cliapp.runcmd(base_command + args, env=env, cwd=self.srctree)
- def bzr_revno(self, branch):
- p = subprocess.Popen(['bzr', 'revno'], cwd=branch,
- stdout=subprocess.PIPE)
- out, err = p.communicate()
- if p.returncode != 0:
- raise cliapp.AppException('bzr failed')
+ def sum_of_file_sizes(self, root_dir):
+ total = 0
+ for dirname, subdirs, basenames in os.walk(root_dir):
+ for basename in basenames:
+ pathname = os.path.join(dirname, basename)
+ st = os.lstat(pathname)
+ if stat.S_ISREG(st.st_mode):
+ total += st.st_size
+ return total
- revno = out.strip()
- logging.debug('bzr branch %s has revno %s' % (branch, revno))
- return revno
- def parse_size_pair(self, pair):
- return pair.split('/', 1)
+class EmptyFilesBenchmark(ObnamBenchmark):
+ files_per_dir = 1000
-if __name__ == '__main__':
- ObnamBenchmark().run()
+ @classmethod
+ def add_settings(self, settings):
+ settings.integer(
+ ['empty-files-count'],
+            'number of empty files for %s' % self.__name__,
+ default=10**6)
+
+ @property
+ def num_files(self):
+ return self.settings['empty-files-count']
+
+ def create_live_data(self, step_info):
+ step_info.add_info('empty-files-count', self.num_files)
+ for i in range(self.num_files):
+ subdir = os.path.join(
+ self.live_data, 'dir-%d' % (i / self.files_per_dir))
+ if (i % self.files_per_dir) == 0:
+ os.mkdir(subdir)
+ filename = os.path.join(subdir, 'file-%d' % i)
+ with open(filename, 'w'):
+ pass
+
+
+class SingleLargeFileBenchmark(ObnamBenchmark):
+
+ @classmethod
+ def add_settings(self, settings):
+ settings.bytesize(
+ ['single-large-file-size'],
+            'size of file to create for %s' % self.__name__,
+ default='1TB')
+
+ @property
+ def file_size(self):
+ return self.settings['single-large-file-size']
+
+ def create_live_data(self, step_info):
+ step_info.add_info('single-large-file-size', self.file_size)
+ filename = os.path.join(self.live_data, 'file.dat')
+ with open(filename, 'w') as f:
+ n = 0
+ max_chunk_size = 2**10
+ ts = ttystatus.TerminalStatus()
+ ts['written'] = 0
+ ts['total'] = self.file_size
+ ts.format(
+ '%ElapsedTime() '
+ 'writing live data: %ByteSize(written) of %ByteSize(total) '
+ '(%PercentDone(written,total))')
+ while n < self.file_size:
+ num_bytes = min(max_chunk_size, self.file_size - n)
+ data = self.junk_generator.get(num_bytes)
+ f.write(data)
+ n += len(data)
+ ts['written'] = n
+ ts.clear()
+ ts.finish()
+
+
+class ObnamBenchmarkRunner(cliapp.Application):
+
+ benchmark_classes = [
+ EmptyFilesBenchmark,
+ SingleLargeFileBenchmark,
+ ]
+
+ def add_settings(self):
+ self.settings.string(
+ ['obnam-cmd'],
+ 'use CMD as the argv[0] to invoke obnam',
+ metavar='CMD',
+ default='./obnam')
+
+ self.settings.string(
+ ['obnam-treeish'],
+ 'run Obnam from TREEISH in its git repository',
+ metavar='TREEISH',
+ default='HEAD')
+
+ self.settings.string(
+ ['results-dir'],
+ 'put results in DIR',
+ metavar='DIR',
+ default='.')
+
+ self.settings.boolean(
+ ['cleanup'],
+ 'clean up after each benchmark?',
+ default=True)
+
+ for benchmark_class in self.benchmark_classes:
+ benchmark_class.add_settings(self.settings)
+ def process_args(self, args):
+ results_dir = self.create_results_dir()
+ self.store_settings_in_results(results_dir)
+ result_obj = {
+ 'system-info': self.get_system_info_dict(),
+ 'versions': self.get_version_info_dict(),
+ }
+
+ srctree = self.prepare_source_tree()
+
+ junk_generator = BinaryJunkGenerator()
+ benchmark_infos = {}
+ for benchmark_class in self.benchmark_classes:
+ print 'Benchmark %s' % benchmark_class.__name__
+ benchmark = benchmark_class(
+ self.settings, results_dir, srctree, junk_generator)
+ benchmark_info = benchmark.run()
+ benchmark_infos[benchmark.benchmark_name] = benchmark_info
+ result_obj['benchmarks'] = benchmark_infos
+
+ self.save_result_obj(results_dir, result_obj)
+
+ shutil.rmtree(srctree)
+
+ def create_results_dir(self):
+ results = os.path.abspath(self.settings['results-dir'])
+ if not os.path.exists(results):
+ os.mkdir(results)
+ return results
+
+ def store_settings_in_results(self, results):
+ cp = self.settings.as_cp()
+ filename = os.path.join(results, 'obnam-benchmark.conf')
+ with open(filename, 'w') as f:
+ cp.write(f)
+
+ def get_system_info_dict(self):
+ return {
+ 'hostname': platform.node(),
+ 'machine': platform.machine(),
+ 'architecture': platform.architecture(),
+ 'uname': platform.uname(),
+ }
+
+ def get_version_info_dict(self):
+ treeish = self.settings['obnam-treeish']
+ sha1 = cliapp.runcmd(['git', 'show-ref', treeish]).split()[0]
+ describe = cliapp.runcmd(['git', 'describe', treeish]).strip()
+ return {
+ 'obnam-treeish': treeish,
+ 'obnam-sha1': sha1,
+ 'obnam-version': describe,
+ 'larch-version': larch.__version__,
+ }
+
+ def prepare_source_tree(self):
+ srctree = tempfile.mkdtemp()
+ self.extract_sources_from_git(srctree)
+ self.build_obnam(srctree)
+ return srctree
+
+ def extract_sources_from_git(self, srctree):
+ cliapp.runcmd(
+ ['git', 'archive', self.settings['obnam-treeish']],
+ ['tar', '-C', srctree, '-xf', '-'])
+
+ def build_obnam(self, srctree):
+ cliapp.runcmd(
+ ['python', 'setup.py', 'build_ext', '-i'],
+ cwd=srctree)
+
+ def save_result_obj(self, results_dir, result_obj):
+ filename = os.path.join(results_dir, 'benchmark.json')
+ with open(filename, 'w') as f:
+ json.dump(result_obj, f, indent=4)
+
+
+if __name__ == '__main__':
+ ObnamBenchmarkRunner().run()
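The `StepInfo` helper added in the script above times each benchmark step with a context manager, letting a step stop the clock early and attach extra measurements. A minimal Python 3 sketch of the same pattern (illustrative, not the script itself):

```python
import time

class StepInfo:
    """Collect a step label, optional measurements, and wall-clock duration."""

    def __init__(self, label):
        self.info = {'step': label}
        self.end = None

    def add_info(self, key, value):
        self.info[key] = value

    def stop_timer(self):
        # Let a step stop the clock before post-processing (such as
        # measuring repository size) that should not count as duration.
        self.end = time.time()

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            if self.end is None:
                self.end = time.time()
            self.info['duration'] = self.end - self.start
        return False  # never swallow exceptions

with StepInfo('initial-backup') as step:
    time.sleep(0.01)               # stand-in for running the backup
    step.add_info('repo-size', 12345)
```

Collecting each step's `info` dict into a list is what makes the final JSON result a flat, easily summarised structure.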
diff --git a/obnam-benchmark-summary b/obnam-benchmark-summary
new file mode 100755
index 00000000..7edd3a64
--- /dev/null
+++ b/obnam-benchmark-summary
@@ -0,0 +1,136 @@
+#!/usr/bin/env python
+#
+# Copyright 2014 Lars Wirzenius
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+import json
+import os
+
+import cliapp
+
+
+MiB = 2**20
+GiB = 2**30
+
+
+class ObnamBenchmarkSummary(cliapp.Application):
+
+ columns = (
+ ('version', 'version'),
+ ('ef-speed', 'EF files/s'),
+ ('ef-repo-size', 'EF repo (GiB)'),
+ ('lf-speed', 'LF MiB/s'),
+ ('lf-repo-size', 'LF repo (GiB)'),
+ )
+
+ def process_args(self, args):
+ summaries = []
+ for dirname in args:
+ summary = self.summarise_directory(dirname)
+ summaries.append(summary)
+ self.show_summaries(summaries)
+
+ def summarise_directory(self, dirname):
+ filename = os.path.join(dirname, 'benchmark.json')
+ with open(filename) as f:
+ obj = json.load(f)
+
+ return {
+ 'version':
+ self.get_obnam_version(obj),
+ 'ef-speed':
+ '%.0f' % self.get_empty_files_speed(obj),
+ 'ef-files':
+ self.get_empty_files_count(obj),
+ 'ef-repo-size':
+ self.format_size(self.get_empty_files_repo_size(obj), GiB),
+ 'lf-speed':
+ self.format_size(self.get_large_file_speed(obj), MiB),
+ 'lf-size':
+ self.format_size(self.get_large_file_size(obj), GiB),
+ 'lf-repo-size':
+ self.format_size(self.get_large_file_repo_size(obj), GiB),
+ }
+
+ def get_obnam_version(self, obj):
+ return obj['versions']['obnam-version']
+
+ def get_empty_files_speed(self, obj):
+ count = self.get_empty_files_count(obj)
+ step = self.find_step(obj, 'EmptyFiles', 'initial-backup')
+ return count / step['duration']
+
+ def get_empty_files_count(self, obj):
+ step = self.find_step(obj, 'EmptyFiles', 'create-live-data')
+ return step['empty-files-count']
+
+ def get_empty_files_repo_size(self, obj):
+ step = self.find_step(obj, 'EmptyFiles', 'initial-backup')
+ return step['repo-size']
+
+ def get_large_file_speed(self, obj):
+ file_size = self.get_large_file_size(obj)
+ step = self.find_step(obj, 'SingleLargeFile', 'initial-backup')
+ return file_size / step['duration']
+
+ def get_large_file_size(self, obj):
+ step = self.find_step(obj, 'SingleLargeFile', 'create-live-data')
+ return step['single-large-file-size']
+
+ def get_large_file_repo_size(self, obj):
+ step = self.find_step(obj, 'SingleLargeFile', 'initial-backup')
+ return step['repo-size']
+
+ def find_step(self, obj, benchmark_name, step_name):
+ for step in obj['benchmarks'][benchmark_name]['steps']:
+ if step['step'] == step_name:
+ return step
+        raise Exception('step %s not found' % step_name)
+
+ def format_size(self, size, unit):
+ return '%.0f' % (size / unit)
+
+ def show_summaries(self, summaries):
+ lines = [[title for key, title in self.columns]]
+
+ for s in summaries:
+ line = [str(s[key]) for key, title in self.columns]
+ lines.append(line)
+
+ widths = self.compute_column_widths(lines)
+
+ titles = lines[0]
+ results = sorted(lines[1:])
+ for line in [titles] + results:
+ cells = []
+ for i, cell in enumerate(line):
+ cells.append('%*s' % (widths[i], cell))
+ self.output.write(' | '.join(cells))
+ self.output.write('\n')
+
+ def compute_column_widths(self, lines):
+ widths = []
+ n = len(lines[0])
+ for col in range(n):
+ width = 0
+ for line in lines:
+ width = max(width, len(line[col]))
+ widths.append(width)
+ return widths
+
+
+if __name__ == '__main__':
+ ObnamBenchmarkSummary().run()
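The summary tool's table layout (compute each column's width from its longest cell, then right-align) is a standard pattern. A small self-contained Python 3 sketch of the same idea, with made-up example rows:

```python
def compute_column_widths(lines):
    # Width of each column is the length of the longest cell in it.
    return [max(len(line[col]) for line in lines)
            for col in range(len(lines[0]))]

def format_table(lines):
    widths = compute_column_widths(lines)
    return '\n'.join(
        ' | '.join('%*s' % (w, cell) for w, cell in zip(widths, line))
        for line in lines)

rows = [
    ['version', 'EF files/s', 'LF MiB/s'],   # header, as in the tool
    ['1.6.1', '118', '33'],                  # hypothetical numbers
    ['1.7', '131', '35'],
]
print(format_table(rows))
```

The `'%*s'` format takes the width as an argument, which is what lets a single pass over the widths list align every cell.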
diff --git a/obnam-benchmark.1.in b/obnam-benchmark.1.in
deleted file mode 100644
index f52ee74c..00000000
--- a/obnam-benchmark.1.in
+++ /dev/null
@@ -1,133 +0,0 @@
-.\" Copyright 2011 Lars Wirzenius <liw@liw.fi>
-.\"
-.\" This program is free software: you can redistribute it and/or modify
-.\" it under the terms of the GNU General Public License as published by
-.\" the Free Software Foundation, either version 3 of the License, or
-.\" (at your option) any later version.
-.\"
-.\" This program is distributed in the hope that it will be useful,
-.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
-.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-.\" GNU General Public License for more details.
-.\"
-.\" You should have received a copy of the GNU General Public License
-.\" along with this program. If not, see <http://www.gnu.org/licenses/>.
-.\"
-.TH OBNAM-BENCHMARK 1
-.SH NAME
-obnam-benchmark \- benchmark obnam
-.SH SYNOPSIS
-.SH DESCRIPTION
-.B obnam-benchmark
-benchmarks the
-.BR obnam (1)
-backup application,
-by measuring how much time it takes to do a backup, restore, etc,
-in various scenarios.
-.B obnam-benchmark
-uses the
-.BR seivot (1)
-tool for actually running the benchmarks,
-but makes some helpful assumptions about things,
-to make it simpler to run than running
-.B seivot
-directly.
-.PP
-Benchmarks are run using two different usage profiles:
-.I mailspool
-(all files are small), and
-.I mediaserver
-(all files are big).
-For each profile,
-test data of the desired total size is generated,
-backed up,
-and then several incremental generations are backed up,
-each adding some more generated test data.
-Then other operations are run against the backup repository:
-restoring,
-listing the contents of,
-and removing each generation.
-.PP
-The result of the benchmark is a
-.I .seivot
-file per profile,
-plus a Python profiler file for each run of
-.BR obnam .
-These are stored in
-.IR ../benchmarks .
-A set of
-.I .seivot
-files can be summarized for comparison with
-.BR seivots-summary (1).
-The profiling files can be viewed with the usual Python tools:
-see the
-.B pstats
-module.
-.PP
-The benchmarks are run against a version of
-.B obnam
-checked out from version control.
-It is not (currently) possible to run the benchmark against an installed
-version of
-.BR obnam.
-Also the
-.I larch
-Python library,
-which
-.B obnam
-needs,
-needs to be checked out from version control.
-The
-.B \-\-obnam\-branch
-and
-.B \-\-larch\-branch
-options set the locations,
-if the defaults are not correct.
-.SH OPTIONS
-.SH ENVIRONMENT
-.TP
-.BR TMPDIR
-This variable
-.I must
-be set.
-It controls where the temporary files (generated test data) is stored.
-If this variable was not set,
-they'd be put into
-.IR /tmp ,
-which easily fills up,
-to the detriment of the entire system.
-Thus.
-.B obnam-benchmark
-requires that the location is set explicitly.
-(You can still use
-.I /tmp
-if you want, but you have to set
-.B TMPDIR
-explicitly.)
-.SH FILES
-.TP
-.BR ../benchmarks/
-The default directory where results of the benchmark are stored,
-in a subdirectory named after the branch and revision numbers.
-.SH EXAMPLE
-To run a small benchmark:
-.IP
-TMPDIR=/var/tmp obnam-benchmark --size=10m/1m
-.PP
-To run a benchmark using existing data:
-.IP
-TMPDIR=/var/tmp obnam-benchmark --use-existing=$HOME/Mail
-.PP
-To view the currently available benchmark results:
-.IP
-seivots-summary ../benchmarks/*/*mail*.seivot | less -S
-.br
-seivots-summary ../benchmarks/*/*media*.seivot | less -S
-.PP
-(You need to run
-.B seivots-summary
-once per usage profile.)
-.SH "SEE ALSO"
-.BR obnam (1),
-.BR seivot (1),
-.BR seivots-summary (1).
diff --git a/setup.py b/setup.py
index b421b79d..deda1ba1 100644
--- a/setup.py
+++ b/setup.py
@@ -199,7 +199,7 @@ setup(name='obnam',
author='Lars Wirzenius',
author_email='liw@liw.fi',
url='http://liw.fi/obnam/',
- scripts=['obnam', 'obnam-benchmark', 'obnam-viewprof'],
+ scripts=['obnam', 'obnam-viewprof'],
packages=['obnamlib', 'obnamlib.plugins', 'obnamlib.fmt_6'],
ext_modules=[Extension('obnamlib._obnam', sources=['_obnammodule.c'])],
data_files=[('share/man/man1', glob.glob('*.1'))],