Basic operation: backup and restore
===================================

This chapter tests the basic operation of Obnam: backing up and
restoring data. Tests in this chapter only concern themselves with a
single generation; see later chapters for tests of multiple generations.

The goal of this chapter is to test Obnam with every kind of data,
every kind of file, and every kind of metadata.

Backup simple data
------------------

This is the simplest of all simple backup tests: generate a small
amount of data in regular files, in a single directory, and back that
up. No symlinks, no empty files, no extended attributes, no nothing.
Just a few files with a bit of data in each. This is what every backup
program must be able to handle.

    SCENARIO backup simple data
    GIVEN 100kB of new data in directory L
    AND a manifest of L in M
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M
    AND user U can fsck the repository R

Backup sparse files
-------------------

Sparse files present an interesting challenge to backup programs. Most
people have none, but some people have lots, and theirs can have very
large holes. For example, at work I often generate raw disk images as
sparse files. The image may need to be, say, 30 gigabytes in size,
even though it only contains one or two gigabytes of data. The rest is
a hole.

A backup program should restore a sparse file as a sparse file.
Otherwise, the 30 gigabyte disk image file will, upon restore, use 30
gigabytes of disk space, rather than one. That might make restoring
impossible.

Unfortunately, it is not easy to (portably) check whether a file is
sparse. We'll settle for making sure the restored file does not use
more disk space than the one in live data.

    SCENARIO backup a sparse file
    GIVEN a file S in L, with a hole, data, a hole
    AND a manifest of L in M
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M
    AND file S from L, restored in X doesn't use more disk
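
The disk usage comparison in the last step can be sketched in Python
by comparing the `st_blocks` field of `os.stat` for the two files.
This is only an illustration, not the test suite's actual
implementation; the helper names `allocated_blocks`, `uses_more_disk`
and `make_file_with_holes` are hypothetical.

```python
import os


def allocated_blocks(path):
    """Return the number of blocks the filesystem has allocated for path."""
    return os.stat(path).st_blocks


def uses_more_disk(restored, original):
    """Return True if the restored file occupies more blocks than the original."""
    return allocated_blocks(restored) > allocated_blocks(original)


def make_file_with_holes(path, hole_size=1024 * 1024, data=b"x" * 4096):
    """Write a hole, some data, and another hole; return the file size.

    Seeking past the end and truncating to a larger size creates holes
    on filesystems that support sparse files. Elsewhere the gaps are
    simply filled with zeros on disk.
    """
    total = hole_size + len(data) + hole_size
    with open(path, "wb") as f:
        f.seek(hole_size)   # leading hole
        f.write(data)       # real data
        f.truncate(total)   # trailing hole
    return total
```

Comparing block counts rather than testing for holes directly avoids
depending on any one filesystem's way of reporting sparseness.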

Backup all interesting file and metadata types
----------------------------------------------

The Unix filesystem abstraction is surprisingly complicated. Indeed,
it can come as a surprise to anyone who's not implemented a backup
program with the intention of being able to restore the live data set
exactly. To complicate things further, different filesystems have
different features, and different Unix-like operating systems don't
all implement all the features, and implement some features
differently.

We need to ensure Obnam can handle anything it encounters, on any
supported platform. That is the purpose of the scenarios in this
section. There are some limitations, though: the test suite is not run
as the `root` user, and thus we don't deal with filesystem objects
that require privileged operations such as device node creation. We
also don't, in these scenarios, handle multiple filesystem types.
Instead, the test suite should be run multiple times, with `TMPDIR`
set to point at a different filesystem type each time. We leave that
to the user running the test suite.

We rely on a helper tool in the Obnam source tree, `mkfunnyfarm`, to
create all the interesting filesystem objects, rather than spelling
them out in the scenarios. This is because that helper tool is used by
other parts of Obnam's test suite as well, and this reduces code
duplication.

    SCENARIO backup non-basic filesystem objects
    GIVEN directory L with interesting filesystem objects
    AND a manifest of L in M
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M

As a special case, Obnam needs to notice when only an extended
attribute value changes.

    SCENARIO backup notices when extended attribute value changes
    GIVEN a file F in L, with data
    AND file L/F has extended attribute user.foo set to foo
    WHEN user U backs up directory L to repository R
    GIVEN file L/F has extended attribute user.foo set to bar
    AND a manifest of L in M
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M

Backup two roots at once
------------------------

Often it's useful to back up more than one location at once. We'll
assume that if we can back up two, then it'll all work well.

    SCENARIO backup two roots
    GIVEN directory L1 with interesting filesystem objects
    AND directory L2 with interesting filesystem objects
    AND a manifest of L1 in M1
    AND a manifest of L2 in M2
    WHEN user U backs up directories L1 and L2 to repository R
    AND user U restores their latest generation in repository R into X
    THEN L1, restored to X, matches manifest M1
    THEN L2, restored to X, matches manifest M2

Checkpoint generations
----------------------

Obnam is meant to remove checkpoint generations it created during a
backup, if the backup finishes successfully.

    SCENARIO checkpoint generations are removed
    GIVEN 100kB of new data in directory L
    AND user U sets configuration checkpoint to 1k
    WHEN user U backs up directory L to repository R
    THEN user U sees no checkpoint generations in repository R

Restore a single file
---------------------

We need to be able to restore only a single file. Note that when
restoring a single file, we do not set the parent directory's
modification time according to the backup, so we need to manipulate
the manifest to avoid getting an error.

    SCENARIO restore a single file
    GIVEN a file F in L, with data
    AND a manifest of L/F in M
    WHEN user U backs up directory L to repository R
    AND user U restores file L/F to X from their latest generation in repository R
    THEN L/F, restored to X, matches manifest M

Pretend backing up: the `--pretend` setting
-------------------------------------------

The `--pretend` setting lets the user pretend they're doing a backup,
without actually having anything backed up. This is useful for testing
that the configuration is correct: the fake backup runs much faster
than a real one.

    SCENARIO a pretend backup
    GIVEN directory L with interesting filesystem objects
    WHEN user U backs up directory L to repository R
    GIVEN a manifest of R in M1
    WHEN user U pretends to back up directory L to repository R
    GIVEN a manifest of R in M2
    THEN manifests M1 and M2 match 
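
The idea of checking that a pretend backup leaves the repository
untouched can be sketched in Python by taking a content manifest of a
directory tree before and after, then comparing the two. This is a
simplified illustration: the manifests used by the test suite also
cover metadata, and the helper name `tree_manifest` is hypothetical.

```python
import hashlib
import os


def tree_manifest(root):
    """Map each regular file's path, relative to root, to (size, SHA-256)."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            rel = os.path.relpath(full, root)
            manifest[rel] = (os.path.getsize(full), digest)
    return manifest
```

Two manifests taken with no intervening writes compare equal as
dictionaries; any added, removed, or changed file makes them differ.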

Exclude cache directories
-------------------------

The [Cache directory tagging] standard provides an easy way to mark
specific directories as cache directories, which means their data is
easy to re-create (or re-download). Such data is often not worth
backing up. The `--exclude-caches` option tells Obnam to exclude any
directories tagged like that.

[Cache directory tagging]: http://www.bford.info/cachedir/
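
For illustration, the tagging convention can be sketched in Python. A
directory counts as a cache directory when it contains a regular file
named `CACHEDIR.TAG` that starts with a fixed signature line. The
helper names `tag_as_cache` and `is_cache_dir` are hypothetical and
are not part of Obnam.

```python
import os

TAG_NAME = "CACHEDIR.TAG"
# The first 43 bytes of the tag file must be exactly this signature.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def tag_as_cache(dirname):
    """Create dirname if needed and mark it as a cache directory."""
    os.makedirs(dirname, exist_ok=True)
    with open(os.path.join(dirname, TAG_NAME), "wb") as f:
        f.write(SIGNATURE + b"\n")


def is_cache_dir(dirname):
    """Return True if dirname holds a valid cache directory tag."""
    tag = os.path.join(dirname, TAG_NAME)
    # The tag must be a regular file, not a symlink.
    if os.path.islink(tag) or not os.path.isfile(tag):
        return False
    with open(tag, "rb") as f:
        return f.read(len(SIGNATURE)) == SIGNATURE
```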

    SCENARIO exclude cache directories
    GIVEN 1k of new data in directory L/wanted
    AND 1k of new data in directory L/cache
    AND directory L/cache is tagged as a cache directory

We'll now create the manifest, but remove `L/cache` (and files in
`L/cache`) so that it matches what we need. We do it this way instead
of creating the manifest before creating `L/cache`, because creating
`L/cache` changes the timestamp of `L`.

    AND a manifest of L in M
    AND cache is removed from manifest M

Time to back up.

    AND user U sets configuration exclude-caches to yes
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M

Changing backup roots
---------------------

When we change the backup roots, i.e., the directories we want backed
up, we do not want any dropped backup roots to be included in the
new backup.

    SCENARIO replace backup root with new one
    GIVEN 1k of new data in directory L1
    AND 1k of new data in directory L2
    WHEN user U backs up directory L1 to repository R
    AND user U backs up directory L2 to repository R
    AND user U lists latest generation in repository R into F
    THEN nothing in F matches L1

Pre-epoch timestamps
--------------------

It's possible to have timestamps before the epoch, i.e., negative
ones. For example, with the time zone set to UK time, which was one
hour ahead of UTC throughout 1970, `touch -t 197001010000` will
create one. Test that such timestamps work.

    SCENARIO pre-epoch timestamps
    GIVEN file L/file has Unix timestamp -3600
    AND a manifest of L in M
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M
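
To see where the value -3600 comes from, here is a small Python
sketch. It assumes a fixed offset of one hour ahead of UTC as a
stand-in for the local time zone; the names `LOCAL`,
`local_midnight_1970` and `set_timestamp` are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset stands in for the local time zone.
LOCAL = timezone(timedelta(hours=1))


def local_midnight_1970():
    """Return the Unix timestamp of 1970-01-01 00:00 in the LOCAL zone."""
    return datetime(1970, 1, 1, 0, 0, tzinfo=LOCAL).timestamp()


def set_timestamp(path, seconds):
    """Set access and modification times of path; return the new mtime.

    Whether negative values are accepted depends on the filesystem.
    """
    os.utime(path, (seconds, seconds))
    return os.stat(path).st_mtime
```

Local midnight in a zone one hour ahead of UTC falls one hour before
the epoch, so `local_midnight_1970()` is negative.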

Change B-tree node size
-----------------------

The setting for B-tree node size (`--node-size`) only affects new
B-trees. Thus, if we've backed up with one size, and change the
setting to a new size, the backup should still work.

    SCENARIO backup with changed B-tree node size
    GIVEN 100kB of new data in directory L
    AND user U sets configuration node-size to 65536
    WHEN user U backs up directory L to repository R
    GIVEN 100kB of new data in directory L
    AND a manifest of L in M
    AND user U sets configuration node-size to 4096
    WHEN user U backs up directory L to repository R
    AND user U restores their latest generation in repository R into X
    THEN L, restored to X, matches manifest M
    AND user U can fsck the repository R