author Lars Wirzenius <liw@liw.fi> 2014-03-29 11:43:45 +0000
committer Lars Wirzenius <liw@liw.fi> 2014-03-29 11:43:45 +0000
commit 2dee685a1f8fb954fbeb9fd9a9d0dbb57b34b8ee
tree cb629d2d27b44eeaae262fbb9975a67048b26317 /manual/en/060-backing-up.mdwn
parent 6d27c778c2c51129d5882c2c5adf2aeac9d36e06
Move English manual texts to en subdir
Diffstat (limited to 'manual/en/060-backing-up.mdwn')
-rw-r--r-- manual/en/060-backing-up.mdwn 388
1 files changed, 388 insertions, 0 deletions
diff --git a/manual/en/060-backing-up.mdwn b/manual/en/060-backing-up.mdwn
new file mode 100644
index 00000000..e35efa8c
--- /dev/null
+++ b/manual/en/060-backing-up.mdwn
@@ -0,0 +1,388 @@
+Backing up
+==========
+
+This chapter discusses the various aspects of making backups with
+Obnam.
+
+Your first backup
+-----------------
+
+Let's make a backup! To walk through the examples in this chapter,
+you need to have some live data to back up. The examples use specific
+filenames for this. You'll need to adapt the examples to your own
+files. The examples assume your home directory is `/home/tomjon`, and
+that you have a directory called `Documents` in your home directory
+for your documents. Further, they assume you have a USB drive mounted
+at `/media/backups`, and that you will be using a directory
+`tomjon-repo` on that drive as the backup repository.
+
+With those assumptions, here's how you would back up your documents:
+
+    obnam backup -r /media/backups/tomjon-repo ~/Documents
+
+That's all. It will take a little while, if you have a lot of
+documents, but eventually it'll look something like this:
+
+    Backed up 11 files (of 11 found),
+    uploaded 97.7 KiB in 0s at 647.2 KiB/s average speed
+
+(In reality, the above text will be all on one line, but that didn't
+fit in this manual's line width.)
+
+This tells you that Obnam found a total of eleven files, of which it
+backed up all eleven. The files contained a total of about a hundred
+kilobytes of data, and the upload speed for that data was over six
+hundred kilobytes per second. The units use IEC prefixes, which are
+base-2, to avoid ambiguity. See [Wikipedia on kibibytes] for more
+information.
+
+[Wikipedia on kibibytes]: https://en.wikipedia.org/wiki/Kibibyte
+
+Your first backup run should probably be quite small to see that
+all settings are right without having to wait a long time. You may
+want to choose a small directory to start with, instead of your entire
+home directory.
+
+Your second backup
+------------------
+
+Once you've run your first backup, you'll want to run a second one.
+It's done the same way:
+
+    obnam backup -r /media/backups/tomjon-repo ~/Documents
+
+Note that you don't need to tell Obnam whether you want a full backup
+or an incremental backup. Obnam makes each backup generation a
+snapshot of the data at the time of the backup, and doesn't
+distinguish between full and incremental backups. Each backup
+generation is equal to every other backup generation. This doesn't mean
+that each generation will store all the data separately. Obnam makes
+sure each new generation only backs up data that isn't already in the
+repository. Obnam looks for that data in any file in any previous
+generation, amongst all the clients sharing the same repository.
+
+We'll later cover how to remove backup generations, and you'll learn
+that Obnam can remove any generation, even if it shares some of the
+data with other generations, without those other generations losing
+any data.
+
+After you've made your second backup generation, you'll want to see the
+generations you have:
+
+    $ obnam generations -r /media/backups/tomjon-repo
+    2 2014-02-05 23:13:50 .. 2014-02-05 23:13:50 (14 files, 100000 bytes)
+    5 2014-02-05 23:42:08 .. 2014-02-05 23:42:08 (14 files, 100000 bytes)
+
+This lists two generations, which have the identifiers 2 and 5. Note
+that generation identifiers are not necessarily a simple sequence like
+1, 2, 3. This is due to how some of the internal data structures of
+Obnam are implemented, and not because its author in any way thinks
+it's fun to confuse people.
+
+The two time stamps for each generation are when the backup run
+started and when it ended. In addition, each generation shows a count
+of files in that generation (total, not just new or changed files),
+and the total number of bytes of file content data they have.
+
+Choosing what to back up, and what not to back up
+-------------------------------------------------
+
+Obnam needs to be told what to back up, by giving it a list of
+directories, known as backup roots. In the examples in this chapter so
+far, we've used the directory `~/Documents` (that is, the directory
+`Documents` in your home directory) as the backup root. There can be
+multiple backup roots:
+
+    obnam backup -r /media/backups/tomjon-repo ~/Documents ~/Photos
+
+Everything in the backup root directories gets backed up -- unless it's
+explicitly excluded. There are several ways to exclude things from
+backups:
+
+* The `--exclude` setting uses regular expressions that match the full
+ pathname of each file or directory: if the pathname matches, the
+ file or directory is not backed up. In fact, Obnam pretends it
+ doesn't exist. If a directory matches, then any files and
+ subdirectories also get excluded. This can be used, for example, to
+ exclude all MP3 files (`--exclude='\.mp3$'`).
+* The `--exclude-caches` setting excludes directories that contain a
+ special "cache tag" file called `CACHEDIR.TAG`, that starts with a
+ specific sequence of bytes. Such a tag file can be created in, for
+ example, a Firefox or other web browser cache directory. Those files
+ are usually not important to back up, and tagging the directory
+ can be easier than constructing a regular expression for
+ `--exclude`.
+* The `--one-file-system` setting excludes any mount points and the
+ contents of the mounted filesystem. This is useful for skipping,
+ for example, virtual filesystems such as `/proc`, remote filesystems
+ mounted over NFS, and Obnam repositories mounted with `obnam mount`
+ (which we'll cover in the next chapter).
+
+In general it is better to back up too much rather than too little.
+You should also make sure you know what is and isn't backed up. The
+`--pretend` option tells Obnam to run a backup, except it doesn't
+change anything in the backup repository, so it's quite fast. This way
+you can see what would be backed up, and tweak exclusions as needed.
+
+Configuration files: a quick intro
+----------------------------------
+
+By this time you may have noticed that Obnam has a number of
+configurable settings you can tweak in a number of ways. Doing it on
+the command line is always possible, but then you get quite long
+command lines. You can also put them into a configuration file.
+
+Every command line option Obnam knows can be set in a configuration
+file. Later in this manual there is a whole chapter that covers all
+the details of configuration files, and all the various settings you
+can use. For now, we'll give a quick introduction.
+
+An Obnam configuration looks like this:
+
+    [config]
+    repository = /media/backups/tomjon-repo
+    root = /home/tomjon/Documents, /home/tomjon/Photos
+    exclude = \.mp3$
+    exclude-caches = yes
+    one-file-system = no
+
+This form of configuration file is commonly known as an "INI file",
+from Microsoft Windows `.INI` files. All the Obnam settings go into a
+section titled `[config]`, and each setting has the same name as the
+command line option, but without the double dash prefix. Thus, it's
+`--exclude` on the command line and `exclude` in the configuration
+file.
+
+Some settings can have multiple values, such as `exclude` and `root`.
+The values are comma separated. If there are many values, you can
+split them across multiple lines, where the second and later lines are
+indented by space or TAB characters.
+
+That should get you started, and you can reference the "Obnam
+configuration files and settings" chapter for all the details.
+
+When your precious data is very large
+-------------------------------------
+
+When your precious data is very large, the first backup may take a very
+long time. Ditto, if you get a lot of new precious data for a later
+backup. In these cases, you may need to be very patient, and just let
+the backup take its time, or you may choose to start small and add to
+the backups a bit at a time.
+
+The patient option is easy: you tell Obnam to back up everything, set
+it running, and wait until it's done, even if it takes hours or days.
+If the backup terminates prematurely, e.g., because of a network link
+going down, you won't have to start from scratch thanks to Obnam's
+checkpoint support. Every gigabyte or so (by default) Obnam stops a
+backup run to create a checkpoint generation. If the backup later
+crashes, you can just re-run Obnam and it will pick up from the latest
+checkpoint. This is all fully automatic; you don't need to do anything
+for it to happen. See the `--checkpoint` setting for choosing how
+often the checkpoints should happen.
+
+The only problem with the patient option is that your most precious
+data doesn't get backed up while all your large, but less precious
+data is being backed up. For example, you may have a large amount of
+downloaded videos of conference presentations, which are nice, but not
+hugely important. While those get backed up, your own documents do not
+get backed up.
+
+You can work around this by initially excluding everything except the
+most precious data. When that is backed up, you gradually reduce the
+excludes, re-running the backup, until you've backed up everything.
+As an example, your first backup might have the following
+configuration:
+
+    obnam backup -r /media/backups/tomjon-repo ~ \
+      --exclude ~/Downloads
+
+This would exclude all downloaded files. The next backup run might
+exclude only video files:
+
+    obnam backup -r /media/backups/tomjon-repo ~ \
+      --exclude ~/Downloads/'.*\.mp4$'
+
+After this, you might reduce excludes to allow a few videos, such as
+those whose name starts with a specific letter:
+
+    obnam backup -r /media/backups/tomjon-repo ~ \
+      --exclude ~/Downloads/'[^aA].*\.mp4$'
+
+Continue allowing more and more videos until they've all been backed
+up.
+
+De-duplication
+--------------
+
+Obnam de-duplicates the data it backs up, across all files in all
+generations for all clients sharing the repository. It does this by
+breaking up all file data into bits called chunks. Every time Obnam
+reads a file and gets a chunk together, it looks into the backup
+repository to see if an identical chunk already exists. If it does,
+Obnam doesn't need to upload the chunk, saving space, bandwidth, and
+time.
+
+De-duplication in Obnam is useful in several situations:
+
+* When you have two identical files, obviously. They might have
+ different names, and be in different directories, but contain the
+ same data.
+* When a file keeps growing, but all the new data is added at the end.
+ This is typical for log files, for example. If the leading chunks
+ are unmodified, only the new data needs to be backed up.
+* When a file or directory is renamed or moved. If you decide that the
+ English name for the `Photos` directory is annoying and you want to
+ use the Finnish `Valokuvat` instead, you can rename that in an
+ instant. However, without de-duplication, you then have to back up
+ all your photos again.
+* When a whole team works on the same things, and everyone has copies of
+ the same files, the backup repository only needs one copy of each
+ file, rather than one per team member.
+
+De-duplication in Obnam isn't perfect. The granularity of finding
+duplicate data is quite coarse (see the `--chunk-size` setting), and
+so Obnam often misses duplication that does exist: a small change
+anywhere in a chunk makes the whole chunk look new.
+
+De-duplication and safety against checksum collisions
+-----------------------------------------------------
+
+This is a bit of a scary topic, but it would be dishonest to not
+discuss it at all. Feel free to come back to this section later.
+
+Obnam uses the MD5 checksum algorithm for recognising duplicate data
+chunks. MD5 has a reputation for being unsafe: people have constructed
+files that are different, but result in the same MD5 checksum. This is
+true. MD5 is not considered safe for security critical applications.
+
+Every checksum algorithm can have collisions. Changing Obnam to use,
+say, SHA1, SHA2, or the new SHA3 algorithm would not remove the
+chance of collisions. It would reduce the chance of accidental
+collisions, but the chance of those is already so small with MD5 that
+it can be disregarded. Or put in another way, if you care about the
+chance of accidental MD5 collisions, you should be caring about
+accidental SHA1, SHA2, or SHA3 collisions as well.
+
+Apart from accidental collisions, there are two cases where you should
+worry about checksum collisions (regardless of algorithm).
+
+First, if you have an enemy who wishes to corrupt your backed up data,
+they may replace some of the backed up data with other data that has
+the same checksum. This way, when you restore, your data is corrupted
+without Obnam noticing.
+
+Second, if you're into researching checksum collisions, you're likely
+to have files that cause checksum collisions, and in that case, if you
+restore after a catastrophe, you probably want to get the files back
+intact, rather than having Obnam confuse one with the other.
+
+To deal with these situations, Obnam has three de-duplication modes,
+set using the `--deduplicate` setting:
+
+* The default mode, `fatalist`, assumes checksum collisions do not
+ happen. This is a reasonable compromise between performance, safety,
+ and security for most people.
+* The `verify` mode assumes checksum collisions do happen, and
+ verifies that the already backed up chunk is identical to the chunk
+ to be backed up, by comparing the actual data. Doing this requires
+ downloading the chunk from the backup repository, which can be quite
+ slow, since checksums will often match. This is a useful mode if you
+ have very fast access to the backup repository, and want to
+ de-duplicate, such as when the backup repository is on a locally
+ connected hard drive.
+* The `never` mode turns off de-duplication completely. This is
+ useful if you're worried about checksum collisions, and do not
+ require de-duplication.
+
+There is, unfortunately, no way to get de-duplication that is both
+invulnerable to checksum collisions and fast even when accessing the
+backup repository is slow. The only way to be invulnerable is to
+compare the data, and if downloading the data from the repository is
+slow, then the comparison will take significant time.
+
+Locking
+-------
+
+Multiple clients can share a repository, and to prevent them from
+trampling on each other, they lock parts of the repository while
+working. The "Sharing a repository between multiple clients" chapter
+will discuss this in more detail.
+
+If Obnam terminates abruptly, even if there's only ever one client
+using the repository, the lock may stay around and prevent that one
+client from making new backups. The termination may be due to the
+network connection breaking, or due to a bug in Obnam. It can also
+happen if Obnam is interrupted by the user before it's finished.
+
+The Obnam command `force-lock` deals with this situation. It is
+dangerous, though. If you force open a lock that is in active use by
+a running Obnam instance, there is likely to be some stepping on
+toes. In extreme cases, this may even result in repository
+corruption. So be careful.
+
+If you've decided you can safely do it, this is an example of how to
+do it:
+
+    obnam -r /media/backups/tomjon-repo force-lock
+
+Note that some of the locks are per-client, to prevent you from
+accidentally running Obnam twice for the same client, which would
+result in standing on your own toes: kind of impressive, but
+uncomfortable and not recommended.
+
+If you need to force open a lock for a specific client, you can specify
+the client name explicitly:
+
+    obnam --client-name magrat \
+      -r /media/backups/tomjon-repo force-lock
+
+(Long line broken to two for typographical reasons.)
+
+Consistency of live data
+------------------------
+
+Making a backup can take a good while. While the backup is running,
+the filesystem may change. This leads to the snapshot of data Obnam
+presents as a backup generation being internally inconsistent. For
+example, before a backup you might have two files, A and B, which need
+to be kept in sync. You run a backup, and while it's happening, you
+change A, and then B. However, you're unlucky, and Obnam manages to
+back up A before you save your changes, and B after you save changes to
+that. The backup generation now has versions of A and B that are not
+synchronised. This is bad.
+
+This can be dealt with in various ways, depending on the
+circumstances. Here are a few:
+
+* The Logical Volume Manager (LVM) provides snapshots. You can set up
+ your backups so that they first create a snapshot of each logical
+ volume to be backed up, run the backup, and delete the snapshot
+ afterwards. This prevents anyone from modifying the files in the
+ snapshot, but allows normal use to continue while the backup
+ happens.
+* A similar thing can be done using the btrfs filesystem and its
+ subvolumes.
+* You can shut down the system, reboot it into single user mode, and
+ run the backup, before rebooting back into normal mode. This is not
+ a good way to do it, but it is the safest way to get a consistent
+ snapshot of the filesystem.
+
+Note that filesystem level snapshots can't really guarantee a
+consistent view of the live data. An application may be in the middle
+of writing a file, or set of files, when the snapshot is being made.
+To some extent this indicates an application bug, but knowing that
+doesn't let you sleep better.
+
+Usually, though, most systems have enough idle time that a consistent
+backup snapshot can happen during that time. For a laptop, for
+example, a backup can be run while the user is elsewhere, instead of
+actively using the machine.
+
+Part of your backup verification suite should check that the data in a
+backup generation is internally consistent, if that can be done.
+Otherwise, you'll either have to analyse the applications you use, or
+trust they're not too buggy.
+
+If you didn't understand this section, don't worry and be happy and
+sleep well.
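
A note on the `--exclude-caches` bullet in the chapter text above: the "specific sequence of bytes" is, as far as the Cache Directory Tagging Specification describes it, a fixed signature line at the start of `CACHEDIR.TAG`. The following is a minimal shell sketch of tagging a directory under that assumption. The function name `tag_cache_dir` is made up for this example and is not part of Obnam.

```shell
# Hypothetical helper, not part of Obnam: mark a directory as a cache
# directory by writing a CACHEDIR.TAG file into it.
tag_cache_dir() {
    dir=$1
    # The tag file must begin with this exact signature line
    # (taken from the Cache Directory Tagging Specification).
    printf 'Signature: 8a477f597d28d172789f06886806bc55\n' > "$dir/CACHEDIR.TAG"
    # Anything after the signature is free-form commentary.
    printf '# Tagged as a cache directory; backup tools may skip it.\n' >> "$dir/CACHEDIR.TAG"
}
```

After tagging a directory this way, a backup run with `--exclude-caches` should skip it. Running the backup with `--pretend` first is a cheap way to confirm that.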
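
A note on the `--exclude` bullet in the chapter text above: it can help to try a pattern against a list of pathnames before using it. The sketch below uses `grep -E`, so it assumes the pattern is also valid as a POSIX extended regular expression and that Obnam matches it anywhere in the full pathname. The helper name `preview_exclude` is hypothetical. This only approximates Obnam's matching; `--pretend` remains the authoritative check.

```shell
# Hypothetical helper, not part of Obnam: read pathnames on stdin and
# print the ones the pattern does NOT match, i.e. the ones that would
# still be backed up under the assumptions stated above.
preview_exclude() {
    grep -Ev -- "$1"
}
```

For example, `find ~/Documents | preview_exclude '\.mp3$'` lists everything under `~/Documents` except pathnames ending in `.mp3`.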
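
A note on the "De-duplication" section in the chapter text above: the idea of splitting data into chunks and recognising identical chunks by checksum can be illustrated with standard tools. This is only a sketch of the principle, assuming fixed-size chunks and GNU coreutils (`split`, `md5sum`). It is not how Obnam stores or looks up chunks, and the function name `count_chunks` is made up for this example.

```shell
# Illustrative only; Obnam's real chunking and repository format differ.
# Split a non-empty file into fixed-size chunks and print two numbers:
# the total number of chunks and the number of distinct chunks by MD5.
count_chunks() {
    file=$1
    chunk_size=$2                      # chunk size in bytes
    workdir=$(mktemp -d)
    # Write the chunks as workdir/chunk.aa, workdir/chunk.ab, ...
    split -b "$chunk_size" "$file" "$workdir/chunk."
    total=$(ls "$workdir" | wc -l)
    # Keep only the checksum column, then count distinct checksums.
    unique=$(md5sum "$workdir"/chunk.* | cut -d ' ' -f 1 | sort -u | wc -l)
    rm -rf "$workdir"
    printf '%d %d\n' "$total" "$unique"
}
```

If the second number is smaller than the first, some chunks are identical, and a de-duplicating store would need to keep only one copy of each.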