Obnam version: 1.0

Environment: backing up from multiple servers (each has its own repo) to NFS-shared storage.

OS: OpenSuSE 11.3

---

0) From time to time I get strange locks that prevent subsequent backups, which fail with a lock timeout message, even though I am SURE that I'm running only one backup per client at a time and that each client has a dedicated obnam repo. How do those locks come to appear? No one knows, but they do, for this or that obnam repo (I'm backing up 17 servers with it).

---

1) When trying to fsck with fix:

    /it/bin/obnam/bin/obnam -r /data/backup/mlogin4.smware.local/obnam --fsck-fix fsck
    Traceback (most recent call last):
      File "/it/bin/obnam/lib/python/site-packages/cliapp/app.py", line 172, in _run
        self.process_args(args)
      File "/it/bin/obnam/lib/python/site-packages/obnamlib/app.py", line 158, in process_args
        cliapp.Application.process_args(self, args)
      File "/it/bin/obnam/lib/python/site-packages/cliapp/app.py", line 407, in process_args
        method(args[1:])
      File "/it/bin/obnam/lib/python/site-packages/obnamlib/plugins/fsck_plugin.py", line 306, in fsck
        for more in reversed(list(work.do() or [])):
      File "/it/bin/obnam/lib/python/site-packages/obnamlib/plugins/fsck_plugin.py", line 250, in do
        work.do()
      File "/it/bin/obnam/lib/python/site-packages/larch/fsck.py", line 107, in do
        tracing.trace('fixed it: %s' % new_node.keys())
    UnboundLocalError: local variable 'new_node' referenced before assignment

That is strange )
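
For context, an `UnboundLocalError` like this usually means a local variable is assigned on only one code path and then used unconditionally. The following is a minimal sketch of that pattern, not the actual larch code; `fix_node`, `fix_node_safe`, and the `can_fix` flag are hypothetical names used only for illustration.

```python
def fix_node(node, can_fix):
    # Hypothetical stand-in, not larch's fsck code: new_node is only
    # assigned when can_fix is true.
    if can_fix:
        new_node = dict(node)
    # Used unconditionally, so this line raises UnboundLocalError
    # whenever can_fix is false.
    return sorted(new_node.keys())


def fix_node_safe(node, can_fix):
    # Same logic, but new_node is only used inside the branch that
    # assigns it.
    if can_fix:
        new_node = dict(node)
        return sorted(new_node.keys())
    return None
```

Calling `fix_node({'a': 1}, False)` raises the same kind of error as in the traceback, while `fix_node_safe` returns `None` in that case.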


---

It would be much more helpful if you filed separate bugs as separate bugs.


---
They are related. Both bugs deal with the inconsistent repo state that obnam leaves behind when something "wrong" happens to it.
One of the errors that appears right before stale locks are left behind looks like this (while backing up the /local_disk/vpodrezov/ directory):
       
        <...>

        file_id = self.get_file_id(self.tree, filename)
      File "/it/bin/obnam/lib/python/site-packages/obnamlib/clientmetadatatree.py", line 180, in get_file_id
        raise KeyError('%s does not yet have a file-id' % pathname)
    KeyError: '/local_disk/vpodrezov/save/0019_29052012/os/linux-2.6.36/arch/x86/boot/.svn/tmp/text-base does not yet have a file-id'

And then if we try to back up to the repo, it ends with a lock timeout.

------------


Another error seen looks like this:

    ERROR: Node 534798 cannot be found in the node store 3662754038122990765

And then the same thing happens: a lock timeout. There is no way to correct this (fsck --fsck-fix just doesn't work).

---

I'm confused: are you trying to fix lock timeouts with fsck instead of force-lock?

Also, that's a nice, long traceback, but I think it would be more helpful if you made a debug log and posted that.

---

What I want to say is that while backing up live filesystems (where files are being changed), from time to time obnam leaves its repos in such a state that they can no longer be used for backups. The errors result in further locks, which CANNOT ALWAYS be bypassed with force-lock (I see multiple lock files around the obnam repo tree that force-lock doesn't find or know about).
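
To show which lock files are left behind, something like the following could list them. This is only a sketch: it assumes the leftover files are literally named `lock`, and the `find_lock_files` helper is hypothetical and not part of Obnam.

```python
import os


def find_lock_files(repo_root, lock_name="lock"):
    """Return the sorted paths of files named lock_name under repo_root.

    Assumption: leftover lock files are plain files called "lock".
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if lock_name in filenames:
            found.append(os.path.join(dirpath, lock_name))
    return sorted(found)
```

Running it against a repo path (for example the one passed to `-r`) only reads directory listings and does not modify the repository.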

Btw, one of the errors after which I get lock timeouts is (/net/backup/data/backup/mlogin1.smware.local/obnam/ is obnam's repo): 

    ERROR: /net/backup/data/backup/mlogin1.smware.local/obnam/3662754038122990765/new/nodes/65/0/0: File exists

//

Sure, I'll continue on the mailing list with a debug log if it helps, but there are definitely problems with backing up live filesystems =(.


----

From discussion on mailing list:

The unbound variable error has been fixed.

The other errors seem to be caused by something in the repository having become corrupt, and
it is now not really possible to get Obnam working properly with it anymore. Ouch. And sorry. It's
no longer possible to see what went wrong either, it seems. It may have been some other error,
or it may be a problem in Obnam. Since this bug no longer seems actionable, I'm marking
it closed, but that doesn't mean everything's fine. I've made a note in my TODO that I need to
change Obnam to make these problems easier to debug in the future (no idea how to
achieve that yet, but we'll see). --liw

[[done]]