Discussion:
[9fans] fossil archive corruption - solved
(too old to reply)
Richard Miller
2012-03-22 16:30:17 UTC
Permalink
I've finally solved the mystery of data corruption in fossil archives.

As usual, the cause is obvious once you've seen it. The clue is this
line and comment in /sys/src/cmd/fossil/cache.c:/^doRemoveLink:
/*
* We're not really going to overwrite b, but if we're not
* going to look at its contents, there is no point in reading
* them from the disk.
*/
b = cacheLocalData(c, p->addr, p->type, p->tag, recurse ? OReadOnly : OOverWrite, 0);

The context is that b is a copy-on-write block from an earlier epoch
which has just been copied, and doRemoveLink needs only the block's
label (not its data) in order to remove it from the active file system
by setting its state to BsClosed.

The problem is that when cacheLocalData loads a block from disk with
mode OOverWrite, it sets its iostate to BioClean even though only the
block's label has been read. In all other instances of OOverWrite, this
is benign because the caller immediately overwrites the block's data and
sets its iostate to BioDirty before unlocking it. Even in the case of
doRemoveLink, usually the block won't have needed to be loaded from disk
because it has recently been copied and therefore is still in the cache
with valid data. But if the block did need to be loaded again, the
cache entry will will now be marked BioClean but with invalid data (ie
the contents of whatever other block most recently used that buffer).

Now suppose a 'snap -a' is running concurrently, and the copied block in
question is from the just-finished epoch but hasn't been archived yet.
If the timing is right (wrong), the snapshot thread will find the block
in the cache and write its bad data permanently to venti.

The consequence will depend on the type of the invalid block. If it's a
pointer or dir block, the archive operation may fail with "illegal block
address" errors when trying to walk to its descendents; if it's a meta
data block, some portion of the archived tree may become inaccessible
with "corrupted meta data" messages. Both of these have been reported
from time to time in 9fans. If it's a data block, the archive will
simply contain some bad data. I've observed this at least once, and who
knows how many times it has happened unnoticed?

The simplest fix would be to revert to the original fossil code in which
doRemoveLink always loaded the block with OReadOnly. But in line with
the Principle of Least Astonishment (and to guard against anyone else
making a mistake with a similar "optimisation"), I think the right thing
is for cacheLocalData to set the block's iostate to BioLabel, not
BioClean, when only the label has been read.

Patch is fossil-archive-corruption.
c***@gmx.de
2012-03-22 16:48:03 UTC
Permalink
you rock!

--
cinap
Steve Simon
2012-03-22 17:57:16 UTC
Permalink
Kudos to Mr Miller,

-Steve
steve
2012-03-22 20:06:24 UTC
Permalink
apologies to the others involved, i sent this before i read all of richard's email.

well done all, i have been hoping people with more skill than i might fix this.

-Steve
Post by Steve Simon
Kudos to Mr Miller,
-Steve
David du Colombier
2012-03-22 18:32:47 UTC
Permalink
Good catch!

I've not been able to easily reproduce this problem.
How did you proceed?
--
David du Colombier
Richard Miller
2012-03-22 19:24:16 UTC
Permalink
Post by David du Colombier
I've not been able to easily reproduce this problem.
How did you proceed?
Not very reproducible, but I found it could be encouraged to happen
(on a sacrificial secondary fossil+venti) with this:
tar c /n/xfossil/sys/src >/dev/null &
while (sleep 60) echo snap -a >>/srv/xfscons

An easily recognisable consequence when it fails is a BtDir block with
invalid contents. So I peppered the source with diagnostics checking
for that, until I was able to spot exactly where it happened.
Nemo
2012-03-22 19:54:36 UTC
Permalink
impressive. hats off.

--
iphone kbd. excuse typos :)
Post by Richard Miller
Post by David du Colombier
I've not been able to easily reproduce this problem.
How did you proceed?
Not very reproducible, but I found it could be encouraged to happen
tar c /n/xfossil/sys/src >/dev/null &
while (sleep 60) echo snap -a >>/srv/xfscons
An easily recognisable consequence when it fails is a BtDir block with
invalid contents. So I peppered the source with diagnostics checking
for that, until I was able to spot exactly where it happened.
Continue reading on narkive:
Loading...