[9fans] could not write super block; waiting 10 seconds
Richard Miller
2012-03-26 10:18:17 UTC
Has anyone else been unsettled by the occasional messages from
fossil saying (1) "could not write super block; waiting 10 seconds"
and (2) "blistAlloc: called on clean block"?

Patch fossil-superblock-write gets rid of them.

(1) When taking a snapshot, blockWrite in cache.c is called to write
an updated super block S, which has a pointer to the root block R
for the new epoch. To maintain consistency on the disk, R must be
written before S, so blockWrite checks whether R is still in the
cache and marked dirty. Very rarely, blockWrite finds R locked (e.g.,
because the flush thread is just now writing it), so it gives up and
returns zero. The zero return is OK when blockWrite is called by
the flush thread, because the flush thread can get on with writing
out other blocks before coming back to try the failed block again.
But when blockWrite is called by superWrite, there's nothing else to
do; hence the 10 second sleep and warning message. The solution is
to add a waitlock parameter to blockWrite, so superWrite can tell it
to wait for a locked dependent block.
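The waitlock idea in (1) can be illustrated with a hypothetical, simplified model. It assumes POSIX mutexes in place of fossil's own locks and a single dependency pointer in place of fossil's per-block dependency lists; blockInit, diskWrite and the Block layout here are illustrative only. With waitlock 0 (the flush thread's case) a busy lock on R means "give up and retry later"; with waitlock 1 (superWrite's case) the caller blocks until the lock holder is finished.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical, simplified cache block (not fossil's real Block).
 * For the super block S, dep points at the root block R.
 */
typedef struct Block Block;
struct Block {
    pthread_mutex_t lk;  /* held while someone (e.g. the flusher) writes it */
    int dirty;           /* 1 while the contents are not yet on disk */
    Block *dep;          /* block that must reach disk first, or NULL */
};

void blockInit(Block *b, Block *dep, int dirty)
{
    pthread_mutex_init(&b->lk, NULL);
    b->dirty = dirty;
    b->dep = dep;
}

/* Stand-in for queueing a disk write: here it simply marks the block clean. */
static void diskWrite(Block *b)
{
    b->dirty = 0;
}

/*
 * Sketch of blockWrite with a waitlock flag.
 * Returns 1 if b was written, 0 if the caller must try again later.
 */
int blockWrite(Block *b, int waitlock)
{
    Block *d = b->dep;

    if (d != NULL) {
        if (waitlock)
            pthread_mutex_lock(&d->lk);   /* superWrite: wait for the holder */
        else if (pthread_mutex_trylock(&d->lk) != 0)
            return 0;                     /* flush thread: skip, come back later */
        int depdirty = d->dirty;
        pthread_mutex_unlock(&d->lk);
        if (depdirty)
            return 0;                     /* R must reach disk before S */
    }
    diskWrite(b);
    return 1;
}
```

In this model, superWrite would call blockWrite(super, 1) and the flush thread blockWrite(b, 0). In a multithreaded program, the waitlock path blocks until the flush thread unlocks R, instead of failing and leaving the caller to sleep and retry.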

(2) After the new super block S is sent to the disk write queue,
superWrite removes the previous epoch's root block R' from the
active file system. This is normally done by attaching a BList
entry to S in the cache, noting that R' must be marked closed after
S actually goes to the disk. Rarely, S has already been written by
the time blistAlloc is called. In that case the code already did the
right thing (close R' immediately), but it also printed a spurious
warning.
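The deferred-close logic in (2) can be sketched as follows. In this hypothetical model, closeAfter stands in for attaching a BList entry to S, and writeDone stands in for running those entries once S's write completes; the names, fields and the abort-on-allocation-failure policy are assumptions for illustration, not fossil's actual code. The point is the clean-block case: if S is already on disk, closing R' at once is correct and needs no warning.

```c
#include <stdlib.h>

typedef struct Block Block;
typedef struct BList BList;

/* Deferred action: close victim once the owning block is on disk. */
struct BList {
    Block *victim;
    BList *next;
};

/* Hypothetical, simplified block (not fossil's real Block). */
struct Block {
    int dirty;       /* 1 while the block still has to be written */
    int closed;      /* 1 once removed from the active file system */
    BList *after;    /* actions deferred until this block reaches disk */
};

/*
 * Arrange for old to be closed after s is on disk.
 * Returns 1 if the close was deferred, 0 if it was done immediately.
 */
int closeAfter(Block *s, Block *old)
{
    if (!s->dirty) {
        /* s is already written, so closing old now preserves the ordering */
        old->closed = 1;
        return 0;
    }
    BList *p = malloc(sizeof *p);
    if (p == NULL)
        abort();     /* assumption: treat allocation failure as fatal */
    p->victim = old;
    p->next = s->after;
    s->after = p;
    return 1;
}

/* Called when s's disk write completes: mark it clean and run deferred closes. */
void writeDone(Block *s)
{
    s->dirty = 0;
    while (s->after != NULL) {
        BList *p = s->after;
        s->after = p->next;
        p->victim->closed = 1;
        free(p);
    }
}
```

Both paths end with R' closed only after S is on disk; the clean-block path simply skips the bookkeeping.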
Russ Cox
2012-03-26 12:04:03 UTC
Thank you for cleaning these up. These are both things that
I meant to come back to some day, but I never did.

Russ
