[9fans] duppage

Discussion:

[9fans] duppage

(too old to reply)

c***@gmx.de

2012-02-25 03:29:33 UTC

discovered odd behaviour on a mp system. it was running rchttpd and
werc and after like 3 or 7 days of load, broken sed and grep processes
appeared in the process table. inspecting the process with acid
yields a strange picture. the process crashed (or aborted themselves)
before any data was read from stdin just after allocating some memory
(grep uses sbrk() directly, where sed uses pool malloc). in the case
of grep, the global bloc seemed to have been reset to a past value,
and seds mainmem structure was also inconsistent with reality.

while trying to put the pieces together, something interesting came up.

duppage() is called by fixfault for COW, with a locked, image backed,
non shared page. it makes a new copy for the image cache, and then
removes the page from the image cache.

to do this, it has to allocate a new page from the page allocator,
temporarily unlocking the page. what we observe is that when duppage
reacquires the page lock, the pages refcount sometimes is >1 meaning
another processor just grabed that page out of the image cache.
(tested this with a print() and it triggered multiple times right after boot)

fixfault still assumes the page to be non shared and inserts it into
the process pagetable.

a change that rechecks the refcount after calling duppage() in
fixfault() and doing a copy like for the ref > 1 case seems to have
made the problem go away. (system is running for 8 days now)

anyone with a mp system can confirm this?

--
cinap

Charles Forsyth

2012-02-25 20:09:47 UTC

Permalink

I think we should institute a Sherlock Holmes award at iwp9.
(It wouldn't mean you need to throw yourself off a building.)

erik quanstrom

2012-02-26 04:13:56 UTC

Permalink

Post by c***@gmx.de
a change that rechecks the refcount after calling duppage() in
fixfault() and doing a copy like for the ref > 1 case seems to have
made the problem go away. (system is running for 8 days now)
anyone with a mp system can confirm this?

the description sounds logical, but i haven't seen this.
do you have a good way to replicate this? it would be
good to keep around for testing.

- erik

Richard Miller

2012-02-29 10:31:08 UTC

Permalink

Post by c***@gmx.de
anyone with a mp system can confirm this?

Yes, I've confirmed by experiment that duppage(lkp) can return
with lkp->ref > 1.

Is what you've found related to the "XXX - here's a bug" comment
in duppage? Maybe that's what really needs fixing.

c***@gmx.de

2012-02-29 14:14:57 UTC

Permalink

no, this is different.

the XXX - there's a bug is about the new page that
duppage() makes. it explains a race where the new
page is on the freelist and the image cache.
whats important is that when someone (lookpage)
locks the page, its refcount and image/daddr is
consistent wich should be the case as far as i can
see.

if you are paranoid, you can just check in newpage()
auxpage() and duppage() what the refcount of the page
is you just grabbed from the freelist is. if its anything
other than 0, then we hit that XXX bug. (done that,
never triggered)

at the point when duppage unlocks(np), the following
things are possible:

lookpage() could succeed locking np, finding its ref
to be 0 and incrementing its ref and unchaining it
from the freelist. (all while holding palloc lock
so it wont interfere with newpage() or auxpage())

if newpage(), auxpage() or duppage() got the lock first,
the page will be uncached, or point to a differnt
image/daddr (duppage). (with some luck, it might even
point to the same image/daddr, then it just doesnt matter :))

or

someone like newpage() or auxpage() succeeding locking it,
incrementing its ref and removing it from the image cache. the
cachepage() is done by duppage() while holding lock on np
so theres no race.

or

duppage() on another process succeeds with locking np,
removing it from the image cache, and filling it with
contents of the to be duped page and putting it back
in the image cache.

all this looks safe to me, but i might be missing something
of course... anyway, thats about all i know about the
XXX comment :)

back to the duppage bug i was talking about...

the bug that i see is that when duppage() is called,
p has to have a ref of 1. the whole duppage approach
will not work if p is shared with other segments
already.

in the normal case of duppage() when we have p locked,
and p->ref is 1, and we are calling uncachepage(p); before
unlocking p we are safe.

because even if someone like lookpage finds the page in the
image cache, it has to lock it first. when it succeeds
locking p, then lookpage() will find p->image != i because
duppage called uncachepage(p) before unlocking it. so it wont
get shared ever again.

the problem we have is that we temporarily unlock
p to acquire palloc lock, wich opens a chance for
someone to take a ref on p, but duppage doesnt
recheck after reqcquiering the p lock.

if this happens, duppage() has to return and let
fixfault() make a copy for its segment.

--
cinap

Richard Miller

2012-02-29 19:35:06 UTC

Permalink

Post by c***@gmx.de
the problem we have is that we temporarily unlock
p to acquire palloc lock, wich opens a chance for
someone to take a ref on p, but duppage doesnt
recheck after reqcquiering the p lock.
if this happens, duppage() has to return and let
fixfault() make a copy for its segment.

That makes sense to me.

Should we be worried that duppage returns 0 or 1 for
success or failure, but fixfault ignores the return value?

c***@gmx.de

2012-02-29 19:50:35 UTC

Permalink

fixfault() in fault.c is the only user of duppage().

in 9front, i changed the return type of duppage to void.

fixfault() now looks like this:

...
if(lkp->image == &swapimage)
ref = lkp->ref + swapcount(lkp->daddr);
else
ref = lkp->ref;
if(ref == 1 && lkp->image){
/* save a copy of the original for the image cache */
duppage(lkp);
ref = lkp->ref;
}
unlock(lkp);
if(ref > 1){
new = newpage(0, &s, addr);
if(s == 0)
return -1;
*pg = new;
copypage(lkp, *pg);
putpage(lkp);
}
...

--
cinap

Richard Miller

2012-03-03 09:58:19 UTC

Permalink

Post by c***@gmx.de
in 9front, i changed the return type of duppage to void.

Jetzt verstehe ich. Before returning 1 (failure) duppage
always calls uncachepage first, so no harm is done.

Good analysis. How about submitting a patch(1)?

c***@gmx.de

2012-03-03 19:30:59 UTC

Permalink

Post by Richard Miller
Jetzt verstehe ich. Before returning 1 (failure) duppage
always calls uncachepage first, so no harm is done.

exactly!

Post by Richard Miller
How about submitting a patch(1)?

i think geoff is working on it.

i wanted to verify this first. the code is subtile and
there might be even better ways to fix this.

modified the pager and swap code in 9front in other ways
too. so making a patch for sources plan9 requires me do
do some manual recreation of the fix and bind fakery.

--
cinap