Discussion:
Kernel panic when compiling Go on native Plan 9
Pavel Klinkovsky
2012-11-02 09:50:47 UTC
Permalink
Hi all,

I am having trouble compiling Go on my native Plan 9 system, with Go
revision 14739 and later.
I get a kernel panic when compiling exp/locale/collate.

Does anyone else have the same problem?
Thank you.

Pavel
Pavel Klinkovsky
2012-11-02 10:56:25 UTC
Permalink
I made another test.
I compared compilation of exp/locale/collate of the Go build 14738 and
14739 (and later).

Compilation of 14738 does not consume much RAM.
Compilation of 14739 (and later) consumes a huge amount of RAM
(involving swap).

So it seems Plan 9 has a problem with virtual memory
management (when using swap), IMO.

Pavel
Richard Miller
2012-11-02 11:59:12 UTC
Permalink
Are you running a recent (since 24 August) kernel on a multiprocessor?

If so, try booting with *nomp=1 in plan9.ini and see if running on
just one cpu prevents the panic.

I've just updated my kernel, and I'm getting a panic on opening
a particular email message in acme. Just debugging it now.
erik quanstrom
2012-11-02 13:04:42 UTC
Permalink
Post by Richard Miller
Are you running a recent (since 24 August) kernel on a multiprocessor?
If so, try booting with *nomp=1 in plan9.ini and see if running on
just one cpu prevents the panic.
I've just updated my kernel, and I'm getting a panic on opening
a particular email message in acme. Just debugging it now.
what change do you see as problematic?

i know it's typical for us to run with multi-gigabyte processes. so
assuming that the op has got a couple of gigs of memory and there is
no dastardly swap configured, it would be either
(a) something in the sources kernel has introduced a panic
(b) or something in go is tickling an existing panic.

but the changes i see all look to be corner cases, or startup
code which would fail on boot. i'm missing what change could
have introduced this.

i'd guess an old bug.

- erik
Pavel Klinkovsky
2012-11-02 13:39:07 UTC
Permalink
Post by Richard Miller
Are you running a recent (since 24 August) kernel on a multiprocessor?
I am running the latest kernel (pulled from the bell-labs repository).
I am running the Plan 9 on the single CPU only (old IBM T30).
Post by Richard Miller
If so, try booting with *nomp=1 in plan9.ini and see if running on
just one cpu prevents the panic.
I already have that option set by default.
Post by Richard Miller
I've just updated my kernel, and I'm getting a panic on opening
a particular email message in acme.  Just debugging it now.
I saw that the problem occurred a couple of seconds after RAM filled up
and swapping started.

Pavel
Richard Miller
2012-11-02 13:53:04 UTC
Permalink
Post by Pavel Klinkovsky
I am running the Plan 9 on the single CPU only (old IBM T30).
Sorry, that means we are looking at two different panics.
Mine seems to be a side effect of mpacpi.c trying to enable
the same cpu twice:

cpu0: 1599MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
...
mpinit: mp table describes 2 cpus
mpinit: scanning acpi madt for extra cpus
...
mpacpiproc: apic 0xf04705c0
mpacpiproc: apic 0xf0470648
cpu1: 1600MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
cpu1: 1599MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
...
erik quanstrom
2012-11-02 13:59:31 UTC
Permalink
Post by Richard Miller
Post by Pavel Klinkovsky
I am running the Plan 9 on the single CPU only (old IBM T30).
Sorry, that means we are looking at two different panics.
Mine seems to be a side effect of mpacpi.c trying to enable
the same cpu twice:
cpu0: 1599MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
...
mpinit: mp table describes 2 cpus
mpinit: scanning acpi madt for extra cpus
...
mpacpiproc: apic 0xf04705c0
mpacpiproc: apic 0xf0470648
cpu1: 1600MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
cpu1: 1599MHz GenuineIntel Atom (cpuid: AX 0x106C2 DX 0xBFE9FBFF)
i found in nix that when mixing acpi and mp i had to defend against
double-entries for i/o apics since the numbering can be different. but
it will be interesting to see what happened to start 2 cpu1s.

[...]
ioapic: 1 addr fec00000 base 0
ioapic: 3 addr fec8a000 base 24
ioapic: 8 same pa as apic 1
ioapic: 9 same pa as apic 3

- erik
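The defence erik describes can be sketched roughly as follows. This is a minimal Python illustration, not the nix kernel code: entries are keyed on physical address rather than on id, because the two firmware tables may number the same device differently. The `IoApic` record and `merge_ioapics` function are hypothetical names, and which table each id came from, as well as the base values for ids 8 and 9, are assumptions; only the ids and addresses come from the log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoApic:
    """One I/O APIC entry as reported by a firmware table (hypothetical record)."""
    apic_id: int
    addr: int   # physical address of the register window
    base: int   # first interrupt number served


def merge_ioapics(entries):
    """Keep the first entry seen for each physical address.

    Returns (kept, messages); the messages imitate the log lines above.
    """
    by_addr = {}
    kept, messages = [], []
    for e in entries:
        first = by_addr.get(e.addr)
        if first is None:
            by_addr[e.addr] = e
            kept.append(e)
            messages.append(f"ioapic: {e.apic_id} addr {e.addr:x} base {e.base}")
        else:
            # Same hardware under a different number: ignore the duplicate.
            messages.append(f"ioapic: {e.apic_id} same pa as apic {first.apic_id}")
    return kept, messages


# Assumed: ids 1 and 3 come from one table, ids 8 and 9 from the other.
entries = [
    IoApic(1, 0xFEC00000, 0),
    IoApic(3, 0xFEC8A000, 24),
    IoApic(8, 0xFEC00000, 0),    # base value assumed
    IoApic(9, 0xFEC8A000, 24),   # base value assumed
]
kept, messages = merge_ioapics(entries)
print("\n".join(messages))
```

Keying on the address instead of the id is the design choice that matters here: the id is just a label chosen by whoever wrote the table, while the address identifies the hardware.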
Richard Miller
2012-11-02 15:25:02 UTC
Permalink
Post by erik quanstrom
i found in nix that when mixing acpi and mp i had to defend against
double-entries for i/o apics since the numbering can be different. but
it will be interesting to see what happened to start 2 cpu1s.
Looks like the same thing: MP table says 2 cpus with apic ids 0 and 1,
ACPI table says 2 cpus (the same ones) with ids 0 and 2.

Mixing information from MP and ACPI tables seems to me like just asking
for trouble.
c***@gmx.de
2012-11-02 16:42:48 UTC
Permalink
9front kernel extracts and uses all the cpu and apic and interrupt
routing info from acpi tables if you boot it with *acpi=1. theres
no mixing of mptable and acpi tables.

--
cinap
Pavel Klinkovsky
2012-11-02 14:16:16 UTC
Permalink
Post by Richard Miller
Post by Pavel Klinkovsky
I am running the Plan 9 on the single CPU only (old IBM T30).
Sorry, that means we are looking at two different panics.
Yes.
My panic occurs inside the 'fault' function (search for the 'faultarm' string) in /sys/src/9/port/fault.c.
It looks like an unfixable page fault.

I thought I had a problem with the swap disk region.
I tried 'cat /dev/sdC0/swap | wc -c' but it was ok.

Pavel
Anthony Martin
2012-11-02 12:19:47 UTC
Permalink
Post by Pavel Klinkovsky
I made another test.
I compared compilation of exp/locale/collate of the Go build 14738 and
14739 (and later).
Compilation of 14738 does not consume much RAM.
Compilation of 14739 (and later) consumes a huge amount of RAM
(involving swap).
So it seems Plan 9 has a problem with virtual memory
management (when using swap), IMO.
How much memory does your system have?

Changeset 14739 grew the Unicode collation tables
in the exp/locale/collate package by a considerable
amount. The compiler's memory usage now goes above
400 MB when building that package, almost 2.5x the
amount used to compile the second heavyweight and
15x the average.

Anthony
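The ratios Anthony quotes imply rough absolute figures for the other packages. A small worked calculation, assuming the peak is taken as exactly 400 MB (the post only says "above 400 MB", so these are lower bounds) and treating "almost 2.5x" as 2.5:

```python
peak_mb = 400                        # "goes above 400 MB": treated as a lower bound
second_heaviest_mb = peak_mb / 2.5   # "almost 2.5x the second heavyweight"
average_mb = peak_mb / 15            # "15x the average"

print(f"second heaviest package: about {second_heaviest_mb:.0f} MB")
print(f"average package:         about {average_mb:.0f} MB")
```

So the next most expensive package needs on the order of 160 MB and a typical one under 30 MB, which is why only this one package pushes a small machine over the edge.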
Pavel Klinkovsky
2012-11-02 13:39:16 UTC
Permalink
Post by Anthony Martin
How much memory does your system have?
- 512 MB RAM
- 512 MB swap
Post by Anthony Martin
Changeset 14739 grew the Unicode collation tables
in the exp/locale/collate package by a considerable
amount. The compiler's memory usage now goes above
400 MB when building that package, almost 2.5x the
amount used to compile the second heavyweight and
15x the average.
I see.
I can confirm that when the kernel panic occurred, I saw (in stats):
- full RAM
- small portion of swap occupied

Pavel
erik quanstrom
2012-11-02 14:00:10 UTC
Permalink
Post by Pavel Klinkovsky
Post by Anthony Martin
How much memory does your system have?
- 512 MB RAM
- 512 MB swap
Post by Anthony Martin
Changeset 14739 grew the Unicode collation tables
in the exp/locale/collate package by a considerable
amount. The compiler's memory usage now goes above
400 MB when building that package, almost 2.5x the
amount used to compile the second heavyweight and
15x the average.
I see.
- full RAM
- small portion of swap occupied
i might give the 9front kernel a go. i think that cinap spent
some time trying to make swap work a little bit.

i'd wonder though if there were some way to cut down the module
so it doesn't take quite so much memory. even halving it would
mean you could ditch swap.

- erik
Anthony Martin
2012-11-02 14:39:43 UTC
Permalink
Post by erik quanstrom
Post by Pavel Klinkovsky
Post by Anthony Martin
How much memory does your system have?
- 512 MB RAM
- 512 MB swap
Post by Anthony Martin
Changeset 14739 grew the Unicode collation tables
in the exp/locale/collate package by a considerable
amount. The compiler's memory usage now goes above
400 MB when building that package, almost 2.5x the
amount used to compile the second heavyweight and
15x the average.
I see.
- full RAM
- small portion of swap occupied
If you want to work around this for the time being,
it's safe to remove that package since it's currently
an experiment and no other package depends on it.

Just 'rm -rf' the exp/locale/collate directory and
run you should be good.
Post by erik quanstrom
i'd wonder though if there were some way to cut down the module
so it doesn't take quite so much memory. even halving it would
mean you could ditch swap.
There's a note at the top of the generated tables.go file that says
"TODO: implement more compact representation for sparse blocks".

I'm going to investigate what's causing such high memory usage
in the compiler. I imagine those huge array initializations cause
hundreds of thousands of Node allocations, at the very least.

Anthony
erik quanstrom
2012-11-02 15:13:41 UTC
Permalink
Post by Anthony Martin
Just 'rm -rf' the exp/locale/collate directory and
run you should be good.
it would be sad to see go sucked into the locale debacle.

- erik
Pavel Klinkovsky
2012-11-02 16:03:23 UTC
Permalink
Post by Anthony Martin
Just 'rm -rf' the exp/locale/collate directory and
run you should be good.
Thank you for the hint.
I confirm that this workaround helps. ;)

Pavel
erik quanstrom
2012-11-02 17:09:21 UTC
Permalink
Post by Skip Tavakkolian
Here's another solution ;)
http://www.crucial.com/store/mpartspecs.aspx?mtbpoid=09E2BA19A5CA7304
addressing the actual problem FTW!

- erik
Pavel Klinkovsky
2012-11-02 17:42:36 UTC
Permalink
Post by Skip Tavakkolian
Here's another solution ;)
http://www.crucial.com/store/mpartspecs.aspx?mtbpoid=09E2BA19A5CA7304
Yeah, it really can help!
...until golang guys multiply the compilation memory needs again. ;)

Pavel
Skip Tavakkolian
2012-11-02 17:07:16 UTC
Permalink
Here's another solution ;)

http://www.crucial.com/store/mpartspecs.aspx?mtbpoid=09E2BA19A5CA7304

On Fri, Nov 2, 2012 at 9:03 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by Anthony Martin
Just 'rm -rf' the exp/locale/collate directory and
run you should be good.
Thank you for the hint.
I confirm that this workaround helps. ;)
Pavel
Pavel Klinkovsky
2012-11-02 15:21:59 UTC
Permalink
Post by erik quanstrom
i might give the 9front kernel a go. i think that cinap spent
some time trying to make swap work a little bit.
Well, actually I prefer to follow the main trunk (bell-labs) version.

Pavel
Pavel Klinkovsky
2012-11-02 16:19:11 UTC
Permalink
I wrote a very simple program that sequentially allocates 1 MB blocks of memory.
When it reached the end of RAM, the kernel panicked.

It really seems as a problem with swap. :(

Pavel
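The test program itself was not posted. A minimal sketch of that kind of test, written here in Python purely for illustration, might look like the following. The function name `allocate_blocks`, the 4096-byte page size, and the `limit_mb` safety cap are all assumptions; the cap exists so the sketch is harmless to run, and would have to be raised past physical RAM to create the memory pressure described above.

```python
MB = 1 << 20
PAGE = 4096  # assumed page size


def allocate_blocks(limit_mb):
    """Allocate 1 MB blocks one after another, touching every page.

    Stops after limit_mb blocks so the sketch is safe to run.
    """
    blocks = []
    for i in range(limit_mb):
        block = bytearray(MB)
        # Write one byte per page so the memory is really backed, not just reserved.
        for off in range(0, MB, PAGE):
            block[off] = 1
        blocks.append(block)   # keep a reference so nothing is freed
        print(f"allocated {i + 1} MB")
    return blocks


blocks = allocate_blocks(limit_mb=8)
```

Touching each page matters: an allocation that is never written may not consume physical memory at all, in which case the test would not exercise the pager.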
erik quanstrom
2012-11-02 16:29:08 UTC
Permalink
Post by Pavel Klinkovsky
I wrote a very simple program that sequentially allocates 1 MB blocks of memory.
When it reached the end of RAM, the kernel panicked.
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.

- erik
Pavel Klinkovsky
2012-11-02 17:36:12 UTC
Permalink
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...

Sorry for the noise.

Pavel
John Floren
2012-11-02 18:36:57 UTC
Permalink
On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...
Sorry for the noise.
Pavel
Swap has been broken since at least 2005 (my first experiments with
Plan 9). Once I stopped trying to compile ghostscript on a 32 MB
laptop, I never really had problems with the lack... hell, I did my
master's work and most of my personal computing on a laptop with only
1 GB of RAM and no swap for most of 2010, only ran into problems when
aptitude decided to calculate a multi-gig dependency graph.

Swapping is so painful, please consider buying more RAM. It may be
simple to fix the swap code, if you're inclined to do some kernel
hacking, because the kernel in general is pleasant to work with.


john
erik quanstrom
2012-11-02 18:52:24 UTC
Permalink
Post by John Floren
Swap has been broken since at least 2005 (my first experiments with
Plan 9). Once I stopped trying to compile ghostscript on a 32 MB
laptop, I never really had problems with the lack... hell, I did my
master's work and most of my personal computing on a laptop with only
1 GB of RAM and no swap for most of 2010, only ran into problems when
aptitude decided to calculate a multi-gig dependency graph.
Swapping is so painful, please consider buying more RAM. It may be
simple to fix the swap code, if you're inclined to do some kernel
hacking, because the kernel in general is pleasant to work with.
if you're looking to the future and thinking of dealing with multiple
page sizes, the current swap is going to need rewriting from scratch.

- erik
Skip Tavakkolian
2012-11-02 19:09:02 UTC
Permalink
it's by design:
http://9fans.net/archive/2006/07/229

-Skip

On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...
Sorry for the noise.
Pavel
erik quanstrom
2012-11-02 19:18:27 UTC
Permalink
Post by Skip Tavakkolian
http://9fans.net/archive/2006/07/229
-Skip
On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...
Sorry for the noise.
i don't think that's quite fair to the current situation. there
was a swapper, and it's broken. it should either be fixed or
removed. leaving the thing in in the state it's in (buffalo
buffalo?) doesn't make any sense, and is as separate from
the question of whether to page (to disk) or not as, ahem, vm is from
paging (to disk).

imo, swap needs to go.

- erik
Charles Forsyth
2012-11-02 20:28:01 UTC
Permalink
There's a non-trivial chance that what now goes wrong with paging
(which did once work, even if it isn't great) is a symptom of a bug
that afflicts the virtual memory code itself. (For instance, a page unlocked
during a critical period, a race, and so on.)
Post by Pavel Klinkovsky
Post by Skip Tavakkolian
http://9fans.net/archive/2006/07/229
-Skip
On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working
swap? :O
Post by Skip Tavakkolian
Post by Pavel Klinkovsky
It is clear I missed something...
Sorry for the noise.
i don't think that's quite fair to the current situation. there
was a swapper, and it's broken. it should either be fixed or
removed. leaving the thing in in the state it's in (buffalo
buffalo?) doesn't make any sense, and is as separate from
the question of whether to page (to disk) or not as, ahem, vm is from
paging (to disk).
imo, swap needs to go.
- erik
pmarin
2012-11-03 05:43:49 UTC
Permalink
To be clear, is the swap partition completely useless in Plan9?

pmarin.

On Fri, Nov 2, 2012 at 9:28 PM, Charles Forsyth
Post by Charles Forsyth
There's a non-trivial chance that what now goes wrong with paging
(which did once work, even if it isn't great) is a symptom of a bug
that afflicts the virtual memory code itself. (For instance, a page unlocked
during a critical period, a race, and so on.)
Post by erik quanstrom
Post by Skip Tavakkolian
http://9fans.net/archive/2006/07/229
-Skip
On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...
Sorry for the noise.
i don't think that's quite fair to the current situation. there
was a swapper, and it's broken. it should either be fixed or
removed. leaving the thing in in the state it's in (buffalo
buffalo?) doesn't make any sense, and is as separate from
the question of whether to page (to disk) or not as, ahem, vm is from
paging (to disk).
imo, swap needs to go.
- erik
erik quanstrom
2012-11-03 06:56:58 UTC
Permalink
Post by steve
if you use the 9front distribution swap works,
if you use either 9atom or the labs distribution it does not.
the labs or erik may take the fixes from 9front but there are good reasons
for dropping swap altogether - it is very slow, rarely used,
and ram is cheap these days.
i think this is a fair summary.
yup.

9atom is dropping swap since the current setup conflicts
with multiple page sizes, and i haven't run swap on any
machine for 10 years.

- erik
steve
2012-11-03 06:53:24 UTC
Permalink
if you use the 9front distribution swap works,
if you use either 9atom or the labs distribution it does not.

the labs or erik may take the fixes from 9front but there are good reasons
for dropping swap altogether - it is very slow, rarely used,
and ram is cheap these days.

i think this is a fair summary.

-Steve
Post by pmarin
To be clear, is the swap partition completely useless in Plan9?
pmarin.
On Fri, Nov 2, 2012 at 9:28 PM, Charles Forsyth
Post by Charles Forsyth
There's a non-trivial chance that what now goes wrong with paging
(which did once work, even if it isn't great) is a symptom of a bug
that afflicts the virtual memory code itself. (For instance, a page unlocked
during a critical period, a race, and so on.)
Post by erik quanstrom
Post by Skip Tavakkolian
http://9fans.net/archive/2006/07/229
-Skip
On Fri, Nov 2, 2012 at 10:36 AM, Pavel Klinkovsky
Post by Pavel Klinkovsky
Post by erik quanstrom
Post by Pavel Klinkovsky
It really seems as a problem with swap. :(
this is well known, and solutions are available
even if you don't care to use them.
Oh, does it mean the official Plan 9 distribution contains non-working swap? :O
It is clear I missed something...
Sorry for the noise.
i don't think that's quite fair to the current situation. there
was a swapper, and it's broken. it should either be fixed or
removed. leaving the thing in in the state it's in (buffalo
buffalo?) doesn't make any sense, and is as separate from
the question of whether to page (to disk) or not as, ahem, vm is from
paging (to disk).
imo, swap needs to go.
- erik
Charles Forsyth
2012-11-03 09:16:25 UTC
Permalink
And soldered onto motherboards in many ultrabooks.
Post by steve
and ram is cheap these days.
l***@proxima.alt.za
2012-11-03 10:02:48 UTC
Permalink
Post by Charles Forsyth
And soldered onto motherboards in many ultrabooks.
and ram is cheap these days.
Should that "And" not be a "But"? There is little cheap about
soldered RAM, if you need to increase it.

++L

PS: I'd be curious to see a mathematical explanation of the semantic
differences between "and" and "but". Any pointers?
erik quanstrom
2012-11-03 15:40:05 UTC
Permalink
Post by Charles Forsyth
And soldered onto motherboards in many ultrabooks.
and ram is cheap these days.
does plan 9 run on any ultrabooks natively? swapping within a vm?
my head hurts to think of it.

- erik
Charles Forsyth
2012-11-03 15:50:09 UTC
Permalink
We'll see.

I'll reiterate my original remark that it would be worthwhile tracking down
and fixing the virtual memory or paging bug in sources plan 9,
not necessarily to make use of paging, but to ensure that the paging
problem isn't just a symptom of something else that normally gets by.
If 9front doesn't suffer from it, that must help to narrow it down. (I
thought the relevant virtual memory changes from 9front were in
sources, but evidently not. There are so many variants now I've lost track.)
Post by erik quanstrom
does plan 9 run on any ultrabooks natively?
erik quanstrom
2012-11-03 16:33:52 UTC
Permalink
Post by erik quanstrom
does plan 9 run on any ultrabooks natively? swapping within a vm?
my head hurts to think of it.
Your head hurts to think that sometimes extra memory is needed? On the
VMs I host for people, I partition 256 MB of memory. This is not enough
to compile python, so you turn on swap before mk, then turn it off again
when you're done. Or not. I'm not a cop.
Either way, nobody has died yet, or even complained about headaches.
usually the vm does paging itself, and often more complicated things like
memory compression and deduplication. so why would the hosted os page as well?

- erik
Kurt H Maier
2012-11-03 16:51:00 UTC
Permalink
Post by erik quanstrom
usually the vm does paging itself, and often more complicated things like
memory compression and deduplication. so why would the hosted os page as well?
Are you deliberately conflating swapping and paging?
erik quanstrom
2012-11-03 17:04:15 UTC
Permalink
Post by Kurt H Maier
Post by erik quanstrom
usually the vm does paging itself, and often more complicated things like
memory compression and deduplication. so why would the hosted os page as well?
Are you deliberately conflating swapping and paging?
in modern systems, i believe they mean the same thing.

http://en.wikipedia.org/wiki/Paging#Terminology
Post by Kurt H Maier
memory deduplication? is that true?
http://lwn.net/Articles/454795/

- erik
Kurt H Maier
2012-11-03 17:13:22 UTC
Permalink
Post by erik quanstrom
in modern systems, i believe they mean the same thing.
http://en.wikipedia.org/wiki/Paging#Terminology
Sorry, I didn't know you were talking about Windows NT.
Post by erik quanstrom
Post by hiro
memory deduplication? is that true?
http://lwn.net/Articles/454795/
hiro was asking if plan 9 deduplicates memory.
Dan Cross
2012-11-03 17:22:50 UTC
Permalink
Post by Kurt H Maier
Post by erik quanstrom
in modern systems, i believe they mean the same thing.
http://en.wikipedia.org/wiki/Paging#Terminology
Sorry, I didn't know you were talking about Windows NT.
I didn't know you were talking about VAX Unix.
Post by Kurt H Maier
Post by erik quanstrom
Post by hiro
memory deduplication? is that true?
http://lwn.net/Articles/454795/
hiro was asking if plan 9 deduplicates memory.
That's odd, because Erik was pretty obviously talking about the host
virtual machine.

But hey; whatever. It's cool.

- Dan C.
Kurt H Maier
2012-11-03 17:28:49 UTC
Permalink
Post by Dan Cross
I didn't know you were talking about VAX Unix.
Thanks for letting me know.
Post by Dan Cross
That's odd, because Erik was pretty obviously talking about the host
virtual machine.
"Host virtual machine," eh?
Post by Dan Cross
But hey; whatever. It's cool.
Relieved you approve.
hiro
2012-11-03 17:38:26 UTC
Permalink
Actually even in linux having something one could call "memory
deduplication" surprises me.
Is there a timemachine for memory on macos?
Steve Simon
2012-11-03 21:29:58 UTC
Permalink
- Dan C.

Glad to see you survived the storm Dan.

-Steve
erik quanstrom
2012-11-03 17:40:48 UTC
Permalink
Post by Kurt H Maier
Post by erik quanstrom
in modern systems, i believe they mean the same thing.
http://en.wikipedia.org/wiki/Paging#Terminology
Sorry, I didn't know you were talking about Windows NT.
Post by erik quanstrom
Post by hiro
memory deduplication? is that true?
http://lwn.net/Articles/454795/
hiro was asking if plan 9 deduplicates memory.
perhaps my comment about double-swap/paging was not clear
enough. i was considering the hosted os, with some standard
vm such as esxi, vbox, xen or whatever as the host. in such a case it makes
no sense to me for the hosted os to page/swap as the hypervisor
is perfectly capable of doing this itself. in fact, i think having
the guest page/swap while the hypervisor is page/swapping is going
to tend to make things more difficult because of i/o contention,
and the fact that doing i/o tends to temporarily increase memory
use.

on the other hand, if you want to press plan 9 into service as the
hypervisor (has anyone done this?), you are going to need fairly
robust swap/paging capabilities. and if you want to get real utility
out of the hypervisor, you're going to need snapshotting and
support for network connection redirection as well.

- erik
Kurt H Maier
2012-11-03 17:45:55 UTC
Permalink
Post by erik quanstrom
perhaps my comment about double-swap/paging was not clear
enough. i was considering the hosted os, with some standard
vm such as esxi, vbox, xen or whatever as the host. in such a case it makes
no sense to me for the hosted os to page/swap as the hypervisor
is perfectly capable of doing this itself. in fact, i think having
the guest page/swap while the hypervisor is page/swapping is going
to tend to make things more difficult because of i/o contention,
and the fact that doing i/o tends to temporarily increase memory
use.
This makes more sense. However, if your hypervisor is swapping, you've
screwed up your planning. RAM oversubscription is the reason most
dime-store VPS services suck really badly. I leave swapping to the
guest OS, since that's where malloc is being called.
Kurt H Maier
2012-11-03 18:06:53 UTC
Permalink
however, i think that queuing theory in general says that one queue with global
sorting beats n smaller queues with local sorting. i think this is sometimes called the
checkout-line problem.
This would be true, I think, if the "big queue" had sufficient
visibility into the guests. They almost never do.
other factors, like global knowledge of memory use stats and page duplication
should put the vm in an even better position than general queueing theory
would suggest to make decisions on what pages to move to disk wrt. global
(that is total machine) throughput.
Page dedup, at least on linux, is not particularly useful for
virtualization yet. It can be made to work but I've never seen people
benefit from it unless the guest operating systems are also linux
systems.
do you have a reference that demonstrates or derives that a similarly-loaded
machine can perform better with all the guests swapping independently and
the vm not swapping, rather than preventing the guests from swapping and
letting the vm swap?
I do not have any formal data on the subject; only the things I've seen
"in the field" as it were. 9cloud does not oversubscribe or page out
VMs unless the system is in danger of crashing. vm.overcommit_memory is
set to 2 and swap is there in case a system process spirals out of
control. The systems are reasonably performant. I manage the
hypervisor in this style based on years of seeing other options burn me.
erik quanstrom
2012-11-03 17:57:10 UTC
Permalink
Post by Kurt H Maier
This makes more sense. However, if your hypervisor is swapping, you've
screwed up your planning. RAM oversubscription is the reason most
dime-store VPS services suck really badly. I leave swapping to the
guest OS, since that's where malloc is being called.
i'm not a big vm guy, but i do know that oversubscription is a big problem.

however, i think that queuing theory in general says that one queue with global
sorting beats n smaller queues with local sorting. i think this is sometimes called the
checkout-line problem.

other factors, like global knowledge of memory use stats and page duplication
should put the vm in an even better position than general queueing theory
would suggest to make decisions on what pages to move to disk wrt. global
(that is total machine) throughput.

do you have a reference that demonstrates or derives that a similarly-loaded
machine can perform better with all the guests swapping independently and
the vm not swapping, rather than preventing the guests from swapping and
letting the vm swap?

- erik
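The checkout-line intuition can be tried out with a toy simulation. This is only a sketch of the queueing claim, not of paging: it compares one shared first-come-first-served line against separate lines chosen at random, with exponential arrivals and service times. The function name `simulate`, the load of 0.8, and the four servers are all assumptions chosen for illustration, and plain FIFO order stands in for the "global sorting" erik mentions. Whether the analogy carries over to hypervisor versus guest paging is exactly what the thread disputes.

```python
import heapq
import random


def simulate(n_customers, n_servers, load, shared, seed=1):
    """Return the mean wait in a toy checkout model.

    shared=True:  one FIFO line feeding all servers.
    shared=False: each arrival picks a line at random and stays in it.
    """
    rng = random.Random(seed)            # arrivals and service times
    lane_rng = random.Random(seed + 1)   # lane choice, kept separate so both
                                         # modes see the same workload
    arrival_rate = load * n_servers      # mean service time is 1.0
    free_at = [0.0] * n_servers          # a valid heap, since all entries are equal
    now = 0.0
    total_wait = 0.0
    for _ in range(n_customers):
        now += rng.expovariate(arrival_rate)
        service = rng.expovariate(1.0)
        if shared:
            earliest = heapq.heappop(free_at)   # next server to become idle
            start = max(now, earliest)
            heapq.heappush(free_at, start + service)
        else:
            lane = lane_rng.randrange(n_servers)
            start = max(now, free_at[lane])
            free_at[lane] = start + service
        total_wait += start - now
    return total_wait / n_customers


one_line = simulate(20000, 4, 0.8, shared=True)
separate_lines = simulate(20000, 4, 0.8, shared=False)
print(f"one shared line: mean wait {one_line:.2f}")
print(f"separate lines:  mean wait {separate_lines:.2f}")
```

In this model the shared line wins because no server sits idle while a customer waits in another line. The model assumes the single queue knows everything about every customer, which is the visibility Kurt argues a hypervisor usually lacks.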
Charles Forsyth
2012-11-03 18:48:47 UTC
Permalink
local paging algorithms can avoid thrashing: "the process pages against
itself".
global paging algorithms typically do not (invariably do not, in my
experience, but most people use essentially the same one, so there might be
some that worked).

Wilkes has a nice discussion of paging algorithms as an application of
control theory
in "The Dynamics of Paging".
http://comjnl.oxfordjournals.org/content/16/1/4.short

"It is notorious that the use of apparently innocuous scheduling and paging
algorithms can give rise to the type of unstable behaviour known as
thrashing."
other factors, like global knowledge of memory use stats and page duplication
should put the vm in an even better position than general queueing theory
would suggest to make decisions on what pages to move to disk wrt. global
(that is total machine) throughput.
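The phrase "the process pages against itself" can be illustrated with a small simulation. This is a sketch under stated assumptions, not any real kernel's policy: process A loops over 4 pages, process B streams through 100 pages and issues three references for each of A's, and both policies use plain LRU. The global policy shares 8 frames; the local policy gives each process a fixed 4 frames (real local policies adjust the allocation, which this sketch does not attempt). The class and function names are hypothetical.

```python
from collections import OrderedDict


class LRU:
    """Fixed-size LRU set of resident pages that counts faults per owner."""

    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()
        self.faults = {}

    def touch(self, owner, page):
        key = (owner, page)
        if key in self.pages:
            self.pages.move_to_end(key)      # hit: mark most recently used
            return
        self.faults[owner] = self.faults.get(owner, 0) + 1
        if len(self.pages) >= self.frames:
            self.pages.popitem(last=False)   # evict least recently used
        self.pages[key] = True


def trace(rounds):
    """A loops over 4 pages; B streams through 100 pages, 3 refs per A ref."""
    b = 0
    for i in range(rounds):
        yield "A", i % 4
        for _ in range(3):
            yield "B", b % 100
            b += 1


def run_global(rounds, frames=8):
    pool = LRU(frames)                       # every process competes for all frames
    for owner, page in trace(rounds):
        pool.touch(owner, page)
    return pool.faults


def run_local(rounds, frames_each=4):
    pools = {"A": LRU(frames_each), "B": LRU(frames_each)}
    for owner, page in trace(rounds):
        pools[owner].touch(owner, page)      # a process only evicts its own pages
    return {o: p.faults.get(o, 0) for o, p in pools.items()}


global_faults = run_global(1000)
local_faults = run_local(1000)
print("global LRU:", global_faults)
print("local LRU: ", local_faults)
```

Under the global policy B's streaming keeps pushing A's small working set out, so A faults on every reference; under the local policy B can only evict its own pages and A faults just once per page. B faults constantly either way, since no policy helps a process whose working set does not fit.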
erik quanstrom
2012-11-04 14:02:56 UTC
Permalink
Post by Charles Forsyth
local paging algorithms can avoid thrashing: "the process pages
against itself". global paging algorithms typically do not
(invariably do not, in my experience, but most people use essentially
the same one, so there might be some that worked).
Wilkes has a nice discussion of paging algorithms as an application of
control theory in "The Dynamics of Paging".
http://comjnl.oxfordjournals.org/content/16/1/4.short
"It is notorious that the use of apparently innocuous scheduling and
paging algorithms can give rise to the type of unstable behaviour
known as thrashing."
good point.

however, if running 10 copies of the same os install, a common occurrence,
and the hypervisor is consolidating identical pages (usually using cas), the
pages an os is likely to free are likely to be duplicated. unless they all page
out that page, no memory is saved unless the hypervisor swaps it out.

vmware makes the same point www.vmware.com/files/pdf/mem_mgmt_perf_vsphere5.pdf
p. 13. but they also make the case that the balloon technique can
outperform host swapping, p. 20, fig. 12.

(vdi (virtual desktop) would be interesting to graph. couldn't find that.)

p. 27 best practices, bullet 3 basically says, make sure you have enough
memory because paging sucks. :-)

- erik

p.s. sharepoint takes 5 vms to run? really guys?
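The point about ten copies of the same OS install can be made concrete with a toy model of page consolidation. This sketch reads "cas" as content-addressed storage, which is an assumption, and uses SHA-256 as the content key; the `PageStore` class and its methods are hypothetical and do not describe any particular hypervisor.

```python
import hashlib


class PageStore:
    """Toy content-addressed store: identical pages share one copy."""

    def __init__(self):
        self.copies = {}   # digest -> page bytes
        self.refs = {}     # digest -> number of guest mappings

    def map_page(self, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.copies:
            self.copies[digest] = data
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def unmap_page(self, digest):
        self.refs[digest] -= 1
        if self.refs[digest] == 0:     # last guest let go: memory is freed
            del self.refs[digest]
            del self.copies[digest]

    def resident_pages(self):
        return len(self.copies)


store = PageStore()
page = b"\x00" * 4096                  # the same page in all ten guests
handles = [store.map_page(page) for _ in range(10)]
print("resident after mapping in 10 guests:", store.resident_pages())

for h in handles[:9]:                  # nine guests page it out
    store.unmap_page(h)
print("resident after 9 guests page out:  ", store.resident_pages())

store.unmap_page(handles[9])           # the tenth lets go as well
print("resident after all 10 page out:    ", store.resident_pages())
```

With sharing in place, a guest paging out a consolidated page frees nothing until the last reference goes, so guest-side paging of such pages costs i/o without relieving host memory.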
hiro
2012-11-04 14:18:19 UTC
Permalink
that paging sucks is not a very new discovery.
Charles Forsyth
2012-11-04 14:58:32 UTC
Permalink
It needn't. Today, with the price of RAM, as many have noted, it's probably
not worth the code and complexity
(though there needn't be much of either), but it can be designed and made
to work well.

Also, despite the low cost of RAM, the systems still run out: my 3 Gbyte
notebook running Chrome under Linux
can run out. What Linux then does with no paging file is not pleasant, so I
give it a small one, and it still thrashes
trying to reclaim enough memory to continue. Pathetic.

My new 4Gb notebook has the memory soldered in (Samsung Series 9).
Post by hiro
that paging sucks is not a very new discovery.
Bakul Shah
2012-11-04 18:57:16 UTC
Permalink
What is really needed is to have some of your hotshot programmers develop on 3+ year old "average" computers. Their code bloat will reduce when their own productivity suffers due to slow machines.

Still there will be cases where a lean program will run out of memory. It should do its own memory management. For instance, don't try to map in multi GB files in their entirety if you are just streaming data!
It needn't. Today, with the price of RAM, as many have noted, it's probably not worth the code and complexity
(though there needn't be much of either), but it can be designed and made to work well.
Also, despite the low cost of RAM, the systems still run out: my 3 Gbyte notebook running Chrome under Linux
can run out. What Linux then does with no paging file is not pleasant, so I give it a small one, and it still thrashes
trying to reclaim enough memory to continue. Pathetic.
My new 4Gb notebook has the memory soldered in (Samsung Series 9).
that paging sucks is not a very new discovery.
Charles Forsyth
2012-11-04 14:54:43 UTC
Permalink
Except that it's only really free when they all DO page it out, because
otherwise it's in someone's working set, and should remain.
Post by erik quanstrom
unless they all page
out that page, no memory is saved unless the hypervisor swaps it out.
Charles Forsyth
2012-11-04 16:18:26 UTC
Permalink
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.1897
post a pointer to the work you did applying the EMAS-style paging
behaviour to Unix.
Martin Harriss
2012-11-04 16:14:13 UTC
Permalink
Post by Charles Forsyth
local paging algorithms can avoid thrashing: "the process pages against
itself".
global paging algorithms typically do not (invariably do not, in my
experience, but most people use essentially the same one, so there might
be some that worked).
Wilkes has a nice discussion of paging algorithms as an application of
control theory
in "The Dynamics of Paging".
http://comjnl.oxfordjournals.org/content/16/1/4.short
"It is notorious that the use of apparently innocuous scheduling and
paging algorithms can give rise to the type of unstable behaviour known
as thrashing."
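
The local versus global distinction quoted above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it is not the EMAS algorithm or any real kernel's policy, the names (LRU, workload, faults_global, faults_local) are made up for the example, and the "local" policy here is simply a fixed, equal frame quota per process with LRU inside each quota.

```python
from collections import OrderedDict


class LRU:
    """A pool of page frames with least-recently-used replacement."""

    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()

    def touch(self, page):
        """Reference a page; return True if the reference caused a fault."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return False
        if len(self.pages) >= self.frames:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page] = True
        return True


def workload(rounds=30, burst=4):
    """Yield (process, page): A loops over 3 pages, B streams through new ones."""
    b_next = 0
    for i in range(rounds):
        yield "A", i % 3
        for _ in range(burst):
            yield "B", b_next
            b_next += 1


def faults_global(total_frames=8):
    """All processes compete for one shared pool of frames."""
    pool = LRU(total_frames)
    faults = {"A": 0, "B": 0}
    for proc, page in workload():
        if pool.touch((proc, page)):
            faults[proc] += 1
    return faults


def faults_local(total_frames=8):
    """Each process pages only against its own fixed quota of frames."""
    quota = total_frames // 2
    pools = {"A": LRU(quota), "B": LRU(quota)}
    faults = {"A": 0, "B": 0}
    for proc, page in workload():
        if pools[proc].touch(page):
            faults[proc] += 1
    return faults
```

With these parameters, the shared pool lets the streaming process B push A's three pages out before A gets back to them, so A faults on all 30 of its references. With separate quotas A takes only its 3 cold faults. B faults 120 times either way, since it never reuses a page.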
Charles,

Regarding local paging algorithms, perhaps at this juncture you should
give yourself a pat on the back and post a pointer to the work you did
applying the EMAS-style paging behaviour to Unix.

(The EMAS papers are a joy to read, even 40 years on. I'd dig up a
reference but here on the east coast of the US I'm in my 6th day without
power and have other things to worry about right now.)

Martin
t***@polynum.com
2012-11-04 16:27:58 UTC
Permalink
Post by Charles Forsyth
Wilkes has a nice discussion of paging algorithms as an application of
control theory
in "The Dynamics of Paging".
http://comjnl.oxfordjournals.org/content/16/1/4.short
"It is notorious that the use of apparently innocuous scheduling and paging
algorithms can give rise to the type of unstable behaviour known as
thrashing."
Just for the (historical) record: since the processing (i.e. some kind of
sorting of huge data, typically raster) may need a lot of memory, the
original G.R.A.S.S. team went as far as implementing a library in
G.R.A.S.S. to do user-level paging and swapping (in fact segmentation,
using disks, but actually going through the file system to store segments
of the data being processed: the segment library).
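
A rough sketch of the idea (user-level segments kept in ordinary files, with only a few resident in memory) might look like the following. It is not the G.R.A.S.S. segment library or its API; the class name SegmentStore, its methods, and the file naming are all hypothetical.

```python
import os
import tempfile
from collections import OrderedDict


class SegmentStore:
    """A byte array split into fixed-size segments.

    At most max_resident segments are held in memory; the others live in
    one file each under a scratch directory (which is not cleaned up here).
    """

    def __init__(self, segment_size, max_resident, directory=None):
        self.segment_size = segment_size
        self.max_resident = max_resident
        self.dir = directory or tempfile.mkdtemp(prefix="segments-")
        self.resident = OrderedDict()  # segment number -> bytearray, LRU order

    def _path(self, n):
        return os.path.join(self.dir, "seg%d" % n)

    def _load(self, n):
        if n in self.resident:
            self.resident.move_to_end(n)
            return self.resident[n]
        if len(self.resident) >= self.max_resident:
            # Write the least recently used segment back to its file.
            # (No dirty tracking: every evicted segment is written.)
            victim, data = self.resident.popitem(last=False)
            with open(self._path(victim), "wb") as f:
                f.write(data)
        try:
            with open(self._path(n), "rb") as f:
                seg = bytearray(f.read())
        except FileNotFoundError:
            seg = bytearray(self.segment_size)  # never written: all zeros
        self.resident[n] = seg
        return seg

    def get(self, offset):
        seg = self._load(offset // self.segment_size)
        return seg[offset % self.segment_size]

    def put(self, offset, value):
        seg = self._load(offset // self.segment_size)
        seg[offset % self.segment_size] = value
```

The program, not the kernel, decides which segments stay in memory, which is the point of doing it at user level.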

Has a "paging / swapping" filesystem (non-persistent data, "memory"
allocation whose lifetime depends on the process, with storing/reloading
to/from disk, and use of real memory when available) been attempted?
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
t***@polynum.com
2012-11-04 19:17:25 UTC
Permalink
Post by t***@polynum.com
Has a "paging / swapping" filesystem (non-persistent data, "memory"
allocation whose lifetime depends on the process, with storing/reloading
to/from disk, and use of real memory when available) been attempted?
Just to add what would be "not existing":

- user level
- IPC since these memory blocks would be named "files".

And as far as G.R.A.S.S. (geographical data processing) was concerned,
one could really need lots of "memory", far more than what was available
(end of the eighties); and with increasing data sizes, the problem is still
here (well, with geographical data, one sorting is obvious: by location,
hence tiling; the problem starts when one has to consider how to weave tile
results together, if the result for the whole may depend on non-local
(twice the size of the tile) interdependencies).
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
hiro
2012-11-03 21:15:23 UTC
Permalink
Post by erik quanstrom
Post by Kurt H Maier
Are you deliberately conflating swapping and paging?
in modern systems, i believe they mean the same thing.
http://en.wikipedia.org/wiki/Paging#Terminology
Post by Kurt H Maier
memory deduplication? is that true?
http://lwn.net/Articles/454795/
most of what I learn about on 9fans (and why I'm happily still
subscribed) is what "technology" to avoid.

"They used Word to write the spec"
hiro
2012-11-03 16:46:58 UTC
Permalink
memory deduplication? is that true?
Kurt H Maier
2012-11-03 16:31:03 UTC
Permalink
Post by erik quanstrom
does plan 9 run on any ultrabooks natively? swapping within a vm?
my head hurts to think of it.
Your head hurts to think that sometimes extra memory is needed? On the
VMs I host for people, I partition 256 MB of memory. This is not enough
to compile python, so you turn on swap before mk, then turn it off again
when you're done. Or not. I'm not a cop.

Either way, nobody has died yet, or even complained about headaches.
balaji.srinivasa+ (balaji)
2012-11-03 16:55:33 UTC
Permalink
Post by erik quanstrom
does plan 9 run on any ultrabooks natively? swapping within a vm?
my head hurts to think of it.
Your head hurts to think that sometimes extra memory is needed? On the
VMs I host for people, I partition 256 MB of memory. This is not enough
to compile python, so you turn on swap before mk, then turn it off again
when you're done. Or not. I'm not a cop.
Either way, nobody has died yet, or even complained about headaches.
This is a good case for swap and i +1 charles's suggestion.
Matthew Veety
2012-11-03 17:05:16 UTC
Permalink
Post by erik quanstrom
does plan 9 run on any ultrabooks natively? swapping within a vm?
my head hurts to think of it.
- erik
I have 9front running natively with some help on my MacBook Air. I've been working on getting rid of the help.

--
Veety
Pavel Klinkovsky
2012-11-05 13:33:49 UTC
Permalink
there are good reasons for dropping swap all together -
it is very slow, rarely used, and ram is cheap these days.
It seems true.

But in my case, it is easier, faster and cheaper to reinstall Plan 9 with a larger (if working) swap partition than to increase the RAM size.
However, it is just a theory, since official Plan 9 does not have a working swap...

Pavel
erik quanstrom
2012-11-05 13:52:19 UTC
Permalink
Post by Pavel Klinkovsky
It seems true.
But in my case, it is easier, faster and cheaper to reinstall Plan 9 with larger (if working) swap partition than increase the RAM size.
However it is just a theory since official Plan 9 does not have a working swap...
it would be far easier for you to just remove the *experimental* go package.

- erik
Pavel Klinkovsky
2012-11-05 15:56:15 UTC
Permalink
Post by erik quanstrom
Post by Pavel Klinkovsky
But in my case, it is easier, faster and cheaper to reinstall Plan 9 with larger (if working) swap partition than increase the RAM size.
it would be far easier for you to just remove the *experimental* go package.
Already done, and my work continues... ;)

Pavel

c***@gmx.de
2012-11-05 13:53:21 UTC
Permalink
you don't need a dedicated swap partition. just swap
to a /tmp file.

--
cinap
Pavel Klinkovsky
2012-11-05 15:56:45 UTC
Permalink
you don't need a dedicated swap partition. just swap to a /tmp file.
Oh, yes.
I forgot such a possibility.

Pavel
steve
2012-11-02 16:44:34 UTC
Permalink
i believe 8l uses its own storage allocator, but could easily be tweaked to use malloc and free
by hacking the sources a little (global search and replace). if memory serves me well this
trades memory use for speed, and by going back to traditional malloc in go's 8l you could
reduce the ram use but make it quite a bit slower.

-Steve
Post by erik quanstrom
Post by Pavel Klinkovsky
Post by Anthony Martin
How much memory does your system have?
- 512 MB RAM
- 512 MB swap
Post by Anthony Martin
Changeset 14739 grew the Unicode collation tables
in the exp/locale/collate package by a considerable
amount. The compiler's memory usage now goes above
400 MB when building that package, almost 2.5x the
amount used to compile the second heavyweight and
15x the average.
I see.
- full RAM
- small portion of swap occupied
i might give the 9front kernel a go. i think that cinap spent
some time trying to make swap work a little bit.
i'd wonder though if there were some way to cut down the module
so it doesn't take quite so much memory. even halving it would
mean you could ditch swap.
- erik