[9fans] Go and 21-bit runes (and a bit of Go status)
l***@proxima.alt.za
2013-06-02 10:53:37 UTC
Just a heads-up to 9fans that the Go build for Plan 9 (386 or ARM) now
expects the underlying platform to be updated with the 21-bit runes
fixes from Bell Labs; the pertinent submission has been accepted and
processed.

Being up to date may not improve the build as much as one may wish,
but not being up to date is guaranteed to be a problem.

Anthony Martin is also in the process of getting another important
patch to the Go distribution approved, while there may be delays
figuring out what to do about the SSE2 extension to the Intel 386
architecture.

Regarding the latter, Plan 9 does not allow floating point
instructions to be executed within note handling, but erring on the
side of caution also forbids instructions such as MOVOU (don't ask me)
which is part of the SSE(2?) extension, but hardly qualifies as a
floating point instruction.

I have yet to see a suggestion for Go, and especially for
Plan 9, that resolves all future conflicts on this score: it is a
difficult bit of architectural design on both sides. I, for one,
would be interested in plausible suggestions. The 64-bit people may
need them even more, or maybe not at all.

++L
erik quanstrom
2013-06-02 14:10:00 UTC
Post by l***@proxima.alt.za
Regarding the latter, Plan 9 does not allow floating point
instructions to be executed within note handling, but erring on the
side of caution also forbids instructions such as MOVOU (don't ask me)
which is part of the SSE(2?) extension, but hardly qualifies as a
floating point instruction.
movou (movdqu in the manual) is a sse2 data movement instruction.
not all sse2 instructions require that sse be turned on (pause, for example),
but movou uses at least one xmm register so is clearly using the sse
unit, thus requiring that it be turned on.

the go runtime memmove uses movou for memmoves between 33 and 128
bytes. i only see a 10 cycle difference for these cases on my atom machine,
(maximum 13%), so we're not missing out on much here by not using sse.

the real win, or loss for the plan 9 memmove, is in the short memmoves.
but this is a µbenchmark, and it would be more convincing with a real
world test.

- erik

harness; 8.memmovetest
memmove
1 92.42578 cycles/op
2 81.28125 cycles/op
4 56.47266 cycles/op
8 58.32422 cycles/op
16 62.28516 cycles/op
32 70.26563 cycles/op
64 86.32031 cycles/op
128 118.3125 cycles/op
512 323.5078 cycles/op
1024 587.1094 cycles/op
4096 2119.242 cycles/op
131072 133058.5 cycles/op

rt·memmove
1 20.60156 cycles/op
2 20.34375 cycles/op
4 24.46875 cycles/op
8 22.42969 cycles/op
16 27.45703 cycles/op
32 52.82813 cycles/op
64 79.19531 cycles/op
128 129.1289 cycles/op
512 314.4492 cycles/op
1024 569.9648 cycles/op
4096 2132.297 cycles/op
131072 135378.3 cycles/op
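
To make the comparison between the two tables easier to read, the following sketch computes the per-size ratio of the two measurements. It assumes the first table is the Plan 9 libc memmove and the second is the Go runtime's memmove; the type and function names are made up for illustration, and the numbers are copied from the tables above.

```go
package main

import "fmt"

// sample holds one row from each table above, in cycles/op.
type sample struct {
	size  int
	plan9 float64 // "memmove" table (assumed: Plan 9 libc)
	goRT  float64 // "rt·memmove" table (assumed: Go runtime)
}

var samples = []sample{
	{1, 92.42578, 20.60156},
	{2, 81.28125, 20.34375},
	{4, 56.47266, 24.46875},
	{8, 58.32422, 22.42969},
	{16, 62.28516, 27.45703},
	{32, 70.26563, 52.82813},
	{64, 86.32031, 79.19531},
	{128, 118.3125, 129.1289},
	{512, 323.5078, 314.4492},
	{1024, 587.1094, 569.9648},
	{4096, 2119.242, 2132.297},
	{131072, 133058.5, 135378.3},
}

// ratio reports how many times more cycles the first table shows
// than the second for the same size; values above 1 favour rt·memmove.
func ratio(s sample) float64 {
	return s.plan9 / s.goRT
}

func main() {
	for _, s := range samples {
		fmt.Printf("%7d  %5.2fx\n", s.size, ratio(s))
	}
}
```

On these numbers the gap is large only for the shortest moves and close to 1 from 64 bytes upward, which matches the point that the short memmoves are where the difference lies.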
l***@proxima.alt.za
2013-06-02 15:24:30 UTC
Post by erik quanstrom
movou (movdqu in the manual) is a sse2 data movement instruction.
not all sse2 instructions require that sse be turned on (pause, for example),
but movou uses at least one xmm register so is clearly using the sse
unit, thus requiring that it be turned on.
I see Erik answers my question: xmm registers may be clobbered. I
suppose they could be saved in the Go runtime, if absolutely
essential?

++L
erik quanstrom
2013-06-03 04:20:36 UTC
Post by l***@proxima.alt.za
I see Erik answers my question: xmm registers may be clobbered. I
suppose they could be saved in the Go runtime, if absolutely
essential?
no, they can not. saving registers is something that is done on
context switch by the scheduler, and the go runtime is not
involved in context switching; this is a user-level transparent thing.

there are things that could be done. but before getting radical in a
hurry, is there any place other than runtime·memmove() that
would use sse in a note handler?

- erik
l***@proxima.alt.za
2013-06-03 05:38:47 UTC
Post by erik quanstrom
there are things that could be done. but before getting radical in a
hurry, is there any place other than runtime·memmove() that
would use sse in a note handler?
I presumed, perhaps incorrectly, that users are allowed to write their
own signal handlers and are not prohibited from using floating point
instructions in them.

++L
erik quanstrom
2013-06-03 13:28:07 UTC
Post by l***@proxima.alt.za
Post by erik quanstrom
there are things that could be done. but before getting radical in a
hurry, is there any place other than runtime·memmove() that
would use sse in a note handler?
I presumed, perhaps incorrectly, that users are allowed to write their
own signal handlers and are not prohibited from using floating point
instructions in them.
there are no signals in plan 9. i assume that you mean note handlers.

note handlers may be user set but may not do floating point.
see notify(2) for documentation.

as cinap pointed out, this is because the fp save area for the process
is (potentially) busy when the note is delivered. think of it this way.
here up = user process; this is the kernel convention. some details
of this are slightly inaccurate, for the sake of clarity.

user                    kernel
generates fault
                        saves up registers,
                        saves up floating point, sets fp state to FPillegal
                        the save area is already used.
starts note handler
generates fp fault
                        saves up registers,
                        finds that we're in FPillegal state, and process
                        becomes Broken
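
The sequence above can be written down as a toy state machine. This is only a sketch of the description in this post, not kernel code: FPillegal and Broken are the names used above, while the proc type, the fpUsable state and the method names are hypothetical.

```go
package main

import "fmt"

// fpState is a toy version of the per-process FP state.
type fpState int

const (
	fpUsable  fpState = iota // hypothetical name: FP may be used, save area free
	fpIllegal                // FPillegal: save area holds the interrupted context
)

// proc is a hypothetical stand-in for the kernel's view of a process.
type proc struct {
	fp     fpState
	broken bool
}

// deliverNote models the kernel saving the FP context into the single
// per-process save area and forbidding further FP use.
func (p *proc) deliverNote() {
	p.fp = fpIllegal
}

// useFP models user code executing an FP (or SSE) instruction.
// With the save area already occupied, the process becomes Broken.
func (p *proc) useFP() {
	if p.fp == fpIllegal {
		p.broken = true
	}
}

// noted models a normal return from the note handler, after which
// the saved FP context is usable again.
func (p *proc) noted() {
	if !p.broken {
		p.fp = fpUsable
	}
}

func main() {
	ok := &proc{}
	ok.deliverNote()
	ok.noted()
	ok.useFP()
	fmt.Println("FP after handler returns, broken:", ok.broken)

	bad := &proc{}
	bad.deliverNote()
	bad.useFP() // FP inside the handler
	fmt.Println("FP inside handler, broken:", bad.broken)
}
```
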

- erik
l***@proxima.alt.za
2013-06-03 16:34:47 UTC
Post by erik quanstrom
starts note handler
generates fp fault
saves up registers,
finds that we're in FPillegal state, and process
becomes Broken
That conflicts with Go's intent, I've no idea how one addresses it.
Ideally, the kernel should trigger a condition that the Go runtime can
deal with. I can see why it is unlikely.

++L
erik quanstrom
2013-06-03 16:46:10 UTC
Post by l***@proxima.alt.za
Post by erik quanstrom
starts note handler
generates fp fault
saves up registers,
finds that we're in FPillegal state, and process
becomes Broken
That conflicts with Go's intent, I've no idea how one addresses it.
Ideally, the kernel should trigger a condition that the Go runtime can
deal with. I can see why it is unlikely.
if by intent, you mean that go is using xmm registers as if they were
general purpose registers, then the solution is to stop doing that.
and there's such a patch already.

- erik
l***@proxima.alt.za
2013-06-03 17:04:30 UTC
Post by erik quanstrom
if by intent, you mean that go is using xmm registers as if they were
general purpose registers, then the solution is to stop doing that.
and there's such a patch already.
No, Go's intent is to minimise runtime surprises. It is possible to
define signal (calling them notes does not change their nature)
handlers and nothing in the Go specifications compels the user not to
use floating point instructions in such handlers. It would also not
be possible to enforce such restrictions in known implementations of
Go and that creates a conflict. Unless I'm straying into areas I am
ignorant of, something needs to be done for Go and Plan 9 to coexist
amicably; the current conditions are prone to Go-incompatible
behaviour.

Solving the memmove() problem just puts the crisis off for a while.
Once Go is being used on Plan 9 and especially if a user expects
portability from more common platforms, this particular
incompatibility is likely to hurt.

++L
erik quanstrom
2013-06-03 17:07:13 UTC
Post by l***@proxima.alt.za
Post by erik quanstrom
if by intent, you mean that go is using xmm registers as if they were
general purpose registers, then the solution is to stop doing that.
and there's such a patch already.
No, Go's intent is to minimise runtime surprises. It is possible to
define signal (calling them notes does not change their nature)
handlers and nothing in the Go specifications compels the user not to
use floating point instructions in such handlers. It would also not
be possible to enforce such restrictions in known implementations of
signals are not compatible with notes. i don't think this
can be truly portable code anyway.

- erik
Bakul Shah
2013-06-03 17:33:21 UTC
Post by erik quanstrom
Post by l***@proxima.alt.za
Post by erik quanstrom
if by intent, you mean that go is using xmm registers as if they were
general purpose registers, then the solution is to stop doing that.
and there's such a patch already.
No, Go's intent is to minimise runtime surprises. It is possible to
define signal (calling them notes does not change their nature)
handlers and nothing in the Go specifications compels the user not to
use floating point instructions in such handlers. It would also not
be possible to enforce such restrictions in known implementations of
signals are not compatible with notes. i don't think this
can be truly portable code anyway.
Not compatible but signals have similar restrictions. A signal may be delivered at any time where any state maintained in usercode may be inconsistent. In particular use of any non-reentrant function can cause trouble. Used to be, you don't use floating pt. in signal handlers as that would require the kernel to save more state, slowing down signal delivery, or it could cause another trap where the kernel can do the lazy saving trick. Most mallocs are non reentrant as well and you shouldn't use malloc in a handler. All in all a very restricted environment.
Charles Forsyth
2013-06-03 17:38:28 UTC
Post by l***@proxima.alt.za
No, Go's intent is to minimise runtime surprises.
It's not a runtime surprise to Go programmers, since no ordinary Go code
runs in that note handler.
There's a rather elaborate implementation to convert notes (or signals)
into something acceptable
to the rest of the Go runtime. It does as little as it can.
l***@proxima.alt.za
2013-06-03 05:48:30 UTC
I have applied Anthony's CL 9796043 together with some tweaks to
pkg/runtime/sys_plan9_386.s which I will pass on to Anthony as soon as
I can; this has made it possible to complete the first set of run.rc
tests without the major incidents I used to see. Some tests still
fail, but I wasn't expecting miracles: I believe Anthony (and maybe
others) are still working on changes I am not familiar with.

Still, I think I can report some progress, especially towards being
able to run a Go builder for the Plan 9/386 platform. What we are
going to do about builders for the various offshoots I can't tell.

++L
Steve Simon
2013-06-03 17:53:12 UTC
Linuxemu runs all its Linux API emulation inside a note handler.
It has no SSE-specific code of its own (it's the Linux SSE code that was
troublesome here), but if there are any SSE library functions in
Plan 9 (like memmove) then linuxemu may run them from a note handler.

Just an opinion from a different side.

-Steve

c***@gmx.de
2013-06-02 15:01:15 UTC
Post by l***@proxima.alt.za
Regarding the latter, Plan 9 does not allow floating point
instructions to be executed within note handling, but erring on the
side of caution also forbids instructions such as MOVOU (don't ask me)
which is part of the SSE(2?) extension, but hardly qualifies as a
floating point instruction.
The reason for FP being forbidden in note handler is that the kernel only saves
the general purpose (Ureg) registers of the interrupted/notified process
context. The fp or xmm registers are *not* saved and a note handler modifying
those (thru fp instructions or sse instructions) would trash these registers
for the program interrupted by the note.

you could save the ureg, and jump out of the note handler with notejmp(),
save the fp/sse registers yourself and then do the handling of the note
outside of the note context. (this is how signals are implemented in ape).

or we change the kernel to save the fp registers in notify() as well,
pushing them on the user stack and restoring them on noted() just like
the Ureg.

or GO could just stop using *OMG-OPTIMIZED* SSE memmove() in the note
handler.

--
cinap
l***@proxima.alt.za
2013-06-02 15:22:31 UTC
Post by c***@gmx.de
or GO could just stop using *OMG-OPTIMIZED* SSE memmove() in the note
handler.
But it would not stop users from doing so, so at minimum we'd have to
detect the abuse and report it, rather than crash.

Saving the entire register space would be expensive for all
well-behaved processes and avoiding the micro-optimisation in
memmove() (a Plan 9-specific option) would be my recommendation. But
dragons do lurk and will need to be slain.

Incidentally, do the FP registers need to be saved for the sake of
MOVOUs? Or should I ask whether MOVOUs clobber registers not saved
before note handling?

++L
c***@gmx.de
2013-06-02 15:38:19 UTC
the saving isn't the problem. the kernel already flushes the fp registers
to the process fpsave area on notify. it's just that we do *not* copy
the registers to the user stack, but save them in the process fpsave
area.

as there's just one fpsave area in the process, and not one for
notes and one for normal code, the note handler is forbidden to use fp again.

it's not for the sake of movou. it's for the sake of the process interrupted
by the note.

say, you have a program that gets interrupted by a note while in
that omgoptimized sse memmove() where it just loaded some chunks into
the XMM0 register, and then the note fires.

then the note handler does a memmove itself, modifying XMM0 and loading
it with something completely different. then the note handler finishes,
continuing the original program, and XMM0 would contain the garbage
from the note handler! to the program it would look as if registers
randomly changed under it!
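
The scenario above can be imitated with a toy model. This is a simulation only: the cpu type stands in for a single XMM0 register, copy16 stands in for one step of an SSE memmove, and all names are hypothetical. The saved wrapper shows what changes if the register is preserved around the handler.

```go
package main

import "fmt"

// cpu is a toy machine with one 16-byte register standing in for XMM0.
type cpu struct {
	xmm0 [16]byte
}

// copy16 moves 16 bytes through xmm0, like one step of an SSE memmove.
// If interrupt is non-nil it runs between the load and the store,
// playing the part of a note handler firing at the worst moment.
func copy16(c *cpu, dst, src []byte, interrupt func(*cpu)) {
	copy(c.xmm0[:], src) // load chunk into the register
	if interrupt != nil {
		interrupt(c)
	}
	copy(dst, c.xmm0[:]) // store whatever the register holds now
}

// clobber is a handler that uses xmm0 for its own purposes.
func clobber(c *cpu) {
	copy(c.xmm0[:], "handler garbage!")
}

// saved wraps a handler so that xmm0 is preserved across it.
func saved(h func(*cpu)) func(*cpu) {
	return func(c *cpu) {
		keep := c.xmm0
		h(c)
		c.xmm0 = keep
	}
}

func main() {
	c := &cpu{}
	src := []byte("0123456789abcdef")
	dst := make([]byte, 16)

	copy16(c, dst, src, clobber)
	fmt.Printf("unsaved: %q\n", dst)

	copy16(c, dst, src, saved(clobber))
	fmt.Printf("saved:   %q\n", dst)
}
```
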

--
cinap
l***@proxima.alt.za
2013-06-02 15:54:14 UTC
Post by c***@gmx.de
then the note handler does a memmove itself, modifying XMM0 and loading
it with something completely different. then the note handler finishes,
continuing the original program, and XMM0 would contain the garbage
from the note handler! to the program it would look as if registers
randomly changed under it!
True enough. If memmove() were the only problem, solving it would be
easy. One option: drop MOVOU altogether; another option: save xmm8
(following from what you said). But it's the whole FP edifice that's
relevant in the bigger picture: it may be bad practice, but what if I
want to compute the next iteration for pi in a note handler? How is
the Go runtime going to stop me, or at least make sure I am aware that
I should not be doing it rather than give me an incorrect answer that
I then use to fire a ballistic missile at the wrong target?

(I concede that I have not thought about this much - feel free to
think you have to explain this to an idiot.)

++L
Kurt H Maier
2013-06-02 15:59:28 UTC
Post by l***@proxima.alt.za
I should not be doing it rather than give me an incorrect answer that
I then use to fire a ballistic missile at the wrong target?
I knew Google was up to something.

khm
l***@proxima.alt.za
2013-06-02 16:08:45 UTC
Post by Kurt H Maier
I knew Google was up to something.
Google? Who's Google?

++L
Anthony Martin
2013-06-02 19:37:44 UTC
Post by c***@gmx.de
or GO could just stop using *OMG-OPTIMIZED* SSE memmove()
in the note handler.
This is exactly what I did in my patch. This was just
a regression. Someone changed memmove a few weeks ago.

Nothing to see here.

Anthony