[9fans] comparisons with NaN

Richard Miller

2013-08-21 13:17:25 UTC

The Plan 9 C compilers do not appear to be compliant with the IEEE floating
point standard when making comparisons with NaN (not a number) values.

The standard says a comparison with one or both operands NaN is "unordered",
i.e. all relations evaluate to false, except != which is always true.

Testing with this fragment of code:

	double a, b;
	setfcr(0);
	a = 0.0;
	b = sqrt(-1.0);
	if(a < b) print(" (a < b)");
	if(a <= b) print(" (a <= b)");
	if(a == b) print(" (a == b)");
	if(a != b) print(" (a != b)");
	if(a >= b) print(" (a >= b)");
	if(a > b) print(" (a > b)");
	if(b < a) print(" (b < a)");
	if(b <= a) print(" (b <= a)");
	if(b == a) print(" (b == a)");
	if(b != a) print(" (b != a)");
	if(b >= a) print(" (b >= a)");
	if(b > a) print(" (b > a)");
	print("\n");

on ARM the result is almost completely wrong:

	(a < b) (a <= b) (a != b) (b < a) (b <= a) (b != a)

and on x86 the result is even wronger:

	(a < b) (a <= b) (a == b) (b < a) (b <= a) (b == a)

compared to the IEEE expected result, for example on MacOS:

	(a != b) (b != a)
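
For reference, here is a minimal hosted-C sketch of the expected behaviour. It is not Plan 9 code: it assumes a C99 compiler with IEEE 754 doubles and the NAN macro from <math.h>, and the names relations(), add_if() and relations_demo() are made up for illustration. It builds the same report as the fragment above, so the string can be compared with what a given compiler prints.

```c
#include <assert.h>
#include <math.h>   /* NAN */
#include <stdio.h>
#include <string.h>

static char report[128];   /* shared buffer, so relations() is not reentrant */

/* Append " (text)" to the report when cond is true. */
static void add_if(int cond, const char *text)
{
    size_t len = strlen(report);

    if (cond)
        snprintf(report + len, sizeof report - len, " (%s)", text);
}

/* Evaluate all twelve relations between a and b, in the order used above. */
const char *relations(double a, double b)
{
    report[0] = '\0';
    add_if(a < b, "a < b");
    add_if(a <= b, "a <= b");
    add_if(a == b, "a == b");
    add_if(a != b, "a != b");
    add_if(a >= b, "a >= b");
    add_if(a > b, "a > b");
    add_if(b < a, "b < a");
    add_if(b <= a, "b <= a");
    add_if(b == a, "b == a");
    add_if(b != a, "b != a");
    add_if(b >= a, "b >= a");
    add_if(b > a, "b > a");
    return report;
}

/* Usage: with one NaN operand only the != relations should hold. */
void relations_demo(void)
{
    assert(strcmp(relations(0.0, NAN), " (a != b) (b != a)") == 0);
    puts(relations(0.0, NAN));
}
```

Calling relations(0.0, NAN) on a conforming implementation should give the same string as the MacOS line above.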

This was discovered by fgb; I've been looking into the cause -- which is
mainly the assumption, in the compiler and linker, that something like this:

	if (a < b) f();

can safely be transformed to this:

	if (a >= b) goto skip;
	f();
	skip:

Unfortunately if a or b is NaN, the conditional will be false in both cases,
so the transformed code calls f() where the original would not.
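
The difference can be reproduced in hosted C. The sketch below is an illustration only, not the code the Plan 9 compilers generate: the function names are hypothetical, and it assumes a compiler that already implements IEEE comparisons correctly. Each function returns 1 when the body of the 'if' would run.

```c
#include <assert.h>
#include <math.h>   /* NAN */

/* What the source asks for: run the body only when a < b. */
int body_runs_source(double a, double b)
{
    if (a < b)
        return 1;
    return 0;
}

/* The unsafe rewrite: skip the body when a >= b. */
int body_runs_inverted(double a, double b)
{
    if (a >= b)
        goto skip;
    return 1;
skip:
    return 0;
}

/* A NaN-safe rewrite: skip the body unless a < b holds. */
int body_runs_negated(double a, double b)
{
    if (!(a < b))
        goto skip;
    return 1;
skip:
    return 0;
}

/* Usage: the three forms agree on ordered operands and differ on NaN. */
void inversion_demo(void)
{
    assert(body_runs_source(1.0, 2.0) == body_runs_inverted(1.0, 2.0));
    assert(body_runs_source(0.0, NAN) == 0);
    assert(body_runs_inverted(0.0, NAN) == 1);   /* body runs by mistake */
    assert(body_runs_negated(0.0, NAN) == 0);
}
```

With a = 0.0 and b = NaN, the inverted form runs the body although a < b is false, while negating the original condition keeps the source semantics.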

So is this a feature, or a bug that needs fixing?

erik quanstrom

2013-08-21 14:34:38 UTC

amd64 does something else again.

amd64	(a == b) (a >= b) (a > b) (b == a) (b >= a) (b > a)
386	(a < b) (a <= b) (a == b) (b < a) (b <= a) (b == a)
arm	(a < b) (a <= b) (a != b) (b < a) (b <= a) (b != a)
mips	(a < b) (a <= b) (a != b) (b < a) (b <= a) (b != a)

Post by Richard Miller
if (a < b) f();
if (a >= b) goto skip;
f();
Unfortunately if a or b is NaN, the conditional will be false in both cases.
So is this a feature, or a bug that needs fixing?

how about another option, just a bug.

there are other issues with the floating point, including
the fact that -0.0 is transformed both by the compiler, and
by print(2) to 0.0. ape's printf prints -0.0 correctly.

at least in terms of passing floating point test suites
(like python's) the NaN issue doesn't come up, but the
-0 issue breaks a number of tests.

- erik
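
For comparison, a small hosted-C sketch of how negative zero is expected to behave. It assumes a C99 library with IEEE 754 doubles and says nothing about what the Plan 9 compilers or print(2) actually do. The helper names fmt_g(), is_negative_zero() and signed_zero_demo() are invented for this example.

```c
#include <assert.h>
#include <math.h>   /* signbit, INFINITY */
#include <stdio.h>
#include <string.h>

/* Format x with %g into a static buffer (not reentrant). */
const char *fmt_g(double x)
{
    static char buf[32];

    snprintf(buf, sizeof buf, "%g", x);
    return buf;
}

/* 1 if x is a zero with the sign bit set, i.e. -0.0. */
int is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}

/* Usage: -0.0 compares equal to +0.0 but keeps its sign. */
void signed_zero_demo(void)
{
    double nz = -0.0;

    assert(nz == 0.0);               /* equal under == */
    assert(is_negative_zero(nz));    /* yet the sign bit is set */
    assert(1.0 / nz == -INFINITY);   /* and the sign survives division */
    assert(strcmp(fmt_g(nz), "-0") == 0);
}
```

Because -0.0 == 0.0 is true, a test suite can only detect a lost sign through signbit, a division, or the printed form, which is consistent with the failures showing up in printing.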

Richard Miller

2013-08-21 16:08:25 UTC

Post by erik quanstrom
at least in terms of passing floating point test suites
(like python's) the NaN issue doesn't come up

Actually it was a test suite that revealed the NaN errors.
I wouldn't think it's something anyone needs in normal
day-to-day computation, but sometimes boxes must be ticked.

erik quanstrom

2013-08-21 16:55:31 UTC

Post by Richard Miller

Post by erik quanstrom
at least in terms of passing floating point test suites
(like python's) the NaN issue doesn't come up

Actually it was a test suite that revealed the NaN errors.
I wouldn't think it's something anyone needs in normal
day-to-day computation, but sometimes boxes must be ticked.

:-) it is hard to imagine how this is useful. it's not like
∑{i→∞}-0 is interesting. at least ∏{i→∞}-0 has an alternating
sign. (so does it converge with no limit?)

the difference i have seen is a situation like
atan2(-0, x) ≡ -π
atan2(+0, x) ≡ π, ∀ x<0.

any ideas on how this is useful?

- erik

erik quanstrom

2013-08-21 17:47:56 UTC

Post by erik quanstrom

Post by Richard Miller

Post by erik quanstrom
at least in terms of passing floating point test suites
(like python's) the NaN issue doesn't come up

Actually it was a test suite that revealed the NaN errors.
I wouldn't think it's something anyone needs in normal
day-to-day computation, but sometimes boxes must be ticked.

:-) it is hard to imagine how this is useful. it's not like
∑{i→∞}-0 is interesting. at least ∏{i→∞}-0 has an alternating
sign. (so does it converge with no limit?)
the difference i have seen is a situation like
atan2(-0, x) ≡ -π
atan2(+0, x) ≡ π, ∀ x<0.
any ideas on how this is useful?

Post by Bakul Shah
See comments by Stephen Canon in
http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values

i think you selected a different antecedent for "this" than
i did. by "this" i meant to refer to -0.

- erik

Bakul Shah

2013-08-21 18:00:36 UTC

Post by erik quanstrom

Post by Richard Miller
Actually it was a test suite that revealed the NaN errors.
I wouldn't think it's something anyone needs in normal
day-to-day computation, but sometimes boxes must be ticked.

:-) it is hard to imagine how this is useful.

See comments by Stephen Canon in
http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values

That this!

Richard Miller

2013-08-21 18:00:58 UTC

Post by erik quanstrom
by "this" i meant to refer to -0.

But the subject line says "comparisons with NaN". Start another
thread about signed zero if you like. (I'm not facing a test
suite objecting to those at the moment.)

Charles Forsyth

2013-08-21 18:24:50 UTC

I think that if there is a generally-accepted standard for the behaviour of
a language's handling of floating-point numbers,
it would be reasonable to try to follow the standard, unless it's stupid,
ill-advised, or impossible (or all three).
That reply to the Stack Overflow post -- and this might be the first and
last time I can write this -- was, I thought, concise and compelling.

Post by Richard Miller

Post by erik quanstrom
by "this" i meant to refer to -0.

But the subject line says "comparisons with NaN". Start another
thread about signed zero if you like. (I'm not facing a test
suite objecting to those at the moment.)

Richard Miller

2013-08-22 14:05:11 UTC

Post by Charles Forsyth
it would be reasonable to try to follow the standard, unless it's stupid,
ill-advised, or impossible (or all three).

Not impossible, maybe a bit tricky to stop the linkers from reordering
things. The cost would be (at least) one extra instruction for each
'if' statement with a floating point inequality and no 'else' clause.

Ron, are you still reading this list? What do your numerical colleagues
think about NaNs?

Charles Forsyth

2013-08-22 14:25:02 UTC

Post by Charles Forsyth
it would be reasonable to try to follow the standard, unless it's stupid,
ill-advised, or impossible (or all three).

I was a little ambiguous. I meant that statement in general, but in the
particular case of floating-point, which is fundamental, it probably should
work as now defined, and I didn't think NaNs satisfied the last bit of being
stupid, ill-advised or impossible.

Looking at the code generated, I'd have thought that it was the use of FCOM
instead of FUCOM that mattered,
not the integer unit comparison that's subsequently used.

Bakul Shah

2013-08-21 17:42:57 UTC

Post by erik quanstrom

Post by Richard Miller

Post by erik quanstrom
at least in terms of passing floating point test suites
(like python's) the NaN issue doesn't come up

Actually it was a test suite that revealed the NaN errors.
I wouldn't think it's something anyone needs in normal
day-to-day computation, but sometimes boxes must be ticked.

:-) it is hard to imagine how this is useful. it's not like
∑{i→∞}-0 is interesting. at least ∏{i→∞}-0 has an alternating
sign. (so does it converge with no limit?)
the difference i have seen is a situation like
atan2(-0, x) ≡ -π
atan2(+0, x) ≡ π, ∀ x<0.
any ideas on how this is useful?

See comments by Stephen Canon in
http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values

Try this:

#include <u.h>
#include <libc.h>

void
main(void)
{
	double a, b;

	setfcr(0);	/* turn off floating-point traps, so a/a yields NaN */
	a = 0.0;
	b = a/a;	/* 0.0/0.0 is NaN */
	if(a < b) print(" (a < b)");
	if(a <= b) print(" (a <= b)");
	if(a == b) print(" (a == b)");
	if(a != b) print(" (a != b)");
	if(a >= b) print(" (a >= b)");
	if(a > b) print(" (a > b)");
	if(b < a) print(" (b < a)");
	if(b <= a) print(" (b <= a)");
	if(b == a) print(" (b == a)");
	if(b != a) print(" (b != a)");
	if(b >= a) print(" (b >= a)");
	if(b > a) print(" (b > a)");
	/* a NaN must compare unequal even to itself */
	if(b != b) print(" (b != b)");
	if(b == b) print(" (b == b)");
	print("\n");
	exits(nil);
}

It falsely reports b == b when b is NaN.
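
As a point of comparison, here is a hedged hosted-C sketch of three ways to test for NaN. The first relies on the comparison that misbehaves above. The third inspects the bits directly and so does not depend on any floating-point comparison. It assumes 64-bit IEEE 754 doubles. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>     /* isnan */
#include <stdint.h>
#include <string.h>

/* IEEE 754: NaN is the only value that compares unequal to itself. */
int nan_by_self_compare(double x)
{
    return x != x;
}

/* The standard classification macro. */
int nan_by_classification(double x)
{
    return isnan(x) != 0;
}

/* Bit test: exponent field all ones and a nonzero fraction. */
int nan_by_bits(double x)
{
    uint64_t u;

    memcpy(&u, &x, sizeof u);   /* copy the representation without aliasing */
    return (u & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
        && (u & 0x000fffffffffffffULL) != 0;
}

/* Usage: all three agree where comparisons follow the standard. */
void nan_demo(void)
{
    volatile double zero = 0.0;   /* volatile keeps the division at run time */
    double b = zero / zero;       /* NaN, as in the program above */

    assert(nan_by_self_compare(b));
    assert(nan_by_classification(b));
    assert(nan_by_bits(b));
    assert(!nan_by_bits(1.0 / zero));   /* infinity is not NaN */
}
```

On a compiler with the comparison problem described in this thread, only the bit test would be expected to stay reliable.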

erik quanstrom

2013-08-21 14:36:42 UTC

Post by erik quanstrom
how about another option, just a bug.

what i mean is, the need for fixing it depends on how much
havoc this issue causes.

- erik

l***@proxima.alt.za

2013-08-21 18:24:03 UTC

Post by erik quanstrom
what i mean is, the need for fixing it depends on how much
havoc this issue causes.

Well, there is also the question of whether anything at all will break
if the bug is fixed. If not, then the answer is simple.

++L

erik quanstrom

2013-08-21 18:27:00 UTC

Post by l***@proxima.alt.za

Post by erik quanstrom
what i mean is, the need for fixing it depends on how much
havoc this issue causes.

Well, there is also the question of whether anything at all will break
if the bug is fixed. If not, then the answer is simple.

fortunately, since plan 9 traps by default when a computation produces a NaN,
and nothing in /sys/src/cmd calls setfcr(2), i think we can exclude this possibility.

- erik