Discussion:
[9fans] Go testing and resource exhaustion
Skip Tavakkolian
2013-05-24 21:57:56 UTC
This is just an FYI.

I'm seeing something that looks to me like a resource exhaustion of some
sort. When running the standard tests (i.e. go test std), some tests
end up Broken (see below); but each "broken" package or command, tested
on its own, passes correctly.
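
To be concrete, compress/lzw is one of the packages that ends up Broken
below (inferring the import path from the lzw.test binary name), yet
running it alone, e.g.

    go test -short compress/lzw

passes without complaint.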

I thought it might be related to semaphores, but the attached program
didn't fail at all; in fact, running 100 copies simultaneously caused
nothing more than several temporary "Fault" states, and none ended up
Broken.
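
For reference, the program was along these lines (a minimal sketch, not
the exact attachment; the mutex-contention loop and the iteration counts
are assumptions). The point is just to hammer the runtime's semaphore
path, which is presumably what shows up as the Semacqui/Tsemacqu states
in the ps output:

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0

	// Spawn enough goroutines to guarantee heavy lock contention; on
	// Plan 9, contended runtime locks go through semacquire/tsemacquire.
	workers := 100 * runtime.NumCPU()
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 10000; j++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("final count:", counter, "expected:", workers*10000)
}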

partial output of ps:

fst 83379 0:00 0:00 240K Await run.rc -e ./run.rc --no-rebuild
fst 83385 0:00 0:00 48K Await time go test std -short -timeout 120s
fst 83386 0:04 0:07 797488K Semacqui go test std -short -timeout 120s
fst 83387 0:00 0:00 797488K Semacqui go
fst 83388 0:00 0:00 797488K Tsemacqu go
fst 83389 0:00 0:00 797488K Semacqui go
fst 83399 0:00 0:00 797488K Pread go
fst 83401 0:00 0:00 797488K Pread go
fst 83403 0:00 0:01 797488K Semacqui go
fst 83404 0:00 0:00 797488K Pread go
fst 83405 0:00 0:00 797488K Semacqui go
fst 83406 0:00 0:01 797488K Pread go
fst 83407 0:00 0:00 797488K Semacqui go
fst 83426 0:00 0:00 797488K Semacqui go
fst 83437 0:00 0:00 794320K Sleep api.test
fst 83453 0:00 0:00 794528K Broken tar.test
fst 83497 0:00 0:00 793628K Sleep bzip2.test
fst 83499 0:00 0:00 793628K Broken bzip2.test
fst 83532 0:00 0:00 794328K Sleep lzw.test
fst 83533 0:00 0:00 794328K Broken lzw.test
fst 84100 0:00 0:00 795204K Sleep template.test
fst 84101 0:00 0:00 795204K Semacqui template.test
fst 84102 0:00 0:00 795204K Broken template.test
l***@proxima.alt.za
2013-05-25 05:14:58 UTC
Post by Skip Tavakkolian
I'm seeing something that looks to me like a resource exhaustion of some
sort. When running the standard tests (i.e. go test std), some tests
end up Broken (see below); but each "broken" package or command, tested
on its own, passes correctly.
I would agree with you; that is the impression I get as well. But the
little monitoring I've applied, and the fact that I have three very
different hosts right here all exhibiting analogous but distinct
failures, make me think of time/performance-related crashes.

On the other hand, cinap may be onto something: if we have a floating
point failure followed by a timer signal, all hell is perfectly
entitled to break loose. But floating point isn't always involved.

We'll get to the bottom of this eventually; we just need as many
informative data points as possible.

++L
