erik quanstrom
2012-08-18 20:11:13 UTC
since it came up, i put my working copy of tcp along with some testing
scripts in /n/sources/contrib/quanstro/tcp.
there are a number of fixes rolled into this, but the main fixes are
- add support for new reno,
- properly handle zero-window probes (on both ends),
- don't confuse cwind (the congestion window) with the receiver's
advertised window. confusing the two can lead to livelock.
- don't confuse the window scale with the amount of local buffering
we'd like to do.
- and, don't queue tcp infinitely, which can crash kernels. :-)
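
to illustrate the cwind point in the list above: a sender normally keeps the congestion window and the receiver's advertised window as separate values and limits itself to the smaller of the two. the following is a minimal sketch of that rule, not code from the tree in /n/sources/contrib/quanstro/tcp; the struct, its field names, and usable() are all hypothetical.

```c
/*
 * hypothetical sketch: none of these names come from the
 * actual tcp source. all values are in bytes.
 */
typedef struct Snd Snd;
struct Snd {
	unsigned int cwind;	/* congestion window: the sender's own estimate */
	unsigned int rwnd;	/* window advertised by the receiver */
	unsigned int inflight;	/* sent but not yet acknowledged */
};

/* bytes the sender may put on the wire right now */
unsigned int
usable(Snd *s)
{
	unsigned int w;

	/* the two windows stay separate; the limit is the smaller one */
	w = s->cwind < s->rwnd ? s->cwind : s->rwnd;
	if(s->inflight >= w)
		return 0;	/* window full, or closed by the receiver */
	return w - s->inflight;
}
```

with cwind = 4096, rwnd = 65535 and inflight = 1024, usable() returns 3072. with rwnd = 0 it returns 0, which is the case where a zero-window probe is needed.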
i don't have the numbers for the old tcp handy, but i think you'll
be surprised at how much difference there can be. i saw differences
of 20x when the sender was limited by the read rate from user space.
i've included "testscript". for the two machines i have handy, i get
the following results with new and old tcp.
machine        stack  kernel  0ms delay  1ms delay
ideal          -      386     unlimited  8.19mb/s
xeon x5550     old    386     138mb/s    0.49mb/s (!)
intel atom     old    386     37.2mb/s   0.10mb/s
amd x4 964     new    386     145mb/s    8.03mb/s
intel e31220   new    amd64   303mb/s    8.15mb/s
intel atom     new    386     67mb/s     8.03mb/s
# note: i can get up to 80mb/s using forsyth's qmalloc.
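
the "ideal" 1ms figure in the table is consistent with the usual window-limited ceiling, rate = window / round-trip time. the sketch below works that arithmetic. it assumes an 8192-byte window and that the 1ms delay is the full round trip; neither is stated in the post, and maxrate() is a hypothetical helper.

```c
/*
 * hypothetical helper: upper bound on throughput when at most
 * one window of data can be outstanding per round trip.
 * window is in bytes, rttus is the round-trip time in microseconds.
 * returns bytes per second; 0 means "no window limit" (zero rtt).
 */
unsigned long long
maxrate(unsigned long long window, unsigned long long rttus)
{
	if(rttus == 0)
		return 0;
	return window * 1000000ULL / rttus;
}
```

under those assumptions maxrate(8192, 1000) is 8192000 bytes/s, i.e. about 8.19mb/s.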
- erik