[9fans] tcp!

erik quanstrom

2012-08-18 20:11:13 UTC

since it came up, i put my working copy of tcp along with some testing
scripts in /n/sources/contrib/quanstro/tcp.

there are a number of fixes rolled into this, but the main fixes are
- add support for new reno,
- properly handle zero-window probes (on both ends),
- don't confuse the cwind with the receiver's advertised window. this
particular condition can lead to livelock.
- don't confuse the window scale with the amount of local buffering
we'd like to do.
- and, don't queue tcp infinitely, which can crash kernels. :-)
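
The distinction in the third item can be sketched as follows. This assumes the usual textbook sender model, not the code in the contrib directory: the congestion window and the receiver's advertised window are kept as two separate limits, and the smaller one governs. The names `usable`, `cwnd`, `rwnd` and `inflight` are hypothetical and chosen only for illustration.

```c
/*
 * Bytes the sender may still put on the wire.
 *   cwnd:     congestion window (sender's own estimate)
 *   rwnd:     window advertised by the receiver
 *   inflight: bytes sent but not yet acknowledged
 * Hypothetical names; this is not the Plan 9 kernel code.
 */
unsigned long
usable(unsigned long cwnd, unsigned long rwnd, unsigned long inflight)
{
	unsigned long wnd;

	wnd = cwnd < rwnd ? cwnd : rwnd;	/* the smaller limit governs */
	if(inflight >= wnd)
		return 0;	/* window is full, or has shrunk below what is in flight */
	return wnd - inflight;
}
```

With `rwnd` at 0 the result is 0 whatever `cwnd` says, which is the situation where zero-window probes are needed.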

i don't have the numbers for the old tcp handy, but i think you'll
be surprised at how much difference there can be. i saw differences
of 20x when the sender was limited in how fast it could read by the
read rate from user space.

i've included "testscript". for the two machines i have handy, i get
the following results with new and old tcp.

machine        stack  kernel  0ms delay  1ms delay
ideal          -      386     unlimited  8.19mb/s

xeon x5550     old    386     138mb/s    0.49mb/s (!)
intel atom     old    386     37.2mb/s   0.10mb/s

amd x4 964     new    386     145mb/s    8.03mb/s
intel e31220   new    amd64   303mb/s    8.15mb/s
intel atom     new    386     67mb/s     8.03mb/s
# note: i can get up to 80mb/s using forsyth's qmalloc.
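
One possible reading of the "ideal" figure in the 1ms column, offered as an assumption since the testscript itself is not shown here: if the reader takes one 8192-byte iounit per read and each read costs 1ms, and "mb/s" means 10^6 bytes per second, the ceiling works out to 8.19mb/s. The function name `idealrate` is hypothetical.

```c
#include <assert.h>

/*
 * Upper bound in bytes per second when every read of `bytes`
 * bytes costs `delayms` milliseconds.  Assumed model only.
 */
double
idealrate(double bytes, double delayms)
{
	assert(delayms > 0);
	return bytes * 1000.0 / delayms;
}
```

Under that assumption `idealrate(8192, 1)` gives 8192000 bytes per second, which the new-stack rows come within about 2% of.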

- erik

erik quanstrom

2012-08-18 20:26:10 UTC

Post by erik quanstrom
- add support for new reno,

i apologize for not mentioning that the new reno work
was part of the nix/9k tcp. i'm not sure who wrote it.

sorry!

also i forgot to mention that this version of qread can
potentially cut the number of reads on tcp channels by up
to 1/2. one might as well completely satisfy the read,
if possible. especially since typical iounits (8192) do not
divide up into typical mss-sized (1460) packets evenly.

[...]
	/* if we get here, there's at least one block in the queue */
	if(q->state & Qcoalesce){
		/* when coalescing, 0 length blocks just go away */
		b = q->bfirst;
		if(BLEN(b) <= 0){
			freeb(qremove(q));
			goto again;
		}

		/*
		 * grab the first block and as many following
		 * blocks as will partially fit in the read
		 */
		n = 0;
		l = &first;
		for(;;) {
			*l = qremove(q);
			l = &b->next;
			n += BLEN(b);
			if(n >= len || (b = q->bfirst) == nil)
				break;
		}
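
The effect on read counts can be sketched with a small user-space simulation. It is a simplified model, not the kernel's qread: it assumes the queue already holds `nblk` equal-sized blocks, that nothing arrives while reading, and that any excess beyond `len` is put back at the head of the queue. `countreads` and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Count the reads of len bytes needed to drain nblk blocks of
 * bsize bytes each.
 *   partial == 0: after the first block, take only blocks that fit completely
 *   partial != 0: keep taking blocks until the read is satisfied
 * Simplified model for illustration; not the Plan 9 qread.
 */
int
countreads(int nblk, int bsize, int len, int partial)
{
	int reads, head, n;

	assert(bsize > 0 && len > 0);
	reads = 0;
	head = 0;	/* bytes left at the front of the queue by a split block */
	while(nblk > 0 || head > 0){
		/* the first block (or leftover) is always taken */
		if(head > 0){
			n = head;
			head = 0;
		}else{
			n = bsize;
			nblk--;
		}
		/* then as many following blocks as the policy allows */
		while(nblk > 0 && n < len){
			if(!partial && n + bsize > len)
				break;
			n += bsize;
			nblk--;
		}
		if(n > len)
			head = n - len;	/* excess goes back on the queue */
		reads++;
	}
	return reads;
}
```

In this model, 1000 blocks of 1460 bytes read 8192 bytes at a time take 200 reads with whole-block filling and 179 when each read is filled. The saving approaches 1/2 when the read is just under two blocks long: with `len` of 2919 the counts are 1000 and 501.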

- erik

Richard Miller

2012-08-19 13:37:01 UTC

Post by erik quanstrom
also i forgot to mention that this version of qread can
potentially cut the number of reads on tcp channels by up
to 1/2. one might as well completely satisfy the read,
if possible.

This looks like a good idea for tcp. But there are other
users of qread, with stricter assumptions. Aren't you in danger
of breaking the contract of pipe(3) which uses qwrite/qread:

Writes are atomic up to a certain size, typically 32768
bytes, that is, each write will be delivered in a single
read by the recipient, provided the receiving buffer is
large enough.

To preserve the atomicity of qread/qwrite, maybe tcp should be
coalescing the blocks itself by multiple calls to qread.

c***@gmx.de

2012-08-19 13:55:53 UTC

its only done on queues that have this flag set i think:

Qcoalesce = (1<<4), /* coallesce packets on read */

--
cinap

erik quanstrom

2012-08-19 14:48:23 UTC

Post by Richard Miller
This looks like a good idea for tcp. But there are other
users of qread, with stricter assumptions. Aren't you in danger
of breaking the contract of pipe(3) which uses qwrite/qread:
Writes are atomic up to a certain size, typically 32768
bytes, that is, each write will be delivered in a single
read by the recipient, provided the receiving buffer is
large enough.

this change only applies to Qcoalesce queues.

the only users of Qcoalesce are the kprintoq and tcp. both
should be okay with this change.

; g qopen port/devpipe.c
port/devpipe.c:68: p->q[0] = qopen(conf.pipeqsize, 0, 0, 0);
port/devpipe.c:73: p->q[1] = qopen(conf.pipeqsize, 0, 0, 0);

- erik

Richard Miller

2012-08-19 13:17:38 UTC

Within the last month or so I've been having trouble copying large
files to remote servers e.g. sources. The cp process hangs for
many minutes and eventually ends in 'mount rpc error'. I was
hoping this tcp patch might solve it, but alas no.

Has anyone else been observing this?

erik quanstrom

2012-08-19 15:43:27 UTC

Post by Richard Miller
Within the last month or so I've been having trouble copying large
files to remote servers e.g. sources. The cp process hangs for
many minutes and eventually ends in 'mount rpc error'. I was
hoping this tcp patch might solve it, but alas no.

could you send a snoopy capture? -M100 and just the tail should
be good enough. also a capture of /net/log with 'set tcp' during the
issue could be helpful. also, could you point to a particular large
file on sources? i'd like to try to replicate.

- erik

Richard Miller

2012-08-19 14:05:35 UTC

... and it won't be set for pipes, of course. Sorry Erik, I should
have studied this more carefully.

I'll try it.

erik quanstrom

2012-08-19 15:07:43 UTC

Post by Richard Miller
... and it won't be set for pipes, of course. Sorry Erik, I should
have studied this more carefully.
I'll try it.

no problems. i'm glad you're double-checking. nobody i know is immune
from error. and there's me, myself and i. so i am 3x as likely to screw up.

i'd be curious to know if this makes a noticeable difference on slower machines
like the π with tcptest to self.

- erik

Richard Miller

2012-08-21 18:32:53 UTC

Thanks to a hint from Erik ("... an mss problem of some sort"), I've
managed to make the problem go away, by doing
echo mtu 1496 >/net/ipifc/1/ctl

I hope to come back to this when I have more time, because I don't
like not understanding why this works. As nobody else has said they
have the same trouble, there may be something amiss in my adsl gateway.
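
For reference, the relation between the interface mtu set above and the resulting tcp mss can be sketched as below. It assumes IPv4 with no IP or TCP options; `mtu2mss` and the enum names are hypothetical. It does not explain why 1496 in particular works, only what segment size follows from it.

```c
#include <assert.h>

enum {
	Iphdr	= 20,	/* IPv4 header, no options (assumed) */
	Tcphdr	= 20,	/* TCP header, no options (assumed) */
};

/* largest tcp payload that fits in one packet of the given mtu */
int
mtu2mss(int mtu)
{
	assert(mtu > Iphdr + Tcphdr);
	return mtu - Iphdr - Tcphdr;
}
```

Under these assumptions an mtu of 1500 gives the familiar mss of 1460, and an mtu of 1496 gives 1456.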

Gorka Guardiola

2012-08-22 17:18:05 UTC

I had this problem several years ago with an adsl router (the 9fans
archive may know about this). There was a bug in my adsl router
(which seems to be common; I have seen it more than once since) that
dropped ethernet frames larger than 1480 bytes (someone probably
counted a header twice). Linux adapts the mss to 1480 if there are
problems, so it works in this case.

G.

Post by Richard Miller

Thanks to a hint from Erik ("... an mss problem of some sort"), I've
managed to make the problem go away, by doing
echo mtu 1496 >/net/ipifc/1/ctl
I hope to come back to this when I have more time, because I don't
like not understanding why this works. As nobody else has said they
have the same trouble, there may be something amiss in my adsl gateway.

Steven Stallion

2012-08-22 18:29:51 UTC

Post by Gorka Guardiola
I had this problem several years ago
with an adsl router (9fans archive may know about this). There was a bug in my adsl router (which seems to be common, I have seen it since more than once) that dropped ethernet frames of size greater than 1480 (someone counted a header twice probably). Linux adapts the
mss to 1480 if there are problems so it works in this case.

Not so much a bug as ATM overhead.

erik quanstrom

2012-08-22 18:46:23 UTC

Post by Steven Stallion

I had this problem several years ago with an adsl router (9fans
archive may know about this). There was a bug in my adsl router
(which seems to be common, I have seen it since more than once) that
dropped ethernet frames of size greater than 1480 (someone counted a
header twice probably). Linux adapts the mss to 1480 if there are
problems so it works in this case.

Not so much a bug as ATM overhead.

atm overhead is 5 bytes per 48 bytes transmitted.
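
As a rough worked example of that per-cell cost, assuming AAL5 framing (an 8-byte trailer, padded to a whole number of cells) and ignoring any LLC/SNAP or PPP headers the link may add; the function and enum names are hypothetical:

```c
#include <assert.h>

enum {
	Cellpayload	= 48,	/* payload bytes per atm cell */
	Cellhdr		= 5,	/* header bytes per atm cell */
	Aal5trailer	= 8,	/* AAL5 trailer; AAL5 itself is an assumption here */
};

/* cells needed to carry a packet of len bytes */
int
atmcells(int len)
{
	assert(len >= 0);
	return (len + Aal5trailer + Cellpayload - 1) / Cellpayload;
}

/* bytes actually sent on the atm link for that packet */
int
atmwire(int len)
{
	return atmcells(len) * (Cellpayload + Cellhdr);
}
```

Under these assumptions a 1500-byte packet takes 32 cells, or 1696 bytes on the wire. The cell overhead inflates what is sent but does not by itself lower the largest packet that can be carried.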

the original problem is a limit of 1496 bytes,
not 1460, which is more consistent with mpls
than l2tp (1460) or pppoe (1492). but all that's
guesswork.
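
The same guesswork as a lookup, using only the mtu figures quoted in this message and a 1500-byte ethernet mtu as the baseline. The table and the names `encaps` and `guessencap` are hypothetical, and a match is a hint, not a diagnosis.

```c
#include <stddef.h>

struct Encap {
	const char *name;
	int mtu;	/* mtu left over on a 1500-byte ethernet path */
};

/* figures as quoted in the message above */
static const struct Encap encaps[] = {
	{ "mpls",  1496 },
	{ "pppoe", 1492 },
	{ "l2tp",  1460 },
};

/* name of the encapsulation matching mtu, or NULL if none matches */
const char*
guessencap(int mtu)
{
	size_t i;

	for(i = 0; i < sizeof encaps / sizeof encaps[0]; i++)
		if(encaps[i].mtu == mtu)
			return encaps[i].name;
	return NULL;
}
```

The 1480-byte limit reported earlier in the thread matches none of the three entries.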

the "bug" here, if there is one, is that there's
neither an icmp message nor fragmentation
nor mss rewriting at the local gateway, which
should (since it's ethernet) not silently drop
mtu-sized frames that it's responsible for
gatewaying.

- erik