Caveat I: I'm not a network engineer, but I've been known to have to play the role from time to time when people who should know what's going on don't.
Caveat II: This is going to get technical in places; skip the bits you don't understand.
Let's start from the beginning. WoW exhibits excessive latency when you're playing a long way from the WoW servers (particularly noticeable in Oceania, hence my personal interest). This has been traced to the following behaviour: a packet arrives from the WoW server, and an acknowledgement is generated by the client machine, as in the following example.
11:28:00.686393 IP wow.server.3724 > wow.client.1230: P 14746:14824(78) ack 162 win 14600
11:28:00.857213 IP wow.client.1230 > wow.server.3724: . ack 14824 win 17520
In this particular example, there's a ~170ms delay in the acknowledgement from the client. It is assumed that this is causing the problem (since removing the delay causes the problem to decrease).
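For anyone who wants to check the ~170ms figure against their own captures, here's a minimal sketch that subtracts the two tcpdump timestamps. The function name ack_delay_ms is mine, and it assumes both lines fall on the same day and use the timestamp format shown above.

```python
from datetime import datetime

def ack_delay_ms(data_line: str, ack_line: str) -> float:
    """Return the gap in milliseconds between two tcpdump lines.

    Assumes each line starts with an HH:MM:SS.microseconds timestamp
    and that both timestamps fall on the same day.
    """
    fmt = "%H:%M:%S.%f"
    t_data = datetime.strptime(data_line.split()[0], fmt)
    t_ack = datetime.strptime(ack_line.split()[0], fmt)
    return (t_ack - t_data).total_seconds() * 1000.0

# The two lines from the trace above.
data = "11:28:00.686393 IP wow.server.3724 > wow.client.1230: P 14746:14824(78) ack 162 win 14600"
ack = "11:28:00.857213 IP wow.client.1230 > wow.server.3724: . ack 14824 win 17520"

delay = ack_delay_ms(data, ack)
print(f"{delay:.2f} ms")  # 170.82 ms
```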
Following the references in rfc1122. From rfc896, describing Nagle's algorithm:
|
The solution is to inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged. This inhibition is to be unconditional; no timers, tests for size of data received, or other conditions are required. Implementation typically requires one or two lines inside a TCP program.
|
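To make the rule concrete, here's a toy model of the sender side as I read the quoted text. It's a sketch, not a real TCP stack: the Connection class and the nagle_send / on_ack names are made up for illustration, and it leaves out the full-packet case that real stacks add.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    unacked_bytes: int = 0  # sent but not yet acknowledged
    pending: bytes = b""    # queued by the application, not yet sent

def nagle_send(conn: Connection, data: bytes) -> bytes:
    """Queue new data and return whatever may be transmitted right now."""
    conn.pending += data
    if conn.unacked_bytes > 0:
        # Something is still unacknowledged: hold everything back.
        return b""
    out, conn.pending = conn.pending, b""
    conn.unacked_bytes = len(out)
    return out

def on_ack(conn: Connection) -> bytes:
    """The outstanding data was acknowledged; flush the queue in one go."""
    conn.unacked_bytes = 0
    out, conn.pending = conn.pending, b""
    conn.unacked_bytes = len(out)
    return out

conn = Connection()
first = nagle_send(conn, b"a")   # nothing outstanding, goes out at once
second = nagle_send(conn, b"b")  # held: "a" is not acknowledged yet
third = nagle_send(conn, b"c")   # held as well
flushed = on_ack(conn)           # the ack releases "bc" as one segment
print(first, second, third, flushed)  # b'a' b'' b'' b'bc'
```

The thing to notice is that the held data only moves when the ack arrives, so anything that slows the ack down slows the data down too.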
From rfc813, describing delayed ack:
|
The following scheme seems a good compromise. The receiver of data will refrain from sending an acknowledgement under certain circumstances, in which case it must set a timer which will cause the acknowledgement to be sent later. However, the receiver should do this only where it is a reasonable guess that some other event will intervene and prevent the necessity of the timer interrupt. The most obvious event on which to depend is the arrival of another segment. So, if a segment arrives, postpone sending an acknowledgement if both of the following conditions hold. First, the push bit is not set in the segment, since it is a reasonable assumption that there is more data coming in a subsequent segment. Second, there is no revised window information to be sent back.
This algorithm will insure that the timer, although set, is seldom used. The interval of the timer is related to the expected inter-segment delay, which is in turn a function of the particular network through which the data is flowing. For the Arpanet, a reasonable interval seems to be 200 to 300 milliseconds. Appendix A describes an adaptive algorithm for measuring this delay.
|
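The two conditions in that quote boil down to a one-line test. Here's a sketch of it applied to the server segment from the trace at the top; the helper names are mine, the flag parsing assumes the tcpdump layout shown in this post, and since the trace doesn't tell us whether there was revised window information I've assumed there wasn't.

```python
def should_delay_ack(push_bit_set: bool, window_changed: bool) -> bool:
    """rfc813 rule as quoted: delay only if BOTH conditions hold."""
    return (not push_bit_set) and (not window_changed)

def has_push(tcpdump_line: str) -> bool:
    """True if the flags field (first token after ': ') contains 'P'.

    Assumes the tcpdump line format used in the traces in this post.
    """
    flags = tcpdump_line.split(": ", 1)[1].split()[0]
    return "P" in flags

segment = "11:28:00.686393 IP wow.server.3724 > wow.client.1230: P 14746:14824(78) ack 162 win 14600"

# Assumption: no revised window information to send back.
delay_it = should_delay_ack(has_push(segment), window_changed=False)
print(delay_it)  # False
```

So by the quoted rule, this particular segment should have been acknowledged straight away.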
From the above, this is delayed ack, not Nagle. That's why setting TcpNoDelay in Windows has no effect on this behaviour. There's no data waiting to be transmitted (else it would have been part of the acknowledgement, but that carried no payload: tcpdump shows a bare '.' with no sequence range, so no data was sent). However, tcpAckFrequency does govern this delayed ack; setting it to 1 (as described in previous posts) should cause the acknowledgement to be sent immediately. (Important note: Microsoft's knowledge base indicates that a hotfix is required before setting tcpAckFrequency to 1 works. I suspect that's why some people don't see it working...)
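For what it's worth, here's a sketch that builds a .reg file for that setting. The registry location is my understanding of Microsoft's documentation, not something from the earlier posts, so check it against the knowledge base article before importing anything. The GUID is a placeholder: you'd have to substitute the one for your own network interface, and the function name is made up.

```python
def ack_frequency_reg(interface_guid: str, frequency: int = 1) -> str:
    """Build .reg file text that sets TcpAckFrequency for one interface.

    Assumption: the value lives under the per-interface Tcpip key below.
    """
    key = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
           r"\Tcpip\Parameters\Interfaces" + "\\" + interface_guid)
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"TcpAckFrequency"=dword:{frequency:08x}',
        "",
    ])

# Placeholder only: replace with the GUID of your actual interface.
PLACEHOLDER_GUID = "{00000000-0000-0000-0000-000000000000}"
reg_text = ack_frequency_reg(PLACEHOLDER_GUID)
print(reg_text)
```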
Although the push bit is set on the incoming segment (the 'P'), the acknowledgement is delayed anyway. I guess whoever implemented the TCP stack in Windows didn't bother following the reference in rfc1122 to rfc813. Grrr. (As a side-note, I'm going to assume that Macs don't suffer from this problem, since OS X was derived from a BSD variant, and I don't believe any of their TCP stacks are that broken.)
So far, so good. So why does a local proxy solve the problem as well (either a Linux gateway, or a VMware virtual machine, or whatever)? Well, this is what the packet trace looks like on the proxy:
11:28:00.686393 IP wow.server.3724 > proxy.32800: P 14746:14824(78) ack 162 win 14600
11:28:00.686394 IP proxy.32800 > wow.server.3724: . ack 14824 win 17520
11:28:00.686396 IP proxy.3724 > wow.client.1230: P 14746:14824(78) ack 162 win 14600
11:28:00.857213 IP wow.client.1230 > proxy.3724: . ack 14824 win 17520
So, the proxy immediately acknowledges the packet from the server; the client still delays its ack. But the important thing is that the ack destined for the server is immediate.
Finally, lowerping. People have reported that lowerping gives lower in-game latency than setting tcpAckFrequency, and that tcpAckFrequency gives improvements as well. Why is this? I'm going to ask a related question: why does removing the delayed ack locally (either by proxy or tcpAckFrequency) work to reduce in-game latency anyway?
See,
11:28:00.857213 IP wow.client.1230 > wow.server.3724: . ack 14824 win 17520
is just an acknowledgement. There's no information from the client in that packet, except something that says 'I got your packet'. Specifically, there's nothing related to user data - and so there's nothing there that directly speeds up the 'something happens on the server' to the 'user's response to that something reaches the server' time (which you could consider the important thing that in-game latency measures). So, why does removing the delay work? The answer would seem to be at the server (and this is worth putting in a paragraph all of its own):
Blizzard have Nagle's algorithm enabled on their servers.
If this is the case, then new data from the server is not transmitted until either a full packet (i.e. roughly 1.5k of data) is available, or the previous packet is acknowledged (as per the description in rfc896, quoted above). But with delayed ack at the client, the acknowledgement of the previous packet is sent ~170ms later than it could be, so new data from the server can be held up for as long as 'network latency from server to client and back' + 170ms. For oceanic people that's ~250ms + 170ms, i.e. ~420ms rather than just 250ms. Add the ~250ms round trip that the client's action and the server's response need anyway, and 'client action' to 'response from server arrives at client' lands somewhere between 250ms and 670ms.
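Here's that arithmetic as a small sketch, so you can plug in your own numbers. The function name is mine, and the model is only the simplified one described above: one round trip, plus anywhere from nothing to a full 'round trip + ack delay' of queueing at the server.

```python
def response_window_ms(rtt_ms: float, ack_delay_ms: float):
    """Return (best case, max server queueing, worst case) in ms."""
    best = rtt_ms                      # response goes out immediately
    max_queue = rtt_ms + ack_delay_ms  # longest wait for the previous ack
    worst = rtt_ms + max_queue         # round trip plus the full wait
    return best, max_queue, worst

# Oceanic figures from this post: ~250ms round trip, ~170ms delayed ack.
best, max_queue, worst = response_window_ms(250, 170)
print(best, max_queue, worst)  # 250 420 670
```

Note that the round trip appears twice in the worst case, so shaving the round trip shrinks the window faster than shaving the ack delay does.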
And that's where the latency comes from. And that's why lowerping works better. With lowerping, the network latency from server to client and back is much lower (because, so far as the server is concerned, the client is in the US), and so data is queued up at the server for much less time.
So this:
|
* Reduced network latency by disabling the Nagle algorithm.
|
from the 2.3.2 patch notes appears to have been done on the client. It appears not to have been done on the server. (If it had been done on the server, removing delayed acks would not make *any* change to observed in-game latency. It still does. QED.)
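For reference, disabling Nagle is normally a per-socket option, TCP_NODELAY, set by whichever end is doing the sending. This is just a sketch of the standard socket call; I have no idea how Blizzard's client or server code actually does it.

```python
import socket

# A plain TCP socket; nothing is connected, we only flip the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY = 1 turns Nagle's algorithm off for this socket only.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read it back: a non-zero value means Nagle is disabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("Nagle disabled:", bool(nodelay))

sock.close()
```

The option only affects data sent from that socket, which is why doing it on the client does nothing for data the server is holding back.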
The obvious solution is then that Blizzard should disable Nagle on their servers. However, I should point out that this could have serious negative impacts.
(a) The amount of bandwidth Blizzard uses would increase. That's because they'd be sending more packets to us. While the amount of server -> client data would remain the same, the overhead of packet headers would increase. And with many small packets, the packet header overhead dominates bandwidth consumption.
(b) The amount of bandwidth your client requires would also increase. For the same reason, more packets being sent to your client implies more bandwidth you need. And this would imply the 'Razorgore client disconnect syndrome' (in which the server sends more data to your client than your connection can handle, leading the server to believe that your client has timed out, and so disconnect you) would become more likely to happen. Especially if you're still on dialup.
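To put a rough number on the header overhead in (a) and (b), here's a sketch. It assumes 40 bytes of headers per packet (a 20-byte IPv4 header plus a 20-byte TCP header, no options, ignoring the link layer), uses the 78-byte payload from the trace above, and the batch of ten updates is an arbitrary number picked for illustration.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 + 20-byte TCP header, no options

def wire_bytes(payload_sizes):
    """Total bytes on the wire when each payload goes in its own packet."""
    return sum(size + HEADER_BYTES for size in payload_sizes)

updates = [78] * 10  # ten 78-byte updates (hypothetical batch)

separate = wire_bytes(updates)          # Nagle off: ten small packets
coalesced = wire_bytes([sum(updates)])  # Nagle on: one 780-byte packet

print(separate, coalesced)                       # 1180 820
print(f"{separate / coalesced:.2f}x the bytes")  # 1.44x the bytes
```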
So, I don't know.
PS. This is a reasonably well-known problem with delayed ack + Nagle. It's mentioned (in not quite so many words) in the Wikipedia article for Nagle's algorithm, for instance...
PPS. I'll probably be posting this on the official forums once I find somewhere that it won't immediately disappear into the morass. Or unless someone here finds where I've been incredibly stupid, and I'm entirely wrong about what's going on. (It's happened before.)