High-performance AI applications require very high-bandwidth network communication with predictable tail latency. Congestion on the network directly increases tail latency and thereby degrades the performance of the end applications. The principle of an example solution is to make the receiver timer a function of the rate at which the transmitter sends data packets, and to adjust it based on the number of packets received since the last acknowledgement (ACK) was transmitted. When very few packets have been received, the receiver timer shortens and the receiver sends ACKs more frequently, prompting the transmitter to keep its tracker window moving and maintain a steady supply of data packets. When the supply of data packets is consistent, the receiver timer can be longer, so that the receiver batches many more packets and sends a single batch acknowledgement, making better use of the bandwidth.
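
The timer behaviour described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function name `next_ack_timeout`, the parameter `batch_target`, the timeout bounds, and the linear scaling rule are all assumptions chosen only to show a timer that depends on the transmit rate and on the number of packets received since the last ACK.

```python
def next_ack_timeout(tx_rate_pps, pkts_since_last_ack,
                     batch_target=32,
                     min_timeout_s=5e-6, max_timeout_s=200e-6):
    """Return a hypothetical ACK timer value in seconds.

    tx_rate_pps         -- estimated transmitter rate in packets per second
    pkts_since_last_ack -- packets received since the last ACK was sent
    batch_target        -- assumed number of packets worth batching per ACK
    """
    if tx_rate_pps <= 0:
        # No usable rate estimate: fall back to the shortest timer so
        # the transmitter is acknowledged promptly.
        return min_timeout_s

    # Time the transmitter needs to send one full batch at its current rate.
    base = batch_target / tx_rate_pps

    # Fraction of a full batch received since the last ACK, capped at 1.
    fill = min(pkts_since_last_ack / batch_target, 1.0)

    # Few packets -> short timer (frequent ACKs);
    # steady supply -> long timer (batched ACKs).
    timeout = base * fill

    # Keep the timer within the assumed bounds.
    return max(min_timeout_s, min(timeout, max_timeout_s))


if __name__ == "__main__":
    rate = 1_000_000  # assumed 1 Mpps transmitter
    for received in (2, 16, 32):
        print(received, next_ack_timeout(rate, received))
```

With these assumed values, a receiver that has seen only two packets since its last ACK gets the minimum timer, while one that has seen a full batch waits roughly the time the transmitter needs to send that batch. A real design would likely smooth the rate estimate and choose the bounds from the link's round-trip time.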

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.