\begin{abstract}
Although Remote Direct Memory Access (RDMA) promises to reduce datacenter network
latencies substantially compared to TCP (e.g., by $10\times$), end-to-end congestion control under incast remains a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs to converge).
Several studies have shown that, even in oversubscribed datacenter networks, most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called {\em \name}, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially localized case and the harder spatially dispersed case. For receiver congestion, we propose {\em direct apportioning of sending rates (DASR)}, in which a receiver with $n$ senders directs each sender to cut its rate by a factor of $n$, converging in only one RTT. For the spatially localized case, \name responds quickly (in under one RTT) using novel switch hardware for {\em in-order flow deflection (IOFD)}; new hardware is needed because RDMA disallows the packet reordering on which previous load-balancing schemes rely. For the uncommon spatially dispersed case, \name falls back to DCQCN.
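As a simple illustration of why DASR needs only one step, assume (for exposition only) that each of the $n$ senders initially transmits at the receiver's link rate $C$, so the receiver is offered an aggregate load of $nC$. Once each sender cuts its rate by a factor of $n$, the aggregate arrival rate becomes
\[
  \sum_{i=1}^{n} \frac{C}{n} = C,
\]
so in this idealized case the offered load matches the receiver's link capacity after a single rate update, whereas iterative schemes approach this operating point gradually over many RTTs.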
Small-scale testbed measurements show that \name achieves $60\%$ ($2.5\times$) lower $99^{\mathrm{th}}$-percentile latency than InfiniBand with similar throughput, and at-scale simulations show that it achieves $79\%$ ($4.8\times$) lower $99^{\mathrm{th}}$-percentile latency and $58\%$ higher throughput than TIMELY and DCQCN.
\end{abstract}