\begin{abstract}
Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network
latencies significantly compared to TCP (e.g., 10x lower), end-to-end congestion control in the presence of incasts remains a challenge. Because they target the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs to converge).
Several papers have shown that, even in oversubscribed datacenter networks, most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called {\em \name}, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized case and the harder spatially-dispersed case. For receiver congestion, we propose {\em direct apportioning of sending rates (DASR)}, in which a receiver with $n$ concurrent senders directs each sender to cut its rate by a factor of $n$, converging in only one RTT. For the spatially-localized case, \name responds quickly (in under one RTT) using novel switch hardware for {\em in-order flow deflection (IOFD)}; new hardware is needed because RDMA disallows the packet reordering on which previous load-balancing schemes rely. For the uncommon spatially-dispersed case, \name falls back to DCQCN.
Small-scale testbed measurements show that \name achieves $60\%$ (2.5x) lower
$99^{\mathrm{th}}$-percentile latency than InfiniBand with similar throughput, and
at-scale simulations show that it achieves $79\%$ (4.8x) lower $99^{\mathrm{th}}$-percentile
latency and $58\%$ higher throughput than TIMELY and DCQCN.
\end{abstract}