1: \begin{abstract}
2: Limiting failures of machine learning systems is vital for safety-critical applications.
3: %
4: In order to improve the robustness of machine learning systems,
5: Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM)
6: % aiming at addressing this need.
7: %
8: However, its use in deep learning has been severely restricted due to the relative inefficiency of the optimizers available for DRO in comparison to the wide-spread variants of Stochastic Gradient Descent (SGD) optimizers for ERM.
9: %
10: We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly suited in the context of deep learning.
11: %
12: Similar to a hard example mining strategy
13: % in essence and
14: in practice, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning, requiring minimal overhead computation.
15: % .
16: % %
17: % It only requires to compute an adaptive sampling probabilities at each iteration using a softmax layer and a vector of loss values estimates for the training examples.
18: %
19: In contrast to typical ad hoc hard mining approaches,
20: % and exploiting recent theoretical results in deep learning optimization,
21: we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with $\relu$ activation and finite number of layers and parameters.
22: %
23: Our experiments on brain tumor segmentation in MRI demonstrate the feasibility and the usefulness of our approach. Using our hardness weighted sampling leads to a decrease of $2\%$ of the interquartile range of the Dice scores for the enhanced tumor and the tumor core regions.
24: %
25: The code for the proposed hard weighted sampler will be made publicly available.
26: \end{abstract}
27: