1: \begin{abstract}
2: Limiting failures of machine learning systems is vital for safety-critical applications.
3: %
4: To improve the robustness of machine learning systems,
5: Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM).
6: % aiming at addressing this need.
7: %
8: However, its use in deep learning has been severely restricted because the optimizers available for DRO are relatively inefficient compared to the widespread variants of Stochastic Gradient Descent (SGD) used for ERM.
9: %
10: We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly well suited to deep learning.
11: % 
12: Similar in practice to a hard example mining strategy,
13: % in essence and 
14: the proposed algorithm is straightforward to implement and as computationally efficient as the SGD-based optimizers used for deep learning, requiring minimal computational overhead.
15: % .
16: % % 
17: % It only requires to compute an adaptive sampling probabilities at each iteration using a softmax layer and a vector of loss values estimates for the training examples.
18: % 
19: In contrast to typical ad hoc hard mining approaches, 
20: % and exploiting recent theoretical results in deep learning optimization, 
21: we prove the convergence of our DRO algorithm for over-parameterized deep neural networks with $\relu$ activations and a finite number of layers and parameters.
22: %
23: Our experiments on brain tumor segmentation in MRI demonstrate the feasibility and usefulness of our approach. Hardness weighted sampling reduces the interquartile range of the Dice scores by $2\%$ for the enhancing tumor and the tumor core regions.
24: % 
25: The code for the proposed hardness weighted sampler will be made publicly available.
26: \end{abstract}
27: