
1: \begin{abstract}

2: Limiting failures of machine learning systems is vital for safety-critical applications.

3: %

4: To improve the robustness of machine learning systems,

5: Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM).

6: % aiming at addressing this need.

7: %

8: However, its use in deep learning has been severely restricted by the relative inefficiency of the optimizers available for DRO compared with the widespread variants of Stochastic Gradient Descent (SGD) optimizers for ERM.

9: %

10: We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly well suited to deep learning.

11: %

12: Similar to a hard example mining strategy

13: % in essence and

14: in practice, the proposed algorithm is straightforward to implement and as computationally efficient as the SGD-based optimizers used for deep learning, requiring minimal computational overhead.

15: % .

16: % %

17: % It only requires to compute an adaptive sampling probabilities at each iteration using a softmax layer and a vector of loss values estimates for the training examples.

18: %

19: In contrast to typical ad hoc hard example mining approaches,

20: % and exploiting recent theoretical results in deep learning optimization,

21: we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with $\relu$ activation and a finite number of layers and parameters.

22: %

23: Our experiments on brain tumor segmentation in MRI demonstrate the feasibility and usefulness of our approach. Our hardness weighted sampling yields a $2\%$ decrease in the interquartile range of the Dice scores for the enhanced tumor and the tumor core regions.

24: %

25: The code for the proposed hardness weighted sampler will be made publicly available.

26: \end{abstract}

27: