\begin{abstract}
%

Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals.
%
Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws slip past both authors and reviewers.
%
One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets.
%
However, in this experiment, models that have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations.
%
We find that a well-established baseline, Monte Carlo dropout, shows significant improvements when evaluated under the same experimental settings as the methods it is compared against.
%
In fact, the baseline outperforms, or performs competitively with, methods that claimed to be superior to this very baseline when they were introduced.
%
Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare, and benchmark methods in Bayesian Deep Learning.
%
\end{abstract}