1: \begin{abstract}
2: %
3:
4: Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals.
5: %
6: Given the many aspects of an experiment, it is always possible for minor or even major experimental flaws to slip by both authors and reviewers.
7: %
8: One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets.
9: %
10: However, in this experiment, models that have been trained to convergence have often been compared with baselines trained for only a fixed number of iterations.
11: %
12: We find that a well-established baseline, Monte Carlo dropout, shows significant improvements when evaluated under the same experimental settings as the competing methods.
13: %
14: In fact, the baseline outperforms or performs competitively with methods that were claimed to be superior to it when they were introduced.
15: %
16: Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare, and benchmark methods in Bayesian Deep Learning.
17: %
18: \end{abstract}
19: