\begin{abstract}
This paper focuses on the optimal implementation of a Mean Variance Estimation (MVE) network \citep{nix1994estimating}. This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance, Concrete dropout \citep{gal2017concrete} and Deep Ensembles \citep{lakshminarayanan2017simple}. Specifically, an MVE network assumes that the targets are drawn from a normal distribution whose mean and variance are both functions of the input. The network outputs an estimate of the mean and of the variance, and its parameters are optimized by minimizing the negative log-likelihood.

In this paper, we make two points. First, the convergence difficulties reported in recent work can be prevented relatively easily by following the original authors' recommendation to use a warm-up period, during which only the mean is optimized while the variance is held fixed. This recommendation is often ignored in practice, and we demonstrate experimentally how essential this step is. We also examine whether keeping the mean estimate fixed after the warm-up leads to different results than optimizing the mean and the variance simultaneously after the warm-up; we do not observe a substantial difference. Second, we propose a novel improvement of the MVE network: separate regularization of the mean and the variance estimates. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and using the novel separate regularization can lead to significant improvements.
\end{abstract}
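The objectives described in the abstract can be made concrete with a minimal sketch under stated assumptions. Assume training pairs $(x_i, y_i)$, $i = 1, \dots, n$, and a network with mean output $\mu_\theta(x)$ and variance output $\sigma^2_\theta(x) > 0$. Up to the additive constant $\frac{1}{2}\log(2\pi)$, the Gaussian negative log-likelihood minimized by an MVE network is
\begin{equation}
\mathcal{L}_{\mathrm{NLL}}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{2}\log\sigma^2_\theta(x_i) + \frac{\left(y_i - \mu_\theta(x_i)\right)^2}{2\,\sigma^2_\theta(x_i)}\right].
\end{equation}
If the variance is held fixed at a constant $c > 0$, as during the warm-up period, this objective reduces to a scaled mean squared error plus a constant, so only the mean is effectively trained:
\begin{equation}
\mathcal{L}_{\mathrm{NLL}}(\theta) = \frac{1}{2cn}\sum_{i=1}^{n}\left(y_i - \mu_\theta(x_i)\right)^2 + \frac{1}{2}\log c.
\end{equation}
Separate regularization could, for example, take the following form, assuming the parameters split into a mean part $\theta_\mu$ and a variance part $\theta_\sigma$, each with its own penalty strength $\lambda_\mu$ or $\lambda_\sigma$. These symbols and the choice of an $\ell_2$ (weight-decay) penalty are illustrative assumptions, not necessarily the exact formulation used in this paper:
\begin{equation}
% Illustrative assumption: theta = (theta_mu, theta_sigma) with separate l2 penalties.
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{NLL}}(\theta) + \lambda_\mu \|\theta_\mu\|_2^2 + \lambda_\sigma \|\theta_\sigma\|_2^2.
\end{equation}
In this sketch, setting $\lambda_\mu = \lambda_\sigma$ recovers a single shared penalty, which is the special case that separate regularization generalizes.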