\begin{abstract}
Multilayer Perceptrons struggle to learn certain simple arithmetic tasks.
Specialist neural modules for arithmetic can outperform classical architectures, with gains in extrapolation, interpretability and convergence speed, but are highly sensitive to the training range.
In this paper, we show that Neural Multiplication Units (NMUs) cannot reliably learn tasks as simple as multiplying two inputs across different training ranges.
We link these failures to inductive and input biases that encourage convergence to undesirable optima.
We propose a solution, the stochastic NMU (sNMU), which applies reversible stochasticity to encourage avoidance of such optima while still converging to the true solution.
Empirically, we show that stochasticity improves robustness and has the potential to improve the learned representations of upstream networks for numerical and image tasks.
\end{abstract}
