
\begin{abstract}
Multilayer Perceptrons struggle to learn certain simple arithmetic tasks.
Specialist neural modules for arithmetic can outperform classical architectures, with gains in extrapolation, interpretability and convergence speed, but are highly sensitive to the training range.
In this paper, we show that Neural Multiplication Units (NMUs) cannot reliably learn tasks as simple as multiplying two inputs across different training ranges.
We link the causes of failure to inductive and input biases which encourage convergence to undesirable optima.
We propose a solution, the stochastic NMU (sNMU), which applies reversible stochasticity to encourage avoidance of such optima whilst still converging to the true solution.
Empirically, we show that stochasticity improves robustness and has the potential to improve the learned representations of upstream networks for numerical and image tasks.
\end{abstract}
