
1: \begin{abstract}

2: %What’s the domain?

3: The Neural Arithmetic Logic Unit (NALU) is a neural network layer that can learn exact arithmetic operations between the elements of a hidden state.

4: The goal of NALU is to learn perfect extrapolation, which requires learning the exact underlying logic of an unknown arithmetic problem.

5: %What’s the issue?

6: Evaluating the performance of the NALU is non-trivial as one arithmetic problem might have many solutions.

7: As a consequence, single-instance MSE has been used to evaluate and compare performance between models.

8: However, it can be hard to interpret what magnitude of MSE represents a correct solution, and to assess a model's sensitivity to initialization.

9: %What’s your contribution?

10: We propose using a success criterion to measure if and when a model converges.

11: With a success criterion, we can summarize the success rate over many initialization seeds and calculate confidence intervals.

12: We contribute a generalized version of the previous arithmetic benchmark to measure models' sensitivity under different conditions.

13: %Why is it novel?

14: This is, to our knowledge, the first extensive evaluation with respect to convergence of the NALU and its sub-units.

15: %What’s interesting about it?

16: %An interesting finding is the high variability in convergence when modifying dataset parameters.

17: %How does it perform?

18: Using a success criterion to summarize 4800 experiments, we find that consistently learning arithmetic extrapolation is challenging, in particular for multiplication.

19: \ifdefined\nonanonymous\footnote{Code for experiments is publicly available at \url{https://github.com/AndreasMadsen/stable-nalu}.}\fi

20: \end{abstract}

21: