\begin{abstract}
Multiplicative stochasticity, such as Dropout, improves the robustness and generalizability of deep neural networks.
Here, we further demonstrate that always-on multiplicative stochasticity and simple threshold neurons are sufficient operations for deep neural networks.
We call such models Neural Sampling Machines (NSMs).
We find that the probability of activation of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that provides many of the features of Batch Normalization in an online fashion.
The normalization of activities during training speeds up convergence by preventing internal covariate shift caused by changes in the input distribution.
The always-on stochasticity of the NSM confers the following advantages: (i) the network is identical in the inference and learning phases, making the NSM suitable for {online learning}; (ii) it can exploit stochasticity inherent to a physical substrate, such as analog non-volatile memories for in-memory computing; and (iii) it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations.
We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures).
Our results show that NSMs perform comparably to or better than conventional artificial neural networks with the same architecture.
\end{abstract}