\begin{abstract}
Multiplicative stochasticity such as Dropout improves the robustness and generalizability of deep neural networks.
Here, we further demonstrate that always-on multiplicative stochasticity combined with simple threshold neurons provides sufficient operations for deep neural networks.
We call such models Neural Sampling Machines (NSMs).
We find that the probability of activation of the NSM exhibits a self-normalizing property that mirrors Weight Normalization, a previously studied mechanism that fulfills many of the features of Batch Normalization in an online fashion.
The normalization of activities during training speeds up convergence by preventing internal covariate shift caused by changes in the input distribution.
The always-on stochasticity of the NSM confers the following advantages: (i) the network is identical in the inference and learning phases, making the NSM suitable for online learning; (ii) it can exploit stochasticity inherent to a physical substrate, such as analog non-volatile memories for in-memory computing; and (iii) it is suitable for Monte Carlo sampling, while requiring almost exclusively addition and comparison operations.
We demonstrate NSMs on standard classification benchmarks (MNIST and CIFAR) and event-based classification benchmarks (N-MNIST and DVS Gestures). Our results show that NSMs perform comparably to or better than conventional artificial neural networks with the same architecture.
\end{abstract}