\begin{abstract}% <- trailing '%' for backward compatibility of .sty file
This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.
Rather than relying on non-linear transfer functions, our method gains representational power through data conditioning.
%; more precisely, one can interpret our model as a sequence of data dependent linear networks that employ a certain type of weight sharing across time.
We state, under general conditions, a learnable capacity theorem showing that this approach can in principle learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger
than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.
%, as it implies that if a solution exists in the model class then our choice of weight updating scheme will eventually find it.
%Initial empirical results suggest this method warrants further investigation.
%To illustrate the power of this approach, we obtain state of the art density modeling performance on the \mnist\ dataset, \emph{in a single online pass through the data.}
\end{abstract}