\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss.
Rather than relying on non-linear transfer functions, our method gains representational power through data conditioning.
%; more precisely, one can interpret our model as a sequence of data dependent linear networks that employ a certain type of weight sharing across time.
We state, under general conditions, a learnable capacity theorem showing that this approach can, in principle, learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.
%, as it implies that if a solution exists in the model class then our choice of weight updating scheme will eventually find it.
%Initial empirical results suggest this method warrants further investigation.
%To illustrate the power of this approach, we obtain state of the art density modeling performance on the \mnist\ dataset, \emph{in a single online pass through the data.}
\end{abstract}
