1: \begin{abstract}%
2: The conditional mean embedding (CME) encodes conditional probability distributions within a reproducing kernel Hilbert space (RKHS). The CME plays a key role in several well-known machine learning tasks, such as reinforcement learning and the analysis of dynamical systems. We present an algorithm to learn the CME \emph{incrementally} from data via an {operator-valued} stochastic gradient descent. As is well known, function learning in an RKHS scales poorly with the amount of data. We utilize a \emph{compression} mechanism to counter this scalability challenge. The core contribution of this paper is a finite-sample performance guarantee on the \emph{last iterate} of the online compressed operator learning algorithm with fast-mixing Markovian samples, when the target CME may not be contained in the hypothesis space. We illustrate the efficacy of our algorithm by applying it to the analysis of an example dynamical system.
3: % In particular, we establish both asymptotically almost surely convergence and mean-square last iterate convergence. Finally, we demonstrate the performance of the proposed CME methods to analyze unknown nonlinear dynamical systems, and demonstrate its merits relative to some benchmarks.
4: \end{abstract}
5: