abstract:1c5f1c2305be13e4.tex

1: \begin{abstract}

2: We study the problem of distribution-to-real regression, where one aims to regress a mapping $f$ that takes as its input covariate a distribution $P\in \calI$ (for a non-parametric family of distributions $\calI$) and outputs a real-valued response $Y=f(P) + \epsilon$.

3: This setting was recently studied in \cite{poczos2013distribution}, where the ``Kernel-Kernel'' estimator was introduced and shown to have a polynomial rate of convergence.

4: However, evaluating a new prediction with the Kernel-Kernel estimator scales as $\Omega(N)$, where $N$ is the number of instances. This creates a difficult situation: a large amount of data may be necessary for a low estimation risk, but the computational cost of estimation becomes infeasible when the data set is too large. To address this, we propose the Double-Basis estimator, which aims to alleviate this big data problem in two ways: first, the Double-Basis estimator is shown to have a computational complexity that is independent of $N$ when evaluating new predictions after training; second, the Double-Basis estimator is shown to have a fast rate of convergence for a general class of mappings $f\in\calF$.

5: \end{abstract}

6: