1: \begin{abstract}
2: Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest.
3: Here we consider extending machine learning algorithms to operate on \emph{groups} of data points.
4: We suggest treating a group of data points as an \iid sample set from an underlying feature distribution for that group.
5: Our approach employs kernel machines with a kernel on \iid sample sets of vectors.
6: We define certain kernel functions on pairs of distributions,
7: and then use a nonparametric estimator to consistently estimate those functions based on sample sets.
8: The projection of the estimated Gram matrix to the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions.
9: We present several numerical experiments both on real and simulated datasets to demonstrate the advantages of our new approach.
10: \end{abstract}
11: