\begin{abstract}
We derive the mean squared error convergence rates of kernel density-based
plug-in estimators of mutual information measures between two multidimensional
random variables $\mathbf{X}$ and $\mathbf{Y}$ for two cases: 1)
$\X$ and $\Y$ are both continuous; 2) $\X$ is continuous and $\Y$
is discrete. Using the derived rates, we propose an ensemble estimator
of these information measures for the second case by taking a weighted
sum of the plug-in estimators with varied bandwidths. The resulting
ensemble estimator achieves the $1/N$ parametric convergence rate
when the conditional densities of the continuous variables are sufficiently
smooth. To the best of our knowledge, this is the first nonparametric
mutual information estimator known to achieve the parametric convergence
rate for this case, which frequently arises in applications (e.g.,
variable selection in classification). The estimator is simple to
implement, as it uses the solution to an offline convex optimization
problem and simple plug-in estimators. A central limit theorem is
also derived for the ensemble estimator. Ensemble estimators that
achieve the parametric rate are also derived for the first case ($\X$
and $\Y$ are both continuous) and for a third case: 3) $\X$ and $\Y$
may have any mixture of discrete and continuous components.
\end{abstract}