
\begin{definition}

Given the expression for the bias in Theorem \ref{bias_theorem}, the ensemble estimation technique proposed in \cite{Kevin16} can be applied to improve the convergence rate of the MI estimator \eqref{est_def}. Assume that the densities in \textbf{A3} have continuous, bounded derivatives up to order $q$, where $q\geq d$.

Let $\mathcal{T}:=\{t_1,\ldots,t_T\}$ be a set of index values with $t_i<c$, where $c>0$ is a constant. Let $\epsilon(t):=tN^{-1/(2d)}$. For a given set of weights $w(t)$, the weighted ensemble estimator is then defined as

\begin{align}\label{EDGE_def}
\widehat{I}_w:=\sum_{t\in \mathcal{T}}w(t)\widehat{I}_{\epsilon(t)},
\end{align}

where $\widehat{I}_{\epsilon(t)}$ is the mutual information estimator with the parameter $\epsilon(t)$. Using \eqref{bias_terms}, for $q>0$, the bias of the weighted ensemble estimator \eqref{EDGE_def} takes the form

\begin{equation}
\mathbb{B}(\widehat{I}_w) = \sum_{i=1}^q C_i N^{-\frac{i}{2d}} \sum_{t\in \mathcal{T}} w(t) t^{i} +O\left(\frac{t^d}{N^{1/2}}\right)+O\left(\frac{1}{N\epsilon^d}\right).
\label{Ensemble_bias}
\end{equation}
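As a quick check of this form, note that by the definition $\epsilon(t)=tN^{-1/(2d)}$,
\begin{equation*}
N^{-\frac{i}{2d}}\,t^{i}=\bigl(tN^{-\frac{1}{2d}}\bigr)^{i}=\epsilon(t)^{i},
\end{equation*}
so the $i$-th term of the leading sum can be read as $C_i\sum_{t\in\mathcal{T}}w(t)\epsilon(t)^{i}$, which is consistent with each estimator $\widehat{I}_{\epsilon(t)}$ contributing a bias term of order $\epsilon(t)^{i}$. Since the constants $C_i$ appear outside the sum over $t$, the weights enter the leading terms only through the quantities $\sum_{t\in\mathcal{T}}w(t)t^{i}$, which do not depend on $N$.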


Given the form (\ref{Ensemble_bias}), as long as $T\geq q$, we can select the weights $w(t)$ to force the slowly decaying terms in (\ref{Ensemble_bias}) to zero, i.e., $\sum_{t\in \mathcal{T}} w(t)t^{i}=0$, subject to the constraint that $\sum_{t\in \mathcal{T}} w(t)=1$. However, $T$ should be strictly greater than $q$ in order to control the variance, which is upper bounded by the squared Euclidean norm of the weight vector $w$. In particular, we have the following theorem (the proof is given in Appendix C):


\begin{theorem} \label{ensemble_theorem}

For $T>d$, let $w_0$ be the solution to

\begin{align}
\min_w &\qquad \|w\|_2 \nonumber\\
\text{subject to} &\qquad \sum_{t\in \mathcal{T}}w(t)=1, \nonumber\\
&\qquad \sum_{t\in \mathcal{T}}w(t)t^{i}=0, \quad i\in \mathbb{N},\ i\leq d.
\end{align}

Then the MSE rate of the ensemble estimator $\widehat{I}_{w_0}$ is $O(1/N)$.

\end{theorem}
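The optimization problem in Theorem \ref{ensemble_theorem} is a minimum-norm problem with linear equality constraints, so it can be sketched in closed form. The notation $A$, $e_1$ and the numerical values below are introduced here only for illustration and are not taken from the analysis above. Assuming the $t_j$ are distinct, write $w=(w(t_1),\ldots,w(t_T))^{\top}$, let $A\in\mathbb{R}^{(d+1)\times T}$ have entries $A_{ij}=t_j^{\,i-1}$, and let $e_1=(1,0,\ldots,0)^{\top}\in\mathbb{R}^{d+1}$. The constraints then read $Aw=e_1$, and since $T>d$ the matrix $A$ has full row rank, which gives
\begin{equation*}
w_0=A^{\top}\bigl(AA^{\top}\bigr)^{-1}e_1,
\qquad
\|w_0\|_2^2=e_1^{\top}\bigl(AA^{\top}\bigr)^{-1}e_1 .
\end{equation*}
As a toy instance, take $d=1$ and $\mathcal{T}=\{1,2,3\}$, so that
\begin{equation*}
A=\begin{pmatrix}1&1&1\\ 1&2&3\end{pmatrix},
\qquad
AA^{\top}=\begin{pmatrix}3&6\\ 6&14\end{pmatrix},
\qquad
\bigl(AA^{\top}\bigr)^{-1}e_1=\begin{pmatrix}7/3\\ -1\end{pmatrix},
\end{equation*}
and therefore $w_0=\bigl(\tfrac{4}{3},\tfrac{1}{3},-\tfrac{2}{3}\bigr)^{\top}$. One can verify directly that $\sum_{t}w_0(t)=1$ and $\sum_{t}w_0(t)\,t=0$, with $\|w_0\|_2^2=\tfrac{7}{3}$. Note that some weights are negative, which is what allows the leading bias terms to cancel.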


\end{definition}
