
\begin{abstract}

    We present a new concentration of measure inequality for sums of independent bounded random variables, which we name a split-$\kl$ inequality. The inequality combines the combinatorial power of the $\kl$ inequality with the ability to exploit low variance. While for Bernoulli random variables the $\kl$ inequality is tighter than the Empirical Bernstein inequality, for random variables taking values inside a bounded interval and having low variance the Empirical Bernstein inequality is tighter than the $\kl$ inequality. The proposed split-$\kl$ inequality yields the best of both worlds. We discuss an application of the split-$\kl$ inequality to bounding excess losses. We also derive a PAC-Bayes-split-$\kl$ inequality and use a synthetic example and several UCI datasets to compare it with the PAC-Bayes-$\kl$, PAC-Bayes Empirical Bernstein, PAC-Bayes Unexpected Bernstein, and PAC-Bayes Empirical Bennett inequalities.


    %The bound is based on one of the tightest first-order concentration inequalities, the $\kl$ inequality, but is amenable to exploiting second-order information, and can be tighter than Bennett's inequality. We extend it to the PAC-Bayes analysis and develop the PAC-Bayes split-$\kl$ inequality, which improves on the PAC-Bayes Unexpected Bernstein inequality by~\citet{MGG20} and the PAC-Bayes-Bennett inequality by~\citet{WMLIS21}.

    %provides a fast convergence rate of order $1/n$ to the Gibbs regression rule when the variance of the empirical loss is small.

    %We use it to bound the excess risk and obtain a state-of-the-art generalization bound for the Gibbs regression rule.

    %The theoretical analysis is confirmed on a synthetic example and several UCI datasets. The split-$\kl$ inequality can also be of interest for the study of concentration of measure in other domains.

\end{abstract}
