\begin{abstract}
  A common strategy for sparse linear regression is to introduce regularization,
  which eliminates irrelevant features by setting the
  corresponding weights to zero.
  However, regularization often also shrinks the estimates for
  relevant features, which leads to incorrect feature selection.

  Motivated by this issue, we propose Bayesian masking (BM), a sparse estimation method that
  imposes no regularization on the weights.
  The key concept of BM is to introduce binary latent variables that randomly mask features.
  Estimating the masking rates then determines the relevance of each feature automatically.
  We derive a variational Bayesian inference algorithm that maximizes a lower bound of
  the factorized information criterion (FIC), a recently developed asymptotic criterion for evaluating the marginal log-likelihood.
  In addition, we propose a reparametrization that accelerates the convergence of the derived algorithm.
  Finally, we show that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.
\end{abstract}