1: \begin{abstract}
2: A common strategy for sparse linear regression is to introduce regularization,
3: which eliminates irrelevant features by letting the
4: corresponding weights be zeros.
5: However, regularization often shrinks the estimator for
6: relevant features, which leads to incorrect feature selection.
7:
8: Motivated by the above-mentioned issue, we propose Bayesian masking (BM), a sparse estimation method which
9: imposes no regularization on the weights.
10: The key concept of BM is to introduce binary latent variables that randomly mask features.
11: Estimating the masking rates determines the relevance of the features automatically.
12: We derive a variational Bayesian inference algorithm that maximizes the lower bound of
13: the factorized information criterion (FIC), which is a recently developed asymptotic criterion for evaluating the marginal log-likelihood.
14: In addition, we propose reparametrization to accelerate the convergence of the derived algorithm.
15: Finally, we show that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.
16: \end{abstract}
17: