\begin{abstract}
We consider the projected gradient algorithm for the nonconvex best subset selection
problem, which minimizes a given empirical loss function under an $\ell_0$-norm
constraint.
By decomposing the feasible set of the given sparsity constraint
as a finite union of linear subspaces, we present two acceleration
schemes with global convergence guarantees, one based on same-space
extrapolation and the other on subspace identification.
The former fully exploits the problem structure to greatly accelerate
optimization at only negligible additional cost.
The latter leads to a two-stage meta-algorithm that first uses
classical projected gradient iterations to identify the correct
subspace containing an optimal solution, and then switches to
a highly efficient smooth optimization method in the identified
subspace to attain superlinear convergence.
Experiments demonstrate that the proposed accelerated algorithms are
orders of magnitude faster than their non-accelerated counterparts as well as
the state of the art.
\end{abstract}