\begin{abstract}
The conditional gradient method (CGM) is widely used for fast sparse approximation because of its low per-iteration computational cost for structured sparse regularizers. We explore the sparsity-acquiring properties of a generalized CGM (gCGM), in which the constraint is replaced by a penalty function based on a gauge function; this can be done without significantly increasing the per-iteration computation, and it applies to general notions of sparsity.
Without assuming bounded iterates, we show $O(1/t)$ convergence of the function values and of the gap of gCGM. We couple this with a safe screening rule and show that, at a rate of $O(1/(t\delta^2))$, the screened support matches the support at the solution, where $\delta \geq 0$ measures how close the problem is to being degenerate.
In our experiments, we show that gCGM with these modified penalties has feature selection properties similar to those of common penalties, but with potentially more stability with respect to the choice of hyperparameter.
%changed: "LASSO" to "common penalties"
%\note{FB: why LASSO, and not regular penalties? Be careful, we are only allowed an edit distance of 20.}
\end{abstract}