\begin{abstract}
Sparse coding is a core building block in many data analysis and machine learning pipelines.
It is typically solved with generic optimization techniques, such as the Iterative Soft Thresholding Algorithm (ISTA) and its accelerated variant (FISTA).
FISTA, in particular, attains the optimal worst-case convergence rate among first-order methods for such non-smooth convex problems.
However, these methods exploit neither the particular structure of the problem at hand nor the distribution of the input data.
An acceleration based on neural networks, called LISTA, was proposed in \cite{Gregor10}, which showed empirically that high-quality estimates can be obtained with only a few iterations by suitably modifying the parameters of the proximal splitting.

In this paper, we study the reasons for this acceleration.
Our mathematical analysis reveals that it is related to a specific matrix factorization of the Gram kernel of the dictionary, which attempts to nearly diagonalize the kernel with a basis that produces only a small perturbation of the $\ell_1$ ball.
When this factorization succeeds, we prove that the resulting splitting algorithm enjoys an improved convergence bound compared with its non-adaptive counterpart.
Moreover, our analysis shows that the conditions for acceleration are met mostly at the beginning of the iterative process, which is consistent with numerical experiments.
We further validate our analysis by showing that adaptive acceleration fails on dictionaries for which this factorization does not exist.
\end{abstract}
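To make the proximal splitting mentioned in the abstract concrete, here is a minimal NumPy sketch. It assumes the standard Lasso formulation of sparse coding, $\min_z \frac{1}{2}\|x - Dz\|_2^2 + \lambda\|z\|_1$, with dictionary $D$, input $x$ and code $z$; these symbols and the helper names \texttt{ista}, \texttt{lista\_form\_ista} and \texttt{lasso\_objective} are illustrative assumptions, not definitions taken from the text. The sketch shows that one ISTA step can be rewritten as $z \leftarrow \mathrm{ST}(W_e x + S z, \theta)$ with $W_e = D^\top/L$, $S = I - D^\top D/L$ and $\theta = \lambda/L$, where $L$ is the largest eigenvalue of $D^\top D$. LISTA keeps this form but learns $W_e$, $S$ and $\theta$ from data; the sketch does not implement that training step.

```python
import numpy as np


def soft_threshold(v, theta):
    """Proximal operator of theta * ||.||_1: element-wise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def lasso_objective(x, D, z, lam):
    """Assumed sparse coding objective F(z) = 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    r = x - D @ z
    return 0.5 * r @ r + lam * np.abs(z).sum()


def ista(x, D, lam, n_iter=100):
    """Plain ISTA: gradient step on the quadratic term, then soft thresholding."""
    # Lipschitz constant of the gradient = squared largest singular value of D.
    L = np.linalg.norm(D, ord=2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z


def lista_form_ista(x, D, lam, n_iter=100):
    """The same iteration written as z <- ST(W_e x + S z, theta).

    With the fixed choices below it reproduces ISTA; LISTA (hypothetically
    sketched here only through its parameterization) would instead learn
    W_e, S and theta from training data.
    """
    L = np.linalg.norm(D, ord=2) ** 2
    W_e = D.T / L                              # input-to-code matrix
    S = np.eye(D.shape[1]) - D.T @ D / L       # code-to-code matrix
    theta = lam / L                            # threshold
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(W_e @ x + S @ z, theta)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((20, 50))
    D /= np.linalg.norm(D, axis=0)  # unit-norm dictionary atoms
    x = rng.standard_normal(20)
    lam = 0.1
    z_ista = ista(x, D, lam, n_iter=200)
    z_lista_form = lista_form_ista(x, D, lam, n_iter=200)
    print("same iterates:", np.allclose(z_ista, z_lista_form))
    print("objective:", lasso_objective(x, D, z_ista, lam))
    print("non-zeros:", np.count_nonzero(z_ista))
```

Both functions perform the same arithmetic up to rounding, so they return numerically identical codes; the point of the rewriting is that $W_e$, $S$ and $\theta$ become free parameters once written this way.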