\begin{abstract}
Deep unfolding is a promising deep-learning technique in which
an iterative algorithm is unrolled into a deep network architecture with trainable parameters.
For gradient descent algorithms,
training often accelerates convergence and yields
learned non-constant step-size parameters whose behavior
is neither intuitive nor interpretable from conventional theory.
In this paper, we provide a theoretical interpretation of the learned step sizes of
deep-unfolded gradient descent (DUGD).
We first prove that training DUGD reduces not only the mean squared error loss
but also the spectral radius that governs the convergence rate.
Next, we show that minimizing an upper bound on the spectral radius naturally leads to
the Chebyshev steps, a sequence of step sizes based on Chebyshev polynomials.
Numerical experiments confirm that the Chebyshev steps qualitatively reproduce the learned step-size parameters of DUGD, which provides a plausible interpretation of the learned parameters.
Additionally, we show that, in a specific limit, the Chebyshev steps attain the lower bound on the convergence rate of first-order methods without parameter learning or momentum terms.
\end{abstract}