\begin{abstract}
Deep unfolding is a promising deep-learning technique in which
an iterative algorithm is unrolled into a deep network architecture with trainable parameters.
For gradient descent algorithms,
training often accelerates convergence through
learned non-constant step-size parameters whose behavior
is neither intuitive nor interpretable from conventional theory.
In this paper, we provide a theoretical interpretation of the learned step sizes of
deep-unfolded gradient descent (DUGD).
We first prove that the training process of DUGD reduces not only the mean squared error loss
but also the spectral radius related to the convergence rate.
Next, we show that minimizing the upper bound of the spectral radius naturally leads to
the Chebyshev steps, a sequence of step sizes based on Chebyshev polynomials.
Numerical experiments confirm that the Chebyshev steps qualitatively reproduce the learned step-size parameters in DUGD, which provides a plausible interpretation of the learned parameters.
Additionally, we show that, in a specific limit, the Chebyshev steps attain the lower bound on the convergence rate of first-order methods without learning parameters or momentum terms.
\end{abstract}