\begin{abstract}
  A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many computations, it solves for the equilibrium point directly via root-finding and computes gradients via implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight sharing and infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. We further perform a fine-grained non-asymptotic analysis of random DEQs and the corresponding weight-untied models, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point always exists throughout training.
\end{abstract}