1094367f154dccec.tex
1: \begin{abstract}
2: 
3: State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. 
4: An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. 
5: However, the complicated structure of the S4 layer poses challenges, and to address them, models such as S4D and S5 have adopted a purely diagonal structure.
6: This choice simplifies the implementation, improves computational efficiency, and allows channel communication.
7: However, diagonalizing the HiPPO framework is itself an ill-posed problem.
8: %
9: In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning.
10: We introduce a generic, backward-stable ``perturb-then-diagonalize'' (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs.
11: Based on this, we introduce the S4-PTD and S5-PTD models.
12: %
13: Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergence.
14: As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. 
15: %
16: In addition to improved robustness, our S5-PTD model averages 87.6\% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.
17: \end{abstract}
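
% Illustrative sketch only: the symbols A, E, V, \Lambda, \epsilon, and \sigma are assumed notation, not fixed by the abstract.
As a minimal sketch of the ``perturb-then-diagonalize'' idea summarized in the abstract, assume a non-normal state matrix $A$, a small perturbation $E$, and a tolerance $\epsilon > 0$ (all notation here is assumed for illustration).
Rather than diagonalizing $A$ itself, one diagonalizes a nearby matrix:
\[
  A + E = V \Lambda V^{-1}, \qquad \|E\| \le \epsilon ,
\]
where $\Lambda$ is diagonal.
The resulting factorization is an exact diagonalization of $A + E$ and an approximate one of $A$, with error
\[
  \| A - V \Lambda V^{-1} \| = \|E\| \le \epsilon .
\]
This is the sense in which such a procedure can be called backward stable: it returns the exact answer to a slightly perturbed problem.
In pseudospectral terms, the diagonal entries of $\Lambda$ belong to the $\epsilon$-pseudospectrum of $A$, which may be written as
\[
  \sigma_{\epsilon}(A) = \bigcup_{\|E\| \le \epsilon} \sigma(A + E),
\]
with $\sigma(\cdot)$ denoting the set of eigenvalues.
How $E$ is chosen, and how $\epsilon$ trades off against the conditioning of $V$, is not specified at this level of description.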
18: