
\begin{abstract}
Gradient descent and its variants are the de facto standard algorithms for training machine learning models.
Because gradient descent is sensitive to its hyperparameters, they must be tuned carefully, typically with a grid search,
which is time-consuming, especially when there are multiple hyperparameters.
Recently, parameter-free methods that adjust the hyperparameters on the fly have been studied.
However, existing work has studied parameter-free methods only for the stepsize; parameter-free methods for other hyperparameters remain unexplored.
For instance, in addition to the stepsize, the gradient clipping threshold is a crucial hyperparameter for preventing gradient explosion,
yet no existing study has investigated parameter-free methods for clipped gradient descent.
In this work, we study parameter-free methods for clipped gradient descent.
Specifically, we propose Inexact Polyak Stepsize, which converges to the optimal solution without any hyperparameter tuning and whose convergence rate is asymptotically independent of $L$ under the $L$-smoothness and $(L_0, L_1)$-smoothness assumptions on the loss function, matching that of clipped gradient descent with well-tuned hyperparameters.
We numerically validate our convergence results using a synthetic function and demonstrate the effectiveness of our proposed method using LSTM, Nano-GPT, and T5.
\end{abstract}
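
% Background sketch (not taken from the source text): textbook forms of the objects named in the abstract.
% The symbols f, x_t, \eta, c, and f^\star are notation assumed here for illustration.
For reference, the standard notions named in the abstract are commonly written as follows.
This is a sketch in assumed notation, not the paper's own definitions, and it does not reproduce the Inexact Polyak Stepsize itself.
A differentiable loss $f$ is $L$-smooth if
\[
  \|\nabla f(x) - \nabla f(y)\| \le L \, \|x - y\| \quad \text{for all } x, y,
\]
and a twice-differentiable $f$ is usually called $(L_0, L_1)$-smooth if
\[
  \|\nabla^2 f(x)\| \le L_0 + L_1 \, \|\nabla f(x)\| \quad \text{for all } x.
\]
Clipped gradient descent with stepsize $\eta > 0$ and clipping threshold $c > 0$ takes the form
\[
  x_{t+1} = x_t - \eta \, \min\!\left\{ 1, \frac{c}{\|\nabla f(x_t)\|} \right\} \nabla f(x_t),
\]
so both $\eta$ and $c$ are hyperparameters that would otherwise need tuning.
The classical Polyak stepsize for plain gradient descent, which assumes the optimal value $f^\star = \min_x f(x)$ is known, is
\[
  \eta_t = \frac{f(x_t) - f^\star}{\|\nabla f(x_t)\|^2}.
\]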
