\begin{abstract}
Gradient descent-based optimization algorithms achieve outstanding performance and are widely used across many tasks. Among the commonly used algorithms, {\adam} offers several advantages, including fast convergence thanks to its momentum term and adaptive learning rate. However, because the loss functions of most deep neural networks are non-convex, {\adam} is also prone to getting stuck in local optima. To address this problem, combining {\ga} with a batch of base learners has been introduced to search for better solutions. Nonetheless, our analysis shows that this combination still has a shortcoming: the effectiveness of {\ga} can hardly be guaranteed if the unit models converge to similar or identical solutions. To resolve this problem and further exploit the advantages of {\ga} with base learners, we propose to apply the {\bos} strategy when training the input models, which in turn improves the effectiveness of {\ga}. In this paper, we introduce a novel optimization algorithm, namely \textbf{B}oosting based \textbf{G}enetic \textbf{{\adam}} ({\yb}). With both theoretical analysis and empirical experiments, we show that adding the {\bos} strategy to the {\yb} model helps models escape local optima and converge to better solutions.
\end{abstract}