
\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file

We consider Discretized Neural Networks (DNNs) with low-precision weights and activations, whose training suffers from gradients that are either infinite or zero because the discretization function is non-differentiable.

To work around this, most training-based DNNs use the standard Straight-Through Estimator (STE) to approximate the gradient with respect to the discrete values.

However, the standard STE causes the gradient mismatch problem: the approximated gradient direction may deviate from the steepest descent direction.

In other words, gradient mismatch means that the approximated gradient is subject to perturbations.

To address this problem, we use duality theory to interpret the perturbation of the approximated gradient as a perturbation of the metric on Linearly Nearly Euclidean (LNE) manifolds.

Under the Ricci-DeTurck flow, we then prove the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations, which yields a theoretical solution to the gradient mismatch problem.

In practice, we also present the steepest descent gradient flow for DNNs on LNE manifolds from the viewpoints of information geometry and mirror descent.

Experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.

\end{abstract}
