
\begin{abstract}

The performance of human pose estimation depends on the spatial accuracy of keypoint localization.

Most existing methods pursue spatial accuracy by learning a high-resolution (HR) representation from input images. Through experimental analysis, we find that the HR representation sharply increases the computational cost, while its accuracy improvement over the low-resolution (LR) representation remains marginal.

In this paper, we propose FasterPose, a design paradigm for cost-effective networks that use the LR representation for efficient pose estimation.

Although the LR design substantially reduces model complexity, how to train the network effectively with respect to spatial accuracy becomes a concomitant challenge.

We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function that accelerates convergence and improves accuracy.

The RCE loss generalizes the ordinary cross-entropy loss from binary supervision to a continuous range, so that the training of the pose estimation network can benefit from the sigmoid function.

As a result, the output heatmap can be inferred from the LR features without loss of spatial accuracy, while the computational cost and model size are significantly reduced.

Compared with the previously dominant pose estimation network, our method reduces the FLOPs by 58\% and simultaneously improves accuracy by 1.3\%.

Extensive experiments show that FasterPose yields promising results on the common benchmarks, \textit{i.e.},~COCO and MPII, consistently validating its effectiveness and efficiency for practical use, especially in low-latency and low-energy-budget applications in non-GPU scenarios.

\end{abstract}
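
% Illustrative sketch only: the notation ($z$, $y$, $\sigma$, $\ell$) is assumed here
% and this is not necessarily the exact RCE formulation.
To illustrate how a cross-entropy loss can be extended from binary supervision to a continuous range, consider the following minimal sketch. The notation is an assumption made for illustration and may differ from the exact RCE definition: $z$ denotes the predicted logit at one heatmap location, $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function, and $y \in [0,1]$ is the continuous target heatmap value at that location.
\[
  \ell(z, y) = -\,y \log \sigma(z) \;-\; (1 - y) \log\bigl(1 - \sigma(z)\bigr).
\]
For $y \in \{0, 1\}$ this expression reduces to the ordinary binary cross-entropy. For $y \in (0,1)$ it is minimized when $\sigma(z) = y$, and its gradient with respect to the logit is $\partial \ell / \partial z = \sigma(z) - y$, which is one way a sigmoid output can be trained against continuous heatmap targets.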
