\begin{abstract}
The performance of human pose estimation depends on the spatial accuracy of keypoint localization.
Most existing methods pursue spatial accuracy by learning high-resolution (HR) representations from input images. Our experimental analysis shows that the HR representation sharply increases computational cost, while its accuracy improvement over the low-resolution (LR) representation remains marginal.
In this paper, we propose FasterPose, a design paradigm for cost-effective networks that use the LR representation for efficient pose estimation.
Although the LR design substantially reduces model complexity, effectively training the network to preserve spatial accuracy becomes a concomitant challenge.
We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function that accelerates convergence and improves accuracy.
The RCE loss generalizes the ordinary cross-entropy loss from binary supervision to a continuous range, so that the training of the pose estimation network can benefit from the sigmoid function.
As a result, the output heatmap can be inferred from the LR features without loss of spatial accuracy, while the computational cost and model size are significantly reduced.
Compared with the previously dominant pose estimation network, our method reduces FLOPs by 58\% while improving accuracy by 1.3\%.
Extensive experiments show that FasterPose yields promising results on the common benchmarks, \textit{i.e.}, COCO and MPII, consistently validating its effectiveness and efficiency for practical use, especially in low-latency and low-energy-budget applications in non-GPU scenarios.
\end{abstract}
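
The abstract describes the RCE loss as extending cross-entropy from binary to continuous supervision but does not state its formula. As an illustrative sketch only, and not necessarily the exact RCE definition, one common way to write a cross-entropy with continuous targets is given below. The symbols are assumptions introduced here for illustration: $z_i$ is the pre-sigmoid network output at heatmap location $i$, $\sigma$ is the sigmoid function, $y_i \in [0,1]$ is a continuous target value (for example, from a Gaussian heatmap centered at the keypoint), and $N$ is the number of heatmap locations.
\begin{equation}
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log \big( 1 - \sigma(z_i) \big) \Big]
\end{equation}
When $y_i \in \{0, 1\}$, this reduces to the ordinary binary cross-entropy. Under this form, the gradient with respect to $z_i$ is $\big(\sigma(z_i) - y_i\big)/N$, which vanishes exactly when the sigmoid output matches the continuous target.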