Multi-person pose estimation (MPPE), which aims to locate the keypoints of every person in an image frame, is an active research area in computer vision. Because human poses vary widely and scenes are complex, MPPE depends on both local details and global structure; when either is missing, keypoint features may become misaligned. High-order spatial interactions, which effectively link the local and global information in features, are therefore particularly important. However, most existing methods include no spatial interactions, and the few that use low-order spatial interactions struggle to balance accuracy and complexity. To address these problems, this paper proposes a dual-residual spatial interaction network (DRSI-Net) for MPPE that combines high accuracy with low complexity. Unlike other methods, DRSI-Net recursively performs residual spatial information interactions on neighbouring features, so that more useful spatial information is retained and the features extracted by shallow and deep layers become more similar. A channel and spatial dual attention mechanism introduced in the multi-scale feature fusion further helps the network focus adaptively on features relevant to the target keypoints and refine the generated poses. In addition, the spatial interaction module is kept lightweight by optimising the interactive channel dimensions and dividing the gradient flow, which reduces the complexity of the network. Experimental results on the COCO dataset show that the proposed DRSI-Net outperforms other state-of-the-art methods in both accuracy and complexity.
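The channel and spatial dual attention mentioned in the abstract can be illustrated with a minimal, parameter-free sketch. This is not the DRSI-Net implementation: the abstract does not specify the attention layers, so the function names (`channel_attention`, `spatial_attention`, `dual_attention`) are hypothetical, and the learned layers that a trained network would normally use are replaced here by a plain sigmoid of pooled averages. The feature map is assumed to be a nested list of shape channels × height × width.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a sigmoid of its global average.

    Assumption: a trained model would pass the pooled vector through
    learned layers; this sketch applies the sigmoid directly.
    """
    weights = []
    for channel in feat:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        weights.append(sigmoid(total / count))
    return [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feat, weights)
    ]


def spatial_attention(feat):
    """Scale each spatial location by a sigmoid of its mean across channels.

    Assumption: a trained model would typically apply a learned
    convolution to the pooled map before the sigmoid.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mask = [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[feat[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def dual_attention(feat):
    """Apply channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(feat))


# Toy feature map: 2 channels, 2x2 spatial grid, one strong activation.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[4.0, 0.0], [0.0, 0.0]],
]
refined = dual_attention(features)
```

Because both weights lie strictly between 0 and 1, the sketch can only attenuate activations. A learned variant could instead emphasise keypoint-relevant channels and locations relative to the rest, which is the adaptive focusing the abstract describes.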