In recent years, human pose estimation has made significant progress through deep learning techniques. However, these techniques still struggle in challenging scenarios, including occlusion, diverse appearances, variations in illumination, and overlap. To address these limitations, we present the Spatial Attention-based Distribution Integration Network (SADI-NET), which improves localization accuracy in such situations. Our network consists of three efficient modules: the receptive fortified module (RFM), the spatial fusion module (SFM), and the distribution learning module (DLM). Building upon the classic HourglassNet architecture, we replace the basic block with our proposed RFM. The RFM combines a dilated residual block with an attention mechanism to expand the receptive field while enhancing sensitivity to spatial information. In addition, the SFM integrates multi-scale features by employing both global and local attention mechanisms. Furthermore, the DLM, inspired by residual log-likelihood estimation (RLE), reconfigures the predicted heatmap using a trainable distribution weight. To evaluate the efficacy of our model, we conducted extensive experiments on the MPII and LSP benchmarks. In particular, our model obtained $92.10\%$ accuracy on the MPII test set, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
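The statement that dilation expands the receptive field can be checked with simple arithmetic. The following is a minimal sketch, not the SADI-NET implementation: the helper name `receptive_field` and the dilation rates 1, 2, 4 are illustrative assumptions, since the abstract does not state the layer configuration of the RFM. It applies the standard recurrence for stacked convolutions, in which a kernel of size $k$ with dilation $d$ covers $d(k-1)+1$ input positions.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of convolution layers.

    Each layer is a (kernel_size, dilation, stride) tuple.
    """
    rf = 1    # with no layers, one output pixel sees one input pixel
    jump = 1  # distance between adjacent output pixels, in input pixels
    for kernel_size, dilation, stride in layers:
        # A dilated kernel spans dilation * (kernel_size - 1) + 1 positions.
        effective_kernel = dilation * (kernel_size - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions versus three 3x3 convolutions with
# hypothetical dilation rates 1, 2, 4 (assumed for illustration only).
plain = [(3, 1, 1)] * 3
dilated = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and parameters, the dilated stack in this example covers a 15-pixel window instead of 7, which is the kind of gain a dilated residual block is meant to provide without further downsampling.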