In extremely low-light images, low visibility and high-ISO noise obscure critical visual details, posing a significant challenge to human pose estimation. Current methods fail to produce high-quality representations: they rely on pixel-level enhancement that compromises semantics, and they cannot handle extreme low-light conditions well enough for robust feature learning. In this work, we propose a frequency-based framework for low-light human pose estimation, rooted in the "divide-and-conquer" principle. Instead of uniformly enhancing the entire image, our method focuses on task-relevant information: by applying dynamic illumination correction to the low-frequency components and low-rank denoising to the high-frequency components, it enhances both the semantic and the texture information essential for accurate pose estimation. This targeted enhancement yields robust, high-quality representations and significantly improves pose estimation performance. Extensive experiments demonstrate its superiority over state-of-the-art methods across a variety of challenging low-light scenarios.
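The frequency-domain split described above can be illustrated with a minimal hand-crafted sketch. The paper's actual modules are learned; here a box blur stands in for the low-frequency extractor, a gamma lift stands in for dynamic illumination correction, and a truncated SVD stands in for low-rank denoising. All function names and parameters below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def box_blur(img, k=5):
    # Separable box blur as a stand-in low-pass filter (assumption:
    # any smoothing filter would do for this illustration).
    kernel = np.ones(k) / k
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    # Convolve each row, then each column, with 'valid' so the
    # output regains the original spatial size.
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, out)
    return out

def enhance(img, gamma=0.5, rank=8):
    """Divide-and-conquer enhancement sketch on a grayscale image in [0, 1]."""
    low = box_blur(img)           # low-frequency band: illumination / coarse semantics
    high = img - low              # high-frequency band: texture plus sensor noise
    # Illustrative "illumination correction": gamma < 1 brightens dark regions.
    low_corr = np.power(np.clip(low, 0.0, 1.0), gamma)
    # Illustrative "low-rank denoising": keep only the top singular values,
    # discarding the small ones that mostly carry high-ISO noise.
    U, s, Vt = np.linalg.svd(high, full_matrices=False)
    s[rank:] = 0.0
    high_dn = (U * s) @ Vt
    # Recombine the separately enhanced bands.
    return np.clip(low_corr + high_dn, 0.0, 1.0)
```

Treating the two bands separately is the point of the sketch: a global gamma on the full image would amplify the noise it brightens, whereas here the brightening touches only the smooth band and the denoising touches only the detail band.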