This paper addresses a critical flaw in MediaPipe Holistic's hand Region of Interest (ROI) prediction, which struggles with non-ideal hand orientations and thereby degrades sign language recognition accuracy. We propose a data-driven approach to ROI estimation that uses an enriched feature set, including additional hand keypoints and the z-dimension. Our results show better ROI estimates, with higher Intersection-over-Union (IoU) than the current method. Our code and optimizations are available at https://github.com/sign-language-processing/mediapipe-hand-crop-fix.
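For readers unfamiliar with the evaluation metric, the following is a minimal sketch of Intersection-over-Union between a predicted and a reference hand ROI. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; the function name `box_iou` and the box layout are illustrative choices, not taken from the released code, and rotated ROIs would need a polygon-based intersection instead.

```python
from typing import Tuple

# Hypothetical box layout for illustration: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    # Degenerate (zero-area) inputs yield 0.0 rather than a division error.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    reference = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {box_iou(predicted, reference):.4f}")
```

An IoU of 1 means the predicted ROI coincides with the reference, and 0 means the two do not overlap, so a higher value indicates a better crop.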