Acoustical knee health assessment has long promised an alternative to clinically available medical imaging tools, but the modality has yet to be adopted in medical practice. The field is currently led by machine learning models that process acoustical features and have reported promising diagnostic performance. However, these methods overlook the multi-source nature of audio signals and the underlying mechanisms at play. To address this gap, the present paper introduces a novel causal framework for validating knee acoustical features. We argue that current machine learning methodologies for acoustical knee diagnosis lack the required assurances and thus cannot be used to classify acoustic features as biomarkers. Our framework establishes a set of theoretical guarantees necessary to validate such a claim. We apply our methodology to three real-world experiments investigating the effects of researchers' expectations, the experimental protocol, and the wearable sensor employed. This investigation reveals latent issues such as shortcut learning and performance inflation. This study is the first independent reproduction study in the field of acoustical knee health evaluation. We conclude with actionable insights drawn from our findings, offering guidance for addressing these limitations in future research.