Machine learning (ML) and deep learning models are extensively used for parameter optimization and regression problems. However, not all inverse problems in ML are ``identifiable,'' meaning that model parameters may not be uniquely determined from the available data and the input-output relationship of the data model. In this study, we investigate model parameter identifiability through a case study on parameter estimation from motion sensor data. Using a bipedal spring-mass model of human walking dynamics, we generate synthetic data representing diverse gait patterns and conditions. We then train a deep neural network to estimate subject-wise parameters, including mass, stiffness, and equilibrium leg length. The results show that while certain parameters can be identified from the observation data, others remain unidentifiable. This indicates that unidentifiability is an intrinsic limitation of the experimental setup and necessitates a change in data collection and experimental scenarios. Beyond this specific case study, the concept of identifiability has broader implications in ML and deep learning. Addressing unidentifiability requires provably identifiable models (with theoretical support), multimodal data fusion techniques, and advancements in model-based machine learning. Understanding and resolving unidentifiability will lead to more reliable and accurate applications across diverse domains, going beyond mere model convergence.
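To make the notion of unidentifiability concrete, the following minimal sketch uses a toy one-dimensional undamped spring-mass oscillator. This is an assumption made purely for illustration: it is not the bipedal spring-mass walking model used in the study, and it makes no claim about which parameters the study found identifiable. The function name `simulate_position` and all numerical values are hypothetical. In this toy system the position trajectory depends on mass and stiffness only through their ratio, so scaling both by the same factor leaves the observations unchanged, whereas changing the equilibrium length shifts the trajectory.

```python
import numpy as np


def simulate_position(mass, stiffness, rest_length, x0, t):
    """Closed-form position of an undamped 1D spring-mass oscillator.

    Toy model (illustrative assumption): m * x'' = -k * (x - rest_length),
    released from rest at position x0.
    """
    omega = np.sqrt(stiffness / mass)  # only the ratio k/m enters the dynamics
    return rest_length + (x0 - rest_length) * np.cos(omega * t)


t = np.linspace(0.0, 2.0, 500)

# Baseline parameter set (hypothetical values).
traj_a = simulate_position(mass=70.0, stiffness=14000.0, rest_length=1.0, x0=0.95, t=t)
# Mass and stiffness both doubled: same k/m ratio as the baseline.
traj_b = simulate_position(mass=140.0, stiffness=28000.0, rest_length=1.0, x0=0.95, t=t)
# Only the equilibrium length changed.
traj_c = simulate_position(mass=70.0, stiffness=14000.0, rest_length=1.1, x0=0.95, t=t)

max_gap_ab = float(np.max(np.abs(traj_a - traj_b)))
max_gap_ac = float(np.max(np.abs(traj_a - traj_c)))

print("max |a - b| (mass and stiffness scaled together):", max_gap_ab)
print("max |a - c| (equilibrium length changed):", max_gap_ac)
```

Under these assumptions, no estimator, however well it converges, can separate mass from stiffness using position data alone, because distinct parameter sets produce the same observations. Resolving the ambiguity would require a different measurement, for example a force signal, which mirrors the point that unidentifiability is a property of the experimental setup rather than of the learning algorithm.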