Reconstructing the physical complexity of many-body dynamical systems can be challenging. Starting from the trajectories of their constitutive units (raw data), typical approaches require selecting appropriate descriptors to convert them into time-series, which are then analyzed to extract interpretable information. However, identifying the most effective descriptor is often non-trivial. Here, we report a data-driven approach to compare the efficiency of various descriptors in extracting information from noisy trajectories and translating it into physically relevant insights. As a prototypical system with non-trivial internal complexity, we analyze molecular dynamics trajectories of an atomistic system where ice and water coexist in equilibrium near the solid/liquid transition temperature. We compare general and specific descriptors often used in aqueous systems: number of neighbors, molecular velocities, Smooth Overlap of Atomic Positions (SOAP), Local Environments and Neighbors Shuffling (LENS), Orientational Tetrahedral Order, and distance from the fifth neighbor ($d_5$). Using Onion Clustering -- an efficient unsupervised method for single-point time-series analysis -- we assess the maximum extractable information for each descriptor and rank them via a high-dimensional metric. Our results show that advanced descriptors like SOAP and LENS outperform classical ones due to higher signal-to-noise ratios. Nonetheless, even simple descriptors can rival or exceed advanced ones after local signal denoising. For example, $d_5$, initially among the weakest, becomes the most effective at resolving the system's non-local dynamical complexity after denoising. This work highlights the critical role of noise in information extraction from molecular trajectories and offers a data-driven approach to identify optimal descriptors for systems with characteristic internal complexity.
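To make two of the ingredients above concrete, the following is a minimal sketch of how a $d_5$ time-series could be computed from one trajectory frame and then locally denoised. It is not the authors' implementation. It assumes a cubic periodic box, and it assumes that "local signal denoising" means averaging each molecule's descriptor value with those of its neighbors within a cutoff; the abstract does not specify either. The function names, `box_length`, and `cutoff` are hypothetical.

```python
import numpy as np


def pairwise_distances(positions, box_length):
    """Minimum-image distances for one frame in a cubic periodic box (assumed geometry)."""
    delta = positions[:, None, :] - positions[None, :, :]
    # Wrap each displacement component to the nearest periodic image.
    delta -= box_length * np.round(delta / box_length)
    return np.linalg.norm(delta, axis=-1)


def kth_neighbor_distance(positions, box_length, k=5):
    """Distance from each particle to its k-th nearest neighbor (k=5 gives d_5)."""
    dist = pairwise_distances(positions, box_length)
    # After sorting each row, column 0 is the self-distance (zero),
    # so column k holds the distance to the k-th nearest neighbor.
    return np.sort(dist, axis=1)[:, k]


def local_average(values, positions, box_length, cutoff):
    """Assumed denoising step: average each value over neighbors within `cutoff`.

    The particle itself is included in its own average.
    """
    dist = pairwise_distances(positions, box_length)
    mask = (dist <= cutoff).astype(float)
    return (mask @ values) / mask.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    box_length = 10.0  # hypothetical box size, arbitrary units
    frame = rng.uniform(0.0, box_length, size=(200, 3))  # stand-in for oxygen positions
    d5_raw = kth_neighbor_distance(frame, box_length)
    d5_smooth = local_average(d5_raw, frame, box_length, cutoff=3.0)
    print(d5_raw.std(), d5_smooth.std())
```

Applying these functions frame by frame would give one raw and one denoised $d_5$ time-series per molecule, which could then be passed to a clustering step. The dense pairwise matrix is chosen for readability; for large systems a neighbor list or tree-based search would be the more practical choice.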