In molecular dynamics (MD) simulations, rare events such as protein folding are typically studied with enhanced sampling techniques, most of which rely on the definition of a collective variable (CV) along which sampling is accelerated. Obtaining an expressive CV is crucial, but it is often hindered by the lack of information about the particular event, e.g., the transition from the unfolded to the folded conformation. We propose a simulation-free data augmentation strategy that uses physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition-state samples. Leveraging the interpolation progress parameters, we introduce a regression-based learning scheme for CV models, which outperforms classifier-based methods when transition-state data are limited and noisy.
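The regression idea can be illustrated with a minimal sketch. It is not the authors' implementation: straight-line interpolation in a toy 6-dimensional feature space stands in for the physics-inspired geodesic interpolation, and a linear ridge regressor stands in for the CV model. All names (`interpolate_path`, `fit_linear_cv`, `cv`, `x_unfolded`, `x_folded`) and all numerical values are hypothetical.

```python
import numpy as np


def interpolate_path(x_start, x_end, n_points):
    """Return (path, t): interpolated configurations and their progress t in [0, 1].

    ASSUMPTION: plain linear interpolation is used here as a placeholder
    for the geodesic interpolation described in the text.
    """
    t = np.linspace(0.0, 1.0, n_points)
    path = (1.0 - t)[:, None] * x_start[None, :] + t[:, None] * x_end[None, :]
    return path, t


def fit_linear_cv(features, progress, ridge=1e-6):
    """Fit progress ~ features @ w + b by ridge-regularised least squares."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    coef = np.linalg.solve(A, X.T @ progress)
    return coef[:-1], coef[-1]


def cv(features, weights, bias):
    """Evaluate the learned (linear) collective variable."""
    return features @ weights + bias


rng = np.random.default_rng(0)

# Hypothetical reference configurations of the two metastable states.
x_unfolded = np.zeros(6)
x_folded = np.array([1.0, 2.0, -1.0, 0.5, 3.0, -2.0])

# Augmented data set: interpolations between noisy samples of each state,
# each point labelled with its interpolation progress parameter.
all_features, all_progress = [], []
for _ in range(20):
    start = x_unfolded + 0.05 * rng.normal(size=6)
    end = x_folded + 0.05 * rng.normal(size=6)
    path, t = interpolate_path(start, end, n_points=25)
    all_features.append(path)
    all_progress.append(t)

features = np.vstack(all_features)
progress = np.concatenate(all_progress)

# Regression target is the progress parameter, not a state label.
w, b = fit_linear_cv(features, progress)

# Evaluate the CV along a clean path between the two reference states.
clean_path, _ = interpolate_path(x_unfolded, x_folded, n_points=11)
cv_clean = cv(clean_path, w, b)
print(np.round(cv_clean, 3))
```

In this toy setting the regressor sees a continuous target along the whole path, whereas a classifier trained only on the two end states would receive no supervision in the intermediate region; that difference is the motivation the abstract gives for the regression-based scheme.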