Modeling dependencies between random variables independently of their marginals is fundamental in applications ranging from finance to (structural) biology. In this work, we address this problem using circulae to model data living on the $d$-dimensional flat torus $\mathbb{T}^d$, and make two contributions. First, using a low-rank covariance structure to define circulae based on a latent variable model, we design the first closed-form normalized distribution on the flat torus $\mathbb{T}^d$ with a covariance structure. Second, building on this framework, we propose the first models for the joint distributions of torsion angles (backbone and side chains) of neighboring amino acids in proteins. In practice, we fit mixtures on flat tori from $\mathbb{T}^{2}$ to $\mathbb{T}^{14}$ and show that they achieve state-of-the-art results in terms of likelihood and sparsity. We anticipate that these models will prove instrumental in moving from discrete structural studies, such as AlphaFold2, to thermodynamics and kinetics, which are the ultimate goals of theoretical biophysics.