The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared with other topological transform descriptors and with standard shape-based LBVS descriptors.