Many real objects are modeled as discrete sets of points, such as corners or other salient features. For our main applications in chemistry, points represent atomic centers in a molecule or a solid material. We study the problem of classifying discrete (finite and periodic) sets of unordered points under isometry, which is any transformation that preserves distances in a metric space. Experimental noise motivates a new practical requirement: invariants should be Lipschitz continuous, so that perturbing every point within its epsilon-neighborhood changes the invariant by at most a constant multiple of epsilon in a suitable distance satisfying all metric axioms. Because the given points are unordered, the key challenge is to compute all invariants and metrics in near-linear time in the input size. We define the Pointwise Distance Distribution (PDD) for any discrete set and prove, in addition to the properties above, the completeness of the PDD for all periodic sets in general position. The PDD can compare nearly 2 million crystals from the world's five largest databases within 2 hours on a modest desktop computer. The impact is upholding data integrity in crystallography, because the PDD will not allow anyone to present a noisy disguise of a known crystal as a 'new' material.
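To make the idea of a pointwise distance distribution concrete, the following is a minimal sketch for a finite point set only. It assumes the commonly described construction: for each point, take the sorted distances to its k nearest neighbors, merge identical rows, and attach a weight equal to the fraction of points sharing that row. The function name `pdd_finite`, the parameter `k`, and the rounding tolerance are illustrative assumptions, not the authors' implementation. This brute-force version is quadratic in the number of points; the near-linear time stated above would need a faster neighbor search, and the periodic case is not covered here.

```python
import numpy as np

def pdd_finite(points, k):
    """Hypothetical helper: PDD-style matrix for a finite point set.

    Returns an array whose first column holds row weights and whose
    remaining k columns hold sorted distances to the k nearest neighbors.
    """
    pts = np.asarray(points, dtype=float)
    m = len(pts)
    if not 1 <= k <= m - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    # All pairwise Euclidean distances (quadratic cost; fine for a sketch).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero distance from a point to itself.
    rows = np.sort(dist, axis=1)[:, 1:k + 1]
    # Assumption: round so rows equal up to floating-point noise are merged.
    rows = np.round(rows, 10)
    # np.unique with axis=0 returns the distinct rows in lexicographic order,
    # which makes the result independent of the input ordering of points.
    uniq, counts = np.unique(rows, axis=0, return_counts=True)
    weights = counts / m
    return np.column_stack([weights, uniq])
```

For the unit square with k = 3, every corner has neighbor distances 1, 1 and the diagonal length, so the sketch collapses all four corners into a single row of weight 1. Rotating, translating, or reordering the corners leaves that row unchanged, which is the isometry invariance on unordered points that the abstract asks for.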