To support the safety and robustness of AI systems, we introduce topological parallax, a theoretical and computational tool that compares a trained model to a reference dataset to determine whether they share similar multiscale geometric structure. Our proofs and examples show that this geometric similarity between dataset and model is essential to trustworthy interpolation and perturbation, and we conjecture that the concept will add value to the current debate on the unclear relationship between overfitting and generalization in applications of deep learning. In typical DNN applications, an explicit geometric description of the model is impossible, but parallax can estimate topological features of the model (components, cycles, voids, etc.) by examining, using the reference dataset, the effect of geodesic distortions on the Rips complex. Parallax thus indicates whether the model shares multiscale geometric features with the dataset. Theoretically, parallax appears in topological data analysis (TDA) as a bi-filtered persistence module, and the key properties of this module are stable under perturbation of the reference dataset.
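The idea of probing a model through a Rips complex on the reference dataset can be illustrated with a minimal sketch. It is not the construction used in this work: it assumes the model is available only as a hypothetical membership predicate `in_model`, it replaces geodesics with sampled straight segments, and it tracks only connected components (degree-0 features). The names `rips_components`, `in_model`, `samples`, and `gap_model` are illustrative assumptions.

```python
import math
from itertools import combinations


def rips_components(points, radius, in_model=None, samples=8):
    """Count connected components of the Rips graph at scale `radius`.

    An edge joins two points whose distance is at most `radius`.
    If `in_model` (a hypothetical membership test) is given, the edge is
    kept only when every sampled point on the straight segment between
    its endpoints lies in the model. This is an assumed stand-in for a
    geodesic check, not the paper's definition.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        if math.dist(p, q) > radius:
            continue
        if in_model is not None:
            segment = (
                tuple(a + (b - a) * t / samples for a, b in zip(p, q))
                for t in range(samples + 1)
            )
            if not all(in_model(x) for x in segment):
                continue
        parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Toy data: four collinear points, and a toy model with a gap at 1.5 < x < 2.5.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]


def gap_model(x):
    return not (1.5 < x[0] < 2.5)


print(rips_components(points, 2.5))             # 1: the dataset alone looks connected
print(rips_components(points, 2.5, gap_model))  # 2: the model separates the two halves
```

At this scale the dataset alone forms a single component, while the model-aware count is two. A mismatch of this kind is the sort of disagreement between dataset and model geometry that the comparison is meant to surface.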