Symmetric Divergence and Normalized Similarity: A Unified Topological Framework for Representation Analysis

Topological Data Analysis (TDA) offers a principled, intrinsic lens for comparing neural representations. However, existing paired topological divergences (e.g., RTD) are limited by heuristic asymmetry and, more critically, unbounded scores that depend on sample size, hindering reliable cross-scenario benchmarking. To address these challenges, we develop a unified topological toolkit serving two complementary needs: fine-grained structural diagnosis and robust, standardized evaluation. First, we complete the RTD framework by introducing Symmetric Representation Topology Divergence (SRTD) and its efficient variant SRTD-lite. Beyond resolving the theoretical asymmetry of prior variants, SRTD consolidates diagnostic information into a single, comprehensive cross-barcode signature. This allows for precise localization of structural discrepancies and serves as an effective optimization objective without the overhead of dual directional computations. Second, to enable reliable benchmarking across heterogeneous settings, we propose Normalized Topological Similarity (NTS). By measuring the rank correlation of hierarchical merge orders, NTS yields a scale-invariant metric bounded between -1 and 1, effectively overcoming the scale and sample-dependence of unnormalized divergences. Experiments across synthetic and real-world deep learning settings demonstrate that our toolkit captures functional shifts in CNNs missed by geometric measures and robustly maps LLM genealogy even under distance saturation, offering a rigorous, topology-aware perspective that complements measures like CKA.
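To make the idea behind NTS concrete, the following is a minimal sketch of a rank correlation between hierarchical merge orders. It is an illustration under stated assumptions, not the paper's definition of NTS: the function name `merge_order_similarity` is hypothetical, single-linkage clustering on Euclidean distances is assumed as the merge hierarchy, and Spearman correlation is assumed as the rank statistic. Both inputs are taken to be representations of the same n samples in the same row order.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def merge_order_similarity(X, Y):
    """Rank correlation of single-linkage merge heights (hypothetical helper).

    X, Y: arrays of shape (n, d1) and (n, d2) holding two representations
    of the same n samples. Returns a value in [-1, 1].
    """
    # Condensed pairwise Euclidean distances within each representation.
    dx = pdist(np.asarray(X, dtype=float))
    dy = pdist(np.asarray(Y, dtype=float))

    # Cophenetic distance of a pair = height at which the two points first
    # merge in the single-linkage hierarchy (assumed merge order).
    cx = cophenet(linkage(dx, method="single"))
    cy = cophenet(linkage(dy, method="single"))

    # Spearman correlation depends only on ranks, so rescaling either
    # representation by a positive constant leaves the score unchanged.
    rho, _ = spearmanr(cx, cy)
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 8))
    B = rng.normal(size=(50, 8))
    print(merge_order_similarity(A, 3.0 * A))  # rescaled copy of A
    print(merge_order_similarity(A, B))        # unrelated representation
```

Because only the ordering of merge heights enters the score, this sketch is bounded in [-1, 1] and is unaffected by a global rescaling of distances, which is the property the abstract attributes to NTS in contrast to unnormalized divergences.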
