Identifiability in representation learning is commonly evaluated using standard metrics (e.g., MCC, DCI, R^2) on synthetic benchmarks with known ground-truth factors. These metrics are assumed to reflect recovery up to the equivalence class guaranteed by identifiability theory. We show that this assumption holds only under specific structural conditions: each metric implicitly encodes assumptions about both the data-generating process (DGP) and the encoder. When these assumptions are violated, metrics become misspecified and can produce systematic false positives and false negatives. Such failures occur both within classical identifiability regimes and in post-hoc settings where identifiability is most needed. We introduce a taxonomy separating DGP assumptions from encoder geometry, use it to characterise the validity domains of existing metrics, and release an evaluation suite for reproducible stress testing and comparison.