Physical Knot Classification Beyond Accuracy: A Benchmark and Diagnostic Study

Physical knot classification is a challenging fine-grained recognition task in which the intended discriminative cue is rope crossing structure; however, high closed-set accuracy may still arise from low-level appearance shortcuts rather than genuine topological understanding. In this work, we introduce a dataset (1,440 images, 10 classes) whose protocol trains models on loosely tied knots and evaluates them on tightly dressed configurations to probe whether structure-guided training yields topology-specific gains. We show that topological distance predicts residual inter-class confusion across multiple backbone architectures, which supports the utility of our topology-aware evaluation framework. Furthermore, we propose topology-aware centroid alignment (TACA) and an auxiliary crossing-number prediction objective as two complementary forms of structural supervision. Notably, Swin-T with TACA achieves a consistent positive specificity gain (Δ_spec = +1.18 pp) across all random seeds under the canonical protocol, and auxiliary crossing-number prediction performs robustly across data regimes without the real-versus-random reversal observed for centroid alignment. Causal probes reveal that background changes alone flip 17-32% of predictions and that phone-photo accuracy drops by 58-69 percentage points, underscoring that appearance bias remains the principal obstacle to deployment. Together, these results indicate that our diagnostic workflow provides a principled and practical tool for evaluating whether a hand-crafted structural prior delivers genuine task-relevant benefit beyond generic regularization.
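
The claim that topological distance predicts residual inter-class confusion can be illustrated with a rank-correlation check. The abstract does not state which statistic is used, so the sketch below assumes a Spearman correlation between pairwise topological distance and off-diagonal confusion rate. The function names, the 3-class distance matrix, and the confusion counts are all hypothetical toy values, not results from this study.

```python
from itertools import permutations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def distance_confusion_correlation(distance, confusion):
    """Spearman correlation over all ordered class pairs (i != j).

    distance[i][j]  : topological distance between classes i and j
    confusion[i][j] : count of class-i samples predicted as class j
    """
    n = len(distance)
    d, c = [], []
    for i, j in permutations(range(n), 2):
        row_total = sum(confusion[i])
        d.append(distance[i][j])
        c.append(confusion[i][j] / row_total)  # row-normalized confusion rate
    return pearson(average_ranks(d), average_ranks(c))


# Hypothetical toy inputs (assumption): 3 classes, symmetric distances.
toy_distance = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
toy_confusion = [
    [90, 8, 2],
    [7, 88, 5],
    [1, 4, 95],
]

rho = distance_confusion_correlation(toy_distance, toy_confusion)
print(f"Spearman rho between distance and confusion: {rho:.3f}")
```

Under this assumed setup, a negative correlation would mean that topologically closer classes are confused more often, which is the pattern the claim describes. In practice a library routine such as `scipy.stats.spearmanr` would replace the hand-written ranking; the pure-Python version is used here only to keep the sketch dependency-free.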
