Physical knot classification is a fine-grained visual classification (FGVC) scenario in which appearance cues are deliberately suppressed: different classes share the same rope material, color, and background, and class identity resides primarily in crossing structure. We introduce the Knots-10 benchmark, comprising 1,440 images with a deployment-oriented split that trains on loosely tied knots and tests on tightly dressed ones. Swin-T and TransFG both average 97.2% accuracy; PMG scores 94.5%, consistent with the hypothesis that jigsaw shuffling disrupts crossing continuity. McNemar tests cannot separate four of the five general-purpose backbones, so small ranking margins should be interpreted with caution. A Mantel permutation test shows that topological distance correlates significantly with confusion patterns in three of the five models (p < 0.01). We propose TACA regularization, which improves embedding-topology alignment from ρ = 0.46 to ρ = 0.65 without improving classification accuracy; a random-distance ablation yields comparable alignment, indicating that the benefit is likely driven by generic regularization. A pilot cross-domain test with 100 phone photographs reveals a 58–69 percentage-point accuracy drop, exposing rope appearance bias as the dominant failure mode.
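The abstract relies on two standard significance tests. Below is a minimal, standard-library Python sketch of both. It rests on several assumptions:

- The McNemar test is the exact (binomial) variant, applied to paired per-image correctness of two models.
- The Mantel test is one-sided and uses Spearman's ρ over the upper triangle of two symmetric class-distance matrices.

The abstract does not state the paper's exact statistic, permutation count, or how confusion counts become distances. The helper `confusion_distance` and its symmetrized-error-rate formula are therefore hypothetical, chosen only for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value from paired per-image correctness flags."""
    # Only discordant pairs carry information: b = A right/B wrong, c = A wrong/B right.
    b = sum(1 for ya, yb in zip(correct_a, correct_b) if ya and not yb)
    c = sum(1 for ya, yb in zip(correct_a, correct_b) if yb and not ya)
    n = b + c
    if n == 0:
        return 1.0  # the two models never disagree
    # Under H0, b ~ Binomial(n, 0.5); double the smaller tail and cap at 1.
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant input; 0.0 is an arbitrary convention here
    return cov / math.sqrt(vx * vy)


def _upper(m, perm: Optional[List[int]] = None) -> List[float]:
    """Upper-triangle entries (i < j), optionally after relabeling classes by perm."""
    n = len(m)
    p = perm if perm is not None else list(range(n))
    return [m[p[i]][p[j]] for i in range(n) for j in range(i + 1, n)]


def confusion_distance(conf) -> List[List[float]]:
    """HYPOTHETICAL: 1 - mean of the two row-normalized off-diagonal confusion rates."""
    n = len(conf)
    rate = [[conf[i][j] / max(sum(conf[i]), 1) for j in range(n)] for i in range(n)]
    return [[0.0 if i == j else 1.0 - (rate[i][j] + rate[j][i]) / 2.0
             for j in range(n)] for i in range(n)]


def mantel(d_topo, d_conf, permutations: int = 999, seed: int = 0):
    """One-sided Mantel test (H1: positive association) using Spearman's rho."""
    rng = random.Random(seed)
    x = _upper(d_topo)
    observed = spearman(x, _upper(d_conf))
    labels = list(range(len(d_conf)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        # Rows and columns are permuted together, so each permuted matrix is
        # still a valid distance matrix; only the class labels are scrambled.
        if spearman(x, _upper(d_conf, labels)) >= observed:
            hits += 1
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

With data like the paper's, the inputs would map as follows. `correct_a` and `correct_b` would hold per-test-image correctness for two backbones. `d_topo` would be a 10 × 10 matrix of pairwise topological distances between the Knots-10 classes; the abstract does not say how that distance is defined. Permuting class labels, rather than individual matrix entries, is what makes the Mantel null distribution valid for distance matrices. A fixed seed keeps the p-value reproducible, and its resolution is 1/(permutations + 1).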