Language models are often evaluated with scalar metrics such as accuracy, but such measures fail to capture how models internally represent ambiguity, especially when human annotators disagree. We propose a topological perspective for analyzing how fine-tuned models encode ambiguous and, more generally, individual instances. Applied to RoBERTa-Large on the MD-Offense dataset, Mapper, a tool from topological data analysis, reveals that fine-tuning restructures the embedding space into modular, non-convex regions aligned with model predictions, even for highly ambiguous cases. Over $98\%$ of connected components exhibit $\geq 90\%$ prediction purity, yet alignment with ground-truth labels drops on ambiguous data, surfacing a hidden tension between structural confidence and label uncertainty. Unlike traditional projection tools such as PCA or UMAP, Mapper captures this geometry directly, uncovering decision regions, boundary collapses, and overconfident clusters. Our findings position Mapper as a powerful diagnostic tool for understanding how models resolve ambiguity. Beyond visualization, it also yields topological metrics that may inform proactive modeling strategies for subjective NLP tasks.
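As a concrete illustration of the purity analysis described above, the following is a minimal sketch of how a Mapper graph over embeddings and per-component prediction purity could be computed with the open-source KeplerMapper and networkx packages. It assumes an array `X` of fine-tuned sentence embeddings and an array `preds` of model predictions are already available; the lens (a 2-D PCA projection), the cover resolution, and the DBSCAN clustering parameters are illustrative placeholders, not the configuration used in the paper.

```python
# Minimal sketch: build a Mapper graph over embeddings and measure the
# prediction purity of each connected component. Assumes X is an (n, d)
# array of fine-tuned embeddings and preds an (n,) array of predicted
# labels; all Mapper hyperparameters below are illustrative.
from collections import Counter

import numpy as np
import networkx as nx
import kmapper as km
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def mapper_purity(X, preds, purity_threshold=0.9):
    mapper = km.KeplerMapper(verbose=0)
    # Lens: project the embeddings to 2-D with PCA.
    lens = mapper.fit_transform(X, projection=PCA(n_components=2))
    # Cover the lens with overlapping hypercubes and cluster each preimage.
    graph = mapper.map(
        lens,
        X,
        cover=km.Cover(n_cubes=10, perc_overlap=0.3),
        clusterer=DBSCAN(eps=0.5, min_samples=3),
    )
    # Rebuild the Mapper graph in networkx to extract connected components.
    G = nx.Graph()
    G.add_nodes_from(graph["nodes"])  # node id -> list of member sample indices
    for src, targets in graph["links"].items():
        G.add_edges_from((src, dst) for dst in targets)
    purities = []
    for component in nx.connected_components(G):
        members = set()
        for node in component:
            members.update(graph["nodes"][node])  # union of samples in the component
        counts = Counter(preds[i] for i in members)
        # Purity: share of the majority predicted label among member samples.
        purities.append(counts.most_common(1)[0][1] / len(members))
    frac_pure = float(np.mean([p >= purity_threshold for p in purities]))
    return purities, frac_pure


# Example (hypothetical inputs): purities, frac = mapper_purity(cls_embeddings, model_preds)
```

In this sketch, the `frac_pure` value corresponds to the kind of statistic reported in the abstract (the fraction of connected components whose majority-prediction share is at least $90\%$); the same loop can be rerun with gold labels in place of `preds` to contrast structural purity with ground-truth alignment.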