We describe our participation in the SOMD 2026 shared task on cross-document software mention coreference resolution, in which our systems ranked second in all three subtasks. We compare two fine-tuning-free approaches: Fuzzy Matching (FM), a lexical string-similarity method, and Context Aware Representations (CAR), which combines mention-level and document-level embeddings. Both are competitive on every subtask (CoNLL F1 of 0.94 to 0.96), and CAR outperforms FM by 1 point (0.01 F1) throughout the official test set. This narrow margin is consistent with the high surface regularity of software names, which reduces the need for complex semantic reasoning. A controlled noise-injection study reveals complementary failure modes. As boundary noise increases from clean to fully corrupted input, CAR loses only 0.07 F1, compared to 0.20 for FM. Under mention substitution, by contrast, FM degrades more gracefully than CAR (0.52 vs. 0.63). Our inference-time analysis shows that FM scales superlinearly with corpus size, whereas CAR scales approximately linearly, making CAR the more efficient choice at large scale. These findings suggest that system selection should be informed by both the noise profile of the upstream mention detector and the scale of the target corpus. We release our code to support future work on this underexplored task.
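To make the idea of a lexical string-similarity baseline concrete, the following is a minimal sketch of how fuzzy matching over software mentions could work. It is not the released system: the use of `difflib.SequenceMatcher`, the `normalize` helper, the 0.85 threshold, and the single-link union-find clustering are all assumptions made for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(mention: str) -> str:
    """Lowercase and collapse whitespace (hypothetical normalization step)."""
    return " ".join(mention.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized mentions."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_mentions(mentions: list[str], threshold: float = 0.85) -> list[list[int]]:
    """Group mention indices whose pairwise similarity reaches the threshold.

    The threshold value is an assumption, not a setting from the paper.
    Clusters are formed by single-link merging with a union-find structure.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Comparing every pair of mentions is quadratic in the number of mentions.
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similarity(mentions[i], mentions[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters: dict[int, list[int]] = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


mentions = ["SPSS", "spss", "SPSS Statistics", "Matlab", "MATLAB ", "R"]
clusters = cluster_mentions(mentions)
```

In this sketch, case and whitespace variants such as "SPSS" and "spss" fall into one cluster, while "SPSS Statistics" stays separate because it shares too few characters with the shorter name at this threshold. The all-pairs comparison is one way a lexical matcher can end up scaling superlinearly with corpus size, although the actual cause in FM may differ.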