Reliable biomedical and clinical retrieval requires more than strong ranking performance: it requires a practical way to find systematic model failures and to curate the training evidence needed to correct them. Late-interaction models such as ColBERT offer a first step, because they expose interpretable token-level interaction scores between query and document tokens. Yet this interpretability is shallow: it explains the score of a particular document--query pair, but does not reveal whether the model has learned a clinical concept in a stable, reusable, and context-sensitive way across diverse expressions. As a result, these scores provide limited support for diagnosing misunderstandings, identifying unreasonably distant biomedical concepts, or deciding what additional data or feedback is needed to address such failures. In this short position paper, we propose Diagnosable ColBERT, a framework that aligns ColBERT token embeddings to a reference latent space grounded in clinical knowledge and expert-provided conceptual similarity constraints. This alignment turns document encodings into inspectable evidence of what the model appears to understand, enabling more direct error diagnosis and more principled data curation without relying on large batteries of diagnostic queries.