While dense biomedical embeddings achieve strong performance, their black-box nature limits their utility in clinical decision-making. Recent question-based interpretable embeddings represent text as binary answers to natural-language questions, but these approaches often rely on heuristic or surface-level contrastive signals and overlook specialized domain knowledge. We propose QIME, an ontology-grounded framework for constructing interpretable medical text embeddings in which each dimension corresponds to a clinically meaningful yes/no question. By conditioning on cluster-specific medical concept signatures, QIME generates semantically atomic questions that capture fine-grained distinctions in biomedical text. Furthermore, QIME supports a training-free embedding construction strategy that eliminates per-question classifier training while further improving performance. Experiments across biomedical semantic similarity, clustering, and retrieval benchmarks show that QIME consistently outperforms prior interpretable embedding methods and substantially narrows the gap to strong black-box biomedical encoders, while providing concise and clinically informative explanations.
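To make the representation concrete, the following minimal sketch builds a question-based binary embedding in which each dimension is the answer to one yes/no question, then compares two texts and lists the questions that explain their similarity. It is an illustration under stated assumptions, not QIME's implementation: the question list, the keyword sets, and the helpers `answer`, `embed`, `cosine`, and `explain` are all hypothetical, and the keyword matcher is only a stand-in for whatever answering mechanism the framework actually uses.

```python
import math

# Hypothetical yes/no questions paired with illustrative keyword triggers.
# In the framework described above, questions would be generated from
# ontology-grounded concept signatures rather than written by hand.
QUESTIONS = [
    ("Does the text mention diabetes?", ("diabetes", "diabetic")),
    ("Does the text mention hypertension?", ("hypertension", "high blood pressure")),
    ("Does the text mention insulin therapy?", ("insulin",)),
    ("Does the text mention a renal condition?", ("kidney", "renal", "nephropathy")),
]


def answer(text, keywords):
    """Stand-in answerer: 1 if any keyword occurs in the text, else 0."""
    lowered = text.lower()
    return int(any(keyword in lowered for keyword in keywords))


def embed(text):
    """Binary embedding: one dimension per question, in QUESTIONS order."""
    return [answer(text, keywords) for _, keywords in QUESTIONS]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def explain(u, v):
    """Questions answered 'yes' for both texts, i.e. the shared evidence."""
    return [q for (q, _), a, b in zip(QUESTIONS, u, v) if a == 1 and b == 1]


if __name__ == "__main__":
    note_a = "Patient with diabetic nephropathy on insulin."
    note_b = "Type 2 diabetes and high blood pressure."
    vec_a, vec_b = embed(note_a), embed(note_b)
    print(vec_a, vec_b)
    print(round(cosine(vec_a, vec_b), 3))
    print(explain(vec_a, vec_b))
```

Because every dimension maps back to a readable question, the similarity score can be traced to the specific questions both texts answer "yes" to, which is the property that distinguishes this kind of embedding from a dense black-box vector.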