Recent advances in large language models (LLMs) have produced many specialized multimodal LLMs (MLLMs) built on a small set of shared foundational LLMs, forming distinct model lineages. It remains unclear whether a fundamental behavioral link exists between a foundational LLM and its downstream variants. We investigate this question by quantifying head-level context-truthfulness scores (Truth Scores). Across diverse LLM and MLLM lineages, including Vicuna-, Qwen2.5-, LLaMA2-, and Mistral-based models, we find that Truth Scores are strongly preserved within model families, even after instruction tuning or multimodal adaptation. We further show that this inheritance is consistent with the preservation of attention-head weights, and that context-truthful heads attend to query-relevant evidence. Building on these findings, we propose TruthProbe, a soft-gating strategy that amplifies context-truthful heads while preserving the contributions of the other heads. TruthProbe improves contextual truthfulness on HaluEval and reduces multimodal hallucination on POPE and CHAIR, and Truth Scores computed on a base LLM transfer effectively to its fine-tuned LLM and MLLM descendants. Code is available at https://github.com/miso-choi/TruthProbe.
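To make the idea of soft gating concrete, the following is a minimal sketch and not the released TruthProbe implementation. It assumes each attention head has a scalar Truth Score and scales that head's output by a gate in the range [1, 1 + alpha], so that high-scoring heads are amplified while low-scoring heads pass through unchanged rather than being masked out. The function name `soft_gate_heads`, the min-max normalization, and the parameter `alpha` are all hypothetical choices made for illustration; the actual gating rule may differ.

```python
from typing import List, Sequence


def soft_gate_heads(
    head_outputs: Sequence[Sequence[float]],
    truth_scores: Sequence[float],
    alpha: float = 0.5,
) -> List[List[float]]:
    """Scale each head's output by 1 + alpha * normalized score.

    This gating rule is an illustrative assumption, not the paper's formula.
    """
    if len(head_outputs) != len(truth_scores):
        raise ValueError("need exactly one score per head")

    lo, hi = min(truth_scores), max(truth_scores)
    span = hi - lo

    gated = []
    for output, score in zip(head_outputs, truth_scores):
        # Min-max normalize the score so the gate lies in [1, 1 + alpha].
        # If all scores are equal, every head keeps a gate of 1.
        norm = (score - lo) / span if span > 0 else 0.0
        gate = 1.0 + alpha * norm
        # The gate is never below 1, so no head is suppressed.
        gated.append([gate * x for x in output])
    return gated


if __name__ == "__main__":
    heads = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    scores = [0.1, 0.5, 0.9]  # hypothetical per-head Truth Scores
    for row in soft_gate_heads(heads, scores, alpha=0.5):
        print(row)
```

In this sketch the lowest-scoring head is left untouched and the highest-scoring head is scaled by 1 + alpha, which is what separates a soft gate from a hard mask that zeroes out the unselected heads. Because the gate depends only on per-head scores, scores computed once on a base LLM could in principle be reused for a descendant with the same head layout.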