Vision Language Models (VLMs) achieve strong performance on multimodal tasks but still suffer from hallucination and safety-related failures that persist even at scale. Steering offers a lightweight technique for improving model behavior without retraining. However, existing steering methods, whether input-dependent or input-independent, struggle to achieve a meaningful trade-off between efficiency and effectiveness. In this work, we observe that steering vectors can generalize across inputs when tasks share aligned semantic intent. Based on this insight, we propose \textbf{OSGA} (\textbf{O}ne-shot \textbf{S}teering with \textbf{G}enerative \textbf{A}nchor), an input-independent framework that improves model performance with a single optimization instance. OSGA first selects an informative sample via a variance-based data selection strategy, then learns a single steering vector using a contrastive objective with generative-anchor regularization. The resulting vector can be applied universally at a fixed layer during inference without modifying model parameters. Experiments across multiple benchmarks show that a single OSGA-optimized steering vector consistently improves hallucination mitigation and safety enhancement with negligible overhead, highlighting one-shot steering as a practical and scalable solution for reliable VLMs.