Inference-Time-Compute: More Faithful? A Research Note

Models trained specifically to generate long Chains of Thought (CoTs) have recently achieved impressive results. We refer to these models as Inference-Time-Compute (ITC) models. Are the CoTs of ITC models more faithful compared to traditional non-ITC models? We evaluate two ITC models (based on Qwen-2.5 and Gemini-2) on an existing test of faithful CoT To measure faithfulness, we test if models articulate cues in their prompt that influence their answers to MMLU questions. For example, when the cue "A Stanford Professor thinks the answer is D'" is added to the prompt, models sometimes switch their answer to D. In such cases, the Gemini ITC model articulates the cue 54% of the time, compared to 14% for the non-ITC Gemini. We evaluate 7 types of cue, such as misleading few-shot examples and anchoring on past responses. ITC models articulate cues that influence them much more reliably than all the 6 non-ITC models tested, such as Claude-3.5-Sonnet and GPT-4o, which often articulate close to 0% of the time. However, our study has important limitations. We evaluate only two ITC models -- we cannot evaluate OpenAI's SOTA o1 model. We also lack details about the training of these ITC models, making it hard to attribute our findings to specific processes. We think faithfulness of CoT is an important property for AI Safety. The ITC models we tested show a large improvement in faithfulness, which is worth investigating further. To speed up this investigation, we release these early results as a research note.

翻译：专门训练用于生成长链思维（CoT）的模型近期取得了令人瞩目的成果。我们将这类模型称为推理时计算（ITC）模型。与传统非ITC模型相比，ITC模型的CoT是否更具可信性？我们在现有的可信CoT测试集上评估了两个ITC模型（基于Qwen-2.5和Gemini-2）。为衡量可信度，我们测试模型是否能在其提示中阐明影响其回答MMLU问题的线索。例如，当提示中加入线索"斯坦福大学教授认为答案是D"时，模型有时会将其答案切换为D。在此类情况下，Gemini ITC模型阐明该线索的比例为54%，而非ITC Gemini模型仅为14%。我们评估了7类线索，例如误导性少样本示例和对过往回答的锚定效应。ITC模型阐明影响其决策线索的可靠性远高于所有6个测试的非ITC模型（如Claude-3.5-Sonnet和GPT-4o），后者阐明线索的比例常接近0%。然而，本研究存在重要局限：我们仅评估了两个ITC模型——无法评估OpenAI的SOTA o1模型；同时缺乏这些ITC模型训练细节，难以将发现归因于特定训练过程。我们认为CoT的可信性是AI安全的重要属性。测试的ITC模型在可信度方面展现出显著提升，值得深入研究。为加速相关研究，我们将这些初步成果以研究笔记形式发布。

相关内容

MoDELS

关注 46

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日