Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals, e.g., the distribution of a block of text conditioned on both past and future tokens. Recent work addresses this problem with novel architectures, but these often model such conditionals sub-optimally and degrade generation quality. We propose Arbitrary Conditionals GPT (AC-GPT), a simple modification to standard causal Transformers that enables evaluating and sampling from arbitrary conditionals, including past, future, and mixed contexts, within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and the next-token prediction objective, both of which are essential for strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals without degrading standard left-to-right performance.
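To see why arbitrary conditionals are intractable for a plain causal model, consider the factorization p(x) = prod_i p(x_i | x_<i). A conditional for a middle span x_B between a prefix x_A and a suffix x_C is p(x_B | x_A, x_C) = p(x_A, x_B, x_C) / sum over x_B' of p(x_A, x_B', x_C), and that normalizer ranges over |V|^k candidate spans of length k. The sketch below illustrates this with a hypothetical bigram table (`BIGRAM`) standing in for a causal Transformer's next-token distribution; `joint_logprob` and `infill_conditional` are illustrative names. It is a toy illustration of the problem only, not AC-GPT's method, whose details the abstract does not give.

```python
import itertools
import math

# Hypothetical toy "causal model": a bigram table p(next | prev) over a tiny
# vocabulary, standing in for a causal Transformer's next-token distribution.
VOCAB = ["a", "b", "c"]
BIGRAM = {
    "<s>": {"a": 0.5, "b": 0.3, "c": 0.2},
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.4, "b": 0.2, "c": 0.4},
    "c": {"a": 0.3, "b": 0.3, "c": 0.4},
}


def joint_logprob(tokens):
    """Autoregressive factorization: log p(x) = sum_i log p(x_i | x_<i)."""
    prev, total = "<s>", 0.0
    for tok in tokens:
        total += math.log(BIGRAM[prev][tok])
        prev = tok
    return total


def infill_conditional(prefix, suffix, span_len):
    """p(middle | prefix, suffix) by brute-force normalization.

    The normalizer sums over all len(VOCAB) ** span_len candidate middles,
    which is why a left-to-right model cannot evaluate such conditionals
    tractably for realistic vocabularies and span lengths.
    """
    scores = {}
    for middle in itertools.product(VOCAB, repeat=span_len):
        scores[middle] = math.exp(joint_logprob(prefix + list(middle) + suffix))
    z = sum(scores.values())
    return {m: s / z for m, s in scores.items()}


dist = infill_conditional(["a"], ["c"], span_len=2)
print(len(dist))  # 9 candidate middles, i.e. 3 ** 2
best = max(dist, key=dist.get)
```

Even in this toy, each candidate middle requires a full joint evaluation; with a vocabulary of tens of thousands of tokens, exhaustive normalization quickly becomes infeasible, which is the gap the abstract says AC-GPT closes within a single forward pass.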