We explore the capability of transformers to address endogeneity in in-context linear regression. Our main finding is that transformers inherently possess a mechanism to handle endogeneity effectively using instrumental variables (IV). First, we demonstrate that the transformer architecture can emulate a gradient-based bi-level optimization procedure that converges to the widely used two-stage least squares ($\textsf{2SLS}$) solution at an exponential rate. Next, we propose an in-context pretraining scheme and provide theoretical guarantees showing that the global minimizer of the pretraining loss achieves a small excess loss. Our extensive experiments validate these theoretical findings, showing that, in the presence of endogeneity, the trained transformer provides more robust and reliable in-context predictions and coefficient estimates than the $\textsf{2SLS}$ method.
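For reference, a minimal sketch of the standard $\textsf{2SLS}$ estimator the abstract refers to, under assumed notation not taken from this paper (covariates $X \in \mathbb{R}^{n \times d}$, instruments $Z \in \mathbb{R}^{n \times p}$, responses $Y \in \mathbb{R}^{n}$):
$$
\hat{\beta}_{\textsf{2SLS}} = \bigl(X^\top P_Z X\bigr)^{-1} X^\top P_Z Y,
\qquad
P_Z = Z \bigl(Z^\top Z\bigr)^{-1} Z^\top,
$$
i.e., regress $X$ on $Z$ in the first stage, then regress $Y$ on the fitted values $\hat{X} = P_Z X$ in the second stage.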