We develop a statistical procedure to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using a date-only recall query for each firm-date pair, we estimate the probability that the LLM has internalized information about the realized outcome, a statistic we term Lookahead Propensity (LAP). LAP is materially positive throughout the in-sample period and collapses to essentially zero immediately after the training-data cutoff. We show that a positive interaction between LAP and the LLM forecast in an accuracy regression indicates lookahead-bias contamination, and we apply the test to two forecasting tasks: predicting stock returns from news headlines and predicting capital expenditures from earnings call transcripts. In both applications, the LLM forecast's predictive power is amplified on high-LAP firm-date pairs, and the interaction loses significance in samples drawn after the training cutoff. Our test provides a cost-efficient diagnostic tool for assessing the validity and reliability of LLM-generated forecasts.
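The interaction test described above can be illustrated with a minimal sketch on synthetic data. The abstract does not give the exact regression specification, so the form used here (realized outcome regressed on a constant, the forecast, LAP, and their product, with homoskedastic standard errors) is an assumption, as are the function name `interaction_test`, the sample size, and all simulated coefficients. A positive and significant coefficient on the product term is what the test would read as a sign of contamination.

```python
import numpy as np


def interaction_test(y, forecast, lap):
    """OLS of y on [1, forecast, lap, forecast * lap].

    Hypothetical helper: returns (coefficients, t-statistics) using
    classical homoskedastic standard errors, a simplification.
    """
    X = np.column_stack([np.ones_like(forecast), forecast, lap, forecast * lap])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical OLS covariance
    se = np.sqrt(np.diag(cov))
    return beta, beta / se


# Synthetic firm-date pairs (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 5000
forecast = rng.normal(size=n)          # stand-in for the LLM forecast
lap = rng.uniform(0.0, 1.0, size=n)    # stand-in for Lookahead Propensity
noise = rng.normal(size=n)

# Contaminated case: the forecast is more predictive where LAP is high.
y_contaminated = 0.1 * forecast + 0.5 * forecast * lap + noise
# Clean case: predictive power does not depend on LAP.
y_clean = 0.1 * forecast + noise

beta_c, t_c = interaction_test(y_contaminated, forecast, lap)
beta_k, t_k = interaction_test(y_clean, forecast, lap)

print("contaminated: interaction coef, t-stat =", beta_c[3], t_c[3])
print("clean:        interaction coef, t-stat =", beta_k[3], t_k[3])
```

In an applied setting one would likely replace the classical standard errors with clustered ones, since firm-date panels tend to have correlated errors; that choice is left out here to keep the sketch short.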