We introduce a family of chronologically consistent, instruction-following large language models designed to eliminate lookahead bias. Each model is trained only on data available before a clearly defined knowledge-cutoff date, ensuring strict temporal separation from any post-cutoff data. The resulting framework offers (i) a simple, conversational chat interface, (ii) fully open, fixed model weights that guarantee replicability, and (iii) a conservative lower bound on forecast accuracy, isolating the share of predictability that survives once training leakage is removed. Together, these features provide researchers with an easy-to-use generative AI tool, free of lookahead bias, for a wide range of prediction tasks.
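The core data discipline described above can be illustrated with a minimal sketch. The cutoff date, document structure, and field names below are hypothetical, chosen only to show how a corpus would be filtered so that no post-cutoff document enters training:

```python
from datetime import date

# Hypothetical knowledge-cutoff date (not from the paper): only
# documents dated strictly before this date may enter the training set.
CUTOFF = date(2020, 1, 1)

# Toy corpus; each document carries its publication date.
corpus = [
    {"text": "article published before the cutoff", "date": date(2019, 6, 30)},
    {"text": "article published after the cutoff", "date": date(2021, 3, 15)},
]

# Enforce strict temporal separation: drop anything on or after the cutoff.
train_set = [doc for doc in corpus if doc["date"] < CUTOFF]

print(len(train_set))  # → 1 (only the pre-cutoff document survives)
```

A model trained on `train_set` alone cannot have seen post-cutoff outcomes, which is what makes its forecasts a conservative lower bound on true predictability.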