Machine learning models increasingly generate their own training data: online bandits, reinforcement learning, and post-training pipelines for language models are leading examples. In these adaptive settings, a single training observation both updates the learner and shifts the distribution of the data the learner subsequently collects. Standard attribution methods, designed for static datasets, ignore this feedback. We formalize occurrence-level attribution for finite-horizon adaptive learning via a conditional interventional target, prove that replay-side information cannot recover it in general, and identify a structural class in which the target is identified from logged data.