The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop in which models are trained on their own outputs, which can degrade performance and induce emerging biases. In real-world applications, previously deployed LLMs may also influence the data they subsequently receive, leading to a dynamic system driven by user feedback. For example, if a model consistently underserves users from a particular group, fewer queries will be collected from that demographic. In this study, we introduce the concept of the \textbf{S}elf-\textbf{C}onsuming \textbf{P}erformative \textbf{L}oop (\textbf{SCPL}) and investigate the role of synthetic data in shaping bias during these dynamic iterative training processes under controlled performative feedback. This controlled setting is motivated by the inaccessibility of real-world user preference data from dynamic production systems, and it enables us to isolate and analyze feedback-driven bias evolution in a principled manner. We focus on two types of loops: the typical retraining setting and the largely underexplored incremental fine-tuning setting. Through experiments on three real-world tasks, we find that the performative loop increases preference bias and decreases disparate bias. We further design a reward-based rejection sampling strategy to mitigate this bias, moving toward more trustworthy self-improving systems.
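To make the mitigation step concrete, the following is a minimal sketch of reward-based rejection sampling as it could be applied when curating synthetic data for the next training round; the abstract does not specify the exact procedure, so the callables \texttt{generate} and \texttt{reward\_model} and the parameters \texttt{num\_candidates} and \texttt{reward\_threshold} are hypothetical placeholders rather than the paper's actual implementation.
\begin{verbatim}
import numpy as np

def reward_guided_rejection_sampling(prompt, generate, reward_model,
                                     num_candidates=8, reward_threshold=0.0):
    """Hypothetical sketch: draw several candidate responses, score them
    with a reward model, and keep only the highest-scoring candidate that
    clears a threshold for inclusion in the next round of training data."""
    # Sample multiple candidate responses from the current model.
    candidates = [generate(prompt) for _ in range(num_candidates)]
    # Score each candidate with the reward model.
    rewards = np.array([reward_model(prompt, c) for c in candidates])
    best = int(np.argmax(rewards))
    # Reject the whole batch if even the best candidate scores too low,
    # so low-quality or biased generations do not re-enter the loop.
    if rewards[best] < reward_threshold:
        return None
    return candidates[best]
\end{verbatim}
In a self-consuming loop, filtering synthetic samples this way trades data quantity for quality: prompts whose candidates are all rejected contribute nothing to the next round, which is one plausible lever for limiting feedback-driven bias accumulation.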