Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.
翻译:模型崩溃是指生成模型在先验模型输出上训练时出现的性能退化现象,随着人工生成内容的激增,这一问题日益受到关注。对大型语言模型的相关批评已强调其倾向于重复训练数据中的高频模式、依赖海量数据集以及产生巨大环境成本。这些因素共同导致数据退化、文化偏见强化及资源利用效率低下。本立场论文旨在整合上述观点,论证模型崩溃正在威胁当前人工智能民主化的努力。通过降低训练效率并使数据分布偏离支撑集的尾部,模型崩溃对低资源及边缘化社区造成不成比例的影响。我们探讨了该现象的环境与文化影响,将我们的立场置于近期关于模型崩溃的立场论文框架内,并最终提出行动倡议。最后,我们概述了缓解这些影响的初步方向。