Supporting Anticipatory Governance using LLMs: Evaluating and Aligning Large Language Models with the News Media to Anticipate the Negative Impacts of AI

Taxonomy · MoDELS · Language Modeling · Range · Large Language Models

January 31, 2024

Mowafak Allaham, Nicholas Diakopoulos

From arXiv: 14 pages plus research ethics and social impact statement, references, and appendix. Under conference review.

Anticipating the negative impacts of emerging AI technologies is a challenge, especially in the early stages of development. An understudied approach to such anticipation is the use of LLMs to enhance and guide this process. Despite advancements in LLMs and in evaluation metrics that account for biases in generated text, it is unclear how well these models perform in anticipatory tasks. Specifically, the use of LLMs to anticipate AI impacts raises questions about the quality and range of categories of negative impacts these models are capable of generating. In this paper, we leverage news media, a diverse data source that is rich with normative assessments of emerging technologies, to formulate a taxonomy of impacts that serves as a baseline for comparison. By computationally analyzing thousands of news articles published by hundreds of online news domains around the world, we develop a taxonomy consisting of ten categories of AI impacts. We then evaluate both instruction-based (GPT-4 and Mistral-7B-Instruct) and fine-tuned completion models (Mistral-7B and GPT-3) using a sample from this baseline. We find that the impacts generated using Mistral-7B, fine-tuned on impacts from the news media, tend to be qualitatively on par with impacts generated using a larger-scale model such as GPT-4. Moreover, we find that these LLMs generate impacts that largely reflect the taxonomy of negative impacts identified in the news media; however, the impacts produced by instruction-based models had gaps in certain categories of impacts compared with fine-tuned models. This research highlights a potential bias in state-of-the-art LLMs when used for anticipating impacts and demonstrates the advantages of aligning smaller LLMs with a diverse range of impacts, such as those reflected in the news media, to better reflect such impacts during anticipatory exercises.
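To make the idea of "gaps in certain categories" concrete, the following is a minimal sketch of how category coverage of generated impacts might be compared against a baseline taxonomy. It is not the authors' evaluation pipeline: the function names `category_shares` and `coverage_gaps`, the category labels, and the sample data are all hypothetical and chosen only for illustration, and the sketch assumes each impact has already been assigned exactly one category label.

```python
from collections import Counter


def category_shares(labels):
    """Return the fraction of impacts that fall in each category.

    `labels` holds one category label per impact (an assumption of this sketch).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def coverage_gaps(baseline_labels, generated_labels):
    """List baseline categories that never appear among the generated impacts."""
    baseline = category_shares(baseline_labels)
    generated = category_shares(generated_labels)
    return sorted(cat for cat in baseline if cat not in generated)


# Hypothetical labels, invented for illustration; they are not the paper's taxonomy.
baseline = ["privacy", "labor", "misinformation", "safety", "privacy"]
instruct_model = ["privacy", "misinformation", "privacy", "privacy"]
finetuned_model = ["privacy", "labor", "safety", "misinformation"]

print(coverage_gaps(baseline, instruct_model))   # ['labor', 'safety']
print(coverage_gaps(baseline, finetuned_model))  # []
```

In this toy data, the hypothetical instruction-based output misses two baseline categories while the fine-tuned output covers all of them, which mirrors the kind of comparison the abstract describes without reproducing its actual results.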
