Recent breakthroughs in natural language processing (NLP) have enabled the open-ended synthesis and comprehension of coherent text, translating theoretical algorithms into practical applications. Large language models (LLMs) have significantly impacted businesses such as report summarization software and copywriting. Observations indicate, however, that LLMs may exhibit social prejudice and toxicity, posing ethical and societal dangers through the consequences of irresponsible use. Large-scale benchmarks for accountable LLMs should consequently be developed. Although several empirical investigations reveal the existence of a few ethical difficulties in advanced LLMs, there is little systematic examination and user study of the risks and harmful behaviors of current LLM usage. To further inform future efforts on constructing ethical LLMs responsibly, we apply a qualitative research method called ``red teaming'' to OpenAI's ChatGPT\footnote{In this paper, ChatGPT refers to the version released on Dec 15th.} to better understand the practical features of ethical dangers in recent LLMs. We analyze ChatGPT comprehensively from four perspectives: 1) \textit{Bias}, 2) \textit{Reliability}, 3) \textit{Robustness}, and 4) \textit{Toxicity}. In accordance with these perspectives, we empirically benchmark ChatGPT on multiple sample datasets. We find that a significant number of ethical risks cannot be addressed by existing benchmarks, and hence illustrate them via additional case studies. In addition, we examine the implications of our findings for AI ethics and the harmful behaviors of ChatGPT, as well as future problems and practical design considerations for responsible LLMs. We believe that our findings may shed light on future efforts to determine and mitigate the ethical hazards posed by machines in LLM applications.