Large Language Models (LLMs) such as ChatGPT have gained significant attention due to their impressive natural language processing capabilities. It is crucial to prioritize human-centered principles when utilizing these models, and safeguarding their ethical and moral compliance is of utmost importance. However, individual ethical issues have not been well studied on the latest LLMs. This study aims to address this gap by introducing a new benchmark, TrustGPT. TrustGPT provides a comprehensive evaluation of LLMs in three crucial areas: toxicity, bias, and value alignment. First, TrustGPT examines toxicity in language models by employing toxic prompt templates derived from social norms. It then quantifies the extent of bias in models by measuring toxicity values across different groups. Lastly, TrustGPT assesses the value alignment of conversation generation models through both active and passive value-alignment tasks. Through TrustGPT, this research aims to enhance our understanding of the performance of conversation generation models and promote the development of language models that are more ethical and socially responsible.