NewsBench: Systematic Evaluation of LLMs for Writing Proficiency and Safety Adherence in Chinese Journalistic Editorial Applications

This study presents NewsBench, a novel benchmark framework developed to evaluate the capability of Large Language Models (LLMs) in Chinese Journalistic Writing Proficiency (JWP) and their Safety Adherence (SA), addressing the gap between journalistic ethics and the risks associated with AI utilization. Comprising 1,267 tasks across 5 editorial applications, 7 aspects (including safety and journalistic writing with 4 detailed facets), and spanning 24 news topics domains, NewsBench employs two GPT-4 based automatic evaluation protocols validated by human assessment. Our comprehensive analysis of 10 LLMs highlighted GPT-4 and ERNIE Bot as top performers, yet revealed a relative deficiency in journalistic ethic adherence during creative writing tasks. These findings underscore the need for enhanced ethical guidance in AI-generated journalistic content, marking a step forward in aligning AI capabilities with journalistic standards and safety considerations.

翻译：本研究提出新闻基准（NewsBench），这是一个新型基准框架，用于评估大语言模型（LLMs）在中文新闻写作能力（JWP）及其安全遵循性（SA）方面的表现，旨在填补新闻伦理与人工智能应用风险之间的鸿沟。该框架涵盖5类编辑应用、7个评估维度（包括安全性与新闻写作下4个细项），覆盖24个新闻主题领域，采用基于GPT-4的两种自动评估协议，并经人工评估验证。通过对10个大语言模型的全面分析，GPT-4与文心一言表现最佳，但在创造性写作任务中相对缺乏对新闻伦理的遵循。这些发现凸显了需要加强对AI生成新闻内容的伦理指导，标志着在使AI能力与新闻标准及安全考量对齐方面迈出了重要一步。

相关内容

GPT-4

关注 29

北京时间2023年3月15日凌晨，ChatGPT开发商OpenAI 发布了发布了全新的多模态预训练大模型 GPT-4，可以更可靠、更具创造力、能处理更细节的指令，根据图片和文字提示都能生成相应内容。具体来说来说，GPT-4 相比上一代的模型，实现了飞跃式提升：支持图像和文本输入，拥有强大的识图能力；大幅提升了文字输入限制，在ChatGPT模式下，GPT-4可以处理超过2.5万字的文本，可以处理一些更加细节的指令；回答准确性也得到了显著提高。

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

67页PPT【ML+气象】使用机器学习技术对季节和次季节研究和预测，Use of Machine Learning Techniques for Seasonal and Subseasonal Studies and Predictions

专知会员服务

19+阅读 · 2022年3月4日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日