News organizations today rely on AI tools to increase efficiency and productivity across a range of tasks in news production and distribution. These tools serve stakeholders such as reporters, editors, and readers. However, practitioners also express reservations about adopting AI technologies in the newsroom, owing to the technical and ethical challenges involved in evaluating AI technology and its return on investment. This is due in part to the lack of domain-specific strategies for evaluating AI models and applications. In this paper, we consider different aspects of AI evaluation (model outputs, interaction, and ethics) that can benefit from domain-specific tailoring, and suggest examples of how journalistic considerations can lead to specialized metrics or strategies. In doing so, we lay out a potential framework to guide AI evaluation in journalism, similar to those seen in other disciplines (e.g., law, healthcare). We also consider directions for future work, as well as how our approach might generalize to other domains.