Amazon published its Frontier Model Safety Framework (FMSF) as part of the Paris AI summit, and we subsequently presented a report on Amazon's Premier model. In this report, we evaluate Nova 2.0 Lite, which was made generally available as part of the Nova 2.0 series and is one of the series' most capable reasoning models. The model processes text, images, and video with a context length of up to 1M tokens, enabling analysis of large codebases, documents, and videos in a single prompt. We present a comprehensive evaluation of Nova 2.0 Lite's critical risk profile under the FMSF. The evaluations target three high-risk domains: Chemical, Biological, Radiological and Nuclear (CBRN); Offensive Cyber Operations; and Automated AI R&D. They combine automated benchmarks, expert red-teaming, and uplift studies to determine whether the model exceeds release thresholds. We summarize our methodology and report core findings. We will continue to enhance our safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.