As large language models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative. Despite progress in AI safety, current evaluations focus predominantly on adults and neglect the distinct vulnerabilities of minors who engage with generative AI. We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (ages 7-12) and adolescents (ages 13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs, including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral, we uncover critical safety deficiencies in child-facing scenarios. This work highlights the need for community-driven benchmarks to protect young users in LLM interactions. To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git
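The two human-annotated labels described above (jailbreak success and the 0-5 ethical refusal score) lend themselves to a simple per-model, per-age-group summary. The following is a minimal sketch of one way such annotations might be aggregated; it is not the released evaluation codebase. The `Annotation` record, its field names, and the `summarize` function are hypothetical, and the sketch makes no assumption about which end of the 0-5 scale denotes the safer response.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Annotation:
    """One human-annotated model response (hypothetical schema)."""
    model: str          # name of the evaluated LLM
    age_group: str      # "child" (7-12) or "adolescent" (13-17)
    jailbreak: bool     # human label: did the adversarial prompt succeed?
    refusal_score: int  # human label on the 0-5 ethical refusal scale


def summarize(annotations):
    """Group annotations by (model, age_group) and compute summary statistics."""
    groups = defaultdict(list)
    for a in annotations:
        # Reject scores outside the 0-5 range stated in the abstract.
        if not 0 <= a.refusal_score <= 5:
            raise ValueError(f"refusal_score out of range: {a.refusal_score}")
        groups[(a.model, a.age_group)].append(a)

    return {
        key: {
            "n": len(rows),
            "jailbreak_rate": sum(r.jailbreak for r in rows) / len(rows),
            "mean_refusal_score": mean(r.refusal_score for r in rows),
        }
        for key, rows in groups.items()
    }
```

Calling `summarize` on a list of `Annotation` records returns one dictionary entry per (model, age group) pair, which makes it straightforward to compare a model's behavior on the child prompts against its behavior on the adolescent prompts.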