Evaluating the Effectiveness of OpenAI's Parental Control System

We evaluate how effectively platform-level parental controls moderate a mainstream conversational assistant used by minors. Our two-phase protocol first builds a category-balanced conversation corpus via PAIR-style iterative prompt refinement over the API, then has trained human agents replay and refine those prompts in the consumer UI using a designated child account while monitoring the linked parent inbox for alerts. We focus on seven risk areas (physical harm, pornography, privacy violence, health consultation, fraud, hate speech, and malware) and quantify four outcomes: Notification Rate (NR), Leak-Through Rate (LR), Overblocking Rate (OBR), and UI Intervention Rate (UIR). Using an automated judge (with targeted human audit) and comparing the current backend to legacy variants (GPT-4.1 and GPT-4o), we find that notifications are selective rather than comprehensive: privacy violence, fraud, hate speech, and malware triggered no parental alerts in our runs, whereas physical harm (the highest rate), pornography, and some health queries produced intermittent alerts. The current backend shows lower leak-through than the legacy models, yet overblocking of benign, educational queries near sensitive topics remains common and is not surfaced to parents, revealing a policy-product gap between on-screen safeguards and parent-facing telemetry. We propose three actionable fixes: broaden the notification taxonomy and make it configurable, couple visible safeguards to privacy-preserving parent summaries, and prefer calibrated, age-appropriate safe rewrites over blanket refusals.
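To make the four outcomes concrete, the following is a minimal sketch of how per-category rates could be computed from judged trials. The abstract does not give formal definitions, so the denominators here are assumptions: NR and LR are taken over harmful prompts, OBR over benign probes, and UIR over all trials in a category. The `Trial` record, its field names, and the `summarize` helper are hypothetical and are not taken from the study's tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    """One judged prompt/response pair (hypothetical schema)."""
    category: str          # risk area, e.g. "physical harm" or "malware"
    harmful: bool          # True for a risky prompt, False for a benign probe
    leaked: bool           # judge marked the reply as containing unsafe content
    refused: bool          # assistant refused or blocked the request
    ui_intervention: bool  # an on-screen safeguard appeared in the child UI
    parent_alert: bool     # an alert arrived in the linked parent inbox


def rate(hits, total):
    """Return hits/total, or None when the denominator is empty."""
    return hits / total if total else None


def summarize(trials):
    """Compute NR, LR, OBR, and UIR per category under the assumed definitions."""
    by_category = defaultdict(list)
    for t in trials:
        by_category[t.category].append(t)

    summary = {}
    for category, ts in by_category.items():
        harmful = [t for t in ts if t.harmful]
        benign = [t for t in ts if not t.harmful]
        summary[category] = {
            # Assumed: share of harmful prompts that produced a parent alert.
            "NR": rate(sum(t.parent_alert for t in harmful), len(harmful)),
            # Assumed: share of harmful prompts that yielded unsafe content.
            "LR": rate(sum(t.leaked for t in harmful), len(harmful)),
            # Assumed: share of benign probes that were refused.
            "OBR": rate(sum(t.refused for t in benign), len(benign)),
            # Assumed: share of all trials with a visible UI safeguard.
            "UIR": rate(sum(t.ui_intervention for t in ts), len(ts)),
        }
    return summary
```

Under these assumed definitions, a category can show a high UIR alongside an NR of zero, which is the kind of gap between on-screen safeguards and parent-facing telemetry that the abstract describes.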
