Automatically detecting abnormal events in videos is crucial for modern autonomous systems, yet existing Video Anomaly Detection (VAD) benchmarks lack the scene diversity, balanced anomaly coverage, and temporal complexity needed to reliably assess real-world performance. Meanwhile, the community is increasingly moving toward Video Anomaly Understanding (VAU), which requires deeper semantic and causal reasoning but remains difficult to benchmark due to the heavy manual annotation effort it demands. In this paper, we introduce Pistachio, a new VAD/VAU benchmark constructed entirely through a controlled, generation-based pipeline. By leveraging recent advances in video generation models, Pistachio provides precise control over scenes, anomaly types, and temporal narratives, effectively eliminating the biases and limitations of Internet-collected datasets. Our pipeline integrates scene-conditioned anomaly assignment, multi-step storyline generation, and a temporally consistent long-form synthesis strategy that produces coherent 41-second videos with minimal human intervention. Extensive experiments demonstrate the scale, diversity, and complexity of Pistachio, revealing new challenges for existing methods and motivating future research on dynamic and multi-event anomaly understanding.