Depictions of Depression in Generative AI Video Models: A Preliminary Study of OpenAI's Sora 2

Generative video models are increasingly capable of producing complex depictions of mental health experiences, yet little is known about how these systems represent conditions like depression. This study characterizes how OpenAI's Sora 2 generative video model depicts depression and examines whether depictions differ between the consumer App and developer API access points. We generated 100 videos using the single-word prompt "Depression" across two access points: the consumer App (n=50) and developer API (n=50). Two trained coders independently coded narrative structure, visual environments, objects, figure demographics, and figure states. Computational features spanning visual aesthetics, audio, semantic content, and temporal dynamics were extracted and compared between access points. App-generated videos exhibited a pronounced recovery bias: 78% (39/50) featured narrative arcs progressing from depressive states toward resolution, compared with 14% (7/50) of API outputs. App videos brightened over time (slope = 2.90 brightness units/second vs. -0.18 for API; d = 1.59, q < .001) and contained three times as much motion as API videos (d = 2.07, q < .001). Across both access points, videos converged on a narrow visual vocabulary and featured recurring objects including hoodies (n=194), windows (n=148), and rain (n=83). Figures were predominantly young adults (88% aged 20-30) and nearly always alone (98%). Gender varied by access point: App outputs skewed male (68%), whereas API outputs skewed female (59%). Sora 2 does not invent new visual grammars for depression but compresses and recombines cultural iconographies, while platform-level constraints substantially shape which narratives reach users. Clinicians should be aware that AI-generated mental health video content reflects training data and platform design rather than clinical knowledge, and that patients may encounter such content during vulnerable periods.
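
As a rough illustration of the two statistics reported above, the following sketch shows one way a per-video brightness slope (brightness units per second) and a between-group Cohen's d could be computed. It is not the study's pipeline: the function names, the use of an ordinary least-squares slope, the pooled-standard-deviation form of d, and the toy numbers are all assumptions made for illustration, and the q-value correction is not covered here.

```python
from math import sqrt
from statistics import mean, variance


def brightness_slope(times, brightness):
    """Ordinary least-squares slope of brightness against time.

    times: frame timestamps in seconds (hypothetical input format).
    brightness: mean frame brightness at each timestamp.
    """
    t_bar = mean(times)
    b_bar = mean(brightness)
    num = sum((t - t_bar) * (b - b_bar) for t, b in zip(times, brightness))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (an assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    slope = brightness_slope([0, 1, 2, 3], [10, 12, 14, 16])
    effect = cohens_d([2, 4, 6], [1, 3, 5])
    print(f"slope = {slope:.2f} units/second, d = {effect:.2f}")
```

In such a setup, one slope would be computed per video and the two sets of slopes (App and API) would then be passed to `cohens_d`; a positive slope indicates a video that brightens over time.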
