Human Factors in Detecting AI-Generated Portraits: Age, Sex, Device, and Confidence

Generative AI now produces photorealistic portraits that circulate widely in social and news-like contexts. Human ability to distinguish real from synthetic faces is time-sensitive because image generators continue to improve while public familiarity with synthetic media also changes. Here, we provide a time-stamped snapshot of human ability to distinguish real portraits from AI-generated portraits produced by models available in July 2025. In a large-scale web experiment conducted from August 2025 to January 2026, 1,664 participants aged 20-69 years (mobile n = 1,330; PC n = 334) completed a two-alternative forced-choice task (REAL vs AI). Each participant judged 20 trials sampled from a 210-image pool comprising real FFHQ photographs and AI-generated portraits from ChatGPT-4o and Imagen 3. Overall accuracy was high (mean 85.2%, median 90%) but varied across groups. PC participants outperformed mobile participants by 3.65 percentage points. Accuracy declined with age in both device cohorts, and more steeply on mobile than on PC (-0.607 vs -0.230 percentage points per year). Self-rated AI-detection confidence and AI exposure were positively associated with accuracy and statistically accounted for part of the age-related decline, with confidence accounting for the larger share. In the mobile cohort, an age-related sex divergence emerged among participants in their 50s and 60s, with female participants performing worse. Trial-level reaction-time models showed that correct AI judgments were faster than correct real judgments, whereas incorrect AI judgments were slower than incorrect real judgments. ChatGPT-4o portraits were harder and slower to classify than Imagen 3 portraits and were associated with a steeper age-related decline in performance. These findings frame AI portrait detection as a human-factors problem shaped by age, sex, device context, and confidence, not by image realism alone.
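
To put the reported age slopes on a more intuitive scale, the minimal sketch below converts them to per-decade changes and to an equivalent number of trials out of the 20 each participant judged. It assumes the slopes can be read as simple linear rates, which the reported values do not by themselves confirm. The variable and function names are illustrative and are not taken from the study.

```python
# Values reported in the abstract.
SLOPE_MOBILE = -0.607        # percentage points of accuracy per year of age
SLOPE_PC = -0.230            # percentage points of accuracy per year of age
TRIALS_PER_PARTICIPANT = 20  # trials judged by each participant


def per_decade(slope_pp_per_year: float) -> float:
    """Accuracy change over 10 years, ASSUMING the slope is linear in age."""
    return slope_pp_per_year * 10


# With 20 trials, one trial is worth 100 / 20 = 5 percentage points of accuracy.
pp_per_trial = 100 / TRIALS_PER_PARTICIPANT

mobile_decade_pp = per_decade(SLOPE_MOBILE)
pc_decade_pp = per_decade(SLOPE_PC)

# Express the per-decade change as a number of trials out of 20.
mobile_decade_trials = mobile_decade_pp / pp_per_trial
pc_decade_trials = pc_decade_pp / pp_per_trial

print(f"Mobile: {mobile_decade_pp:.2f} pp per decade "
      f"({mobile_decade_trials:.2f} trials of {TRIALS_PER_PARTICIPANT})")
print(f"PC:     {pc_decade_pp:.2f} pp per decade "
      f"({pc_decade_trials:.2f} trials of {TRIALS_PER_PARTICIPANT})")
```

Read this way, the mobile slope corresponds to a little more than one additional misclassified trial out of 20 per decade of age, and the PC slope to roughly half a trial. The 5-point granularity is also consistent with the reported median of 90%, which corresponds to 18 of 20 trials correct.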
