Automated emotion recognition in real-world workplace settings remains a challenging problem in affective computing, largely because large-scale, longitudinal datasets collected in naturalistic environments are scarce. We present a dataset of 733,651 facial expression records from 38 employees, collected over 30.5 months (November 2021 to May 2024) in an authentic office environment. Each record contains probabilities for seven emotions (neutral, happy, sad, surprised, fear, disgusted, angry) derived from deep learning-based facial expression recognition, together with metadata on job roles, employment outcomes, and personality traits. The collection window overlaps the COVID-19 pandemic, so the dataset captures emotional responses to major societal events, including the Shanghai lockdown and subsequent policy changes. We provide 32 extended emotional metrics computed using established affective science methods, including valence, arousal, volatility, predictability, inertia, and emotional contagion strength. Technical validation supports data quality through replication of known psychological patterns (weekend effect: +192% valence improvement, p < 0.001; diurnal rhythm validated) and perfect predictive validity for employee turnover (AUC = 1.0). Baseline experiments using Random Forest and LSTM models achieve 91.2% accuracy for emotion classification and R² = 0.84 for valence prediction. To our knowledge, this is the largest and longest longitudinal workplace emotion dataset publicly available, enabling research in emotion recognition, affective dynamics modeling, emotional contagion, turnover prediction, and emotion-aware system design.
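As a rough illustration of how metrics such as valence, volatility, and inertia might be derived from the seven per-record emotion probabilities, consider the minimal sketch below. It does not reproduce the dataset's actual metric definitions: the valence weights in `VALENCE_WEIGHTS` are a hypothetical choice, and the function names are introduced here for illustration only. Volatility is taken as the sample standard deviation of a valence series, and inertia as its lag-1 autocorrelation, which is a common operationalization in affective science.

```python
import numpy as np

# Emotion order as listed in the abstract.
EMOTIONS = ["neutral", "happy", "sad", "surprised", "fear", "disgusted", "angry"]

# ASSUMPTION: illustrative valence weights, not the dataset's definition.
# Positive for happy, negative for sad/fear/disgusted/angry, zero otherwise.
VALENCE_WEIGHTS = np.array([0.0, 1.0, -1.0, 0.0, -1.0, -1.0, -1.0])


def valence(probs):
    """Weighted sum of emotion probabilities; accepts one record or a 2-D batch."""
    probs = np.asarray(probs, dtype=float)
    return probs @ VALENCE_WEIGHTS


def volatility(series):
    """Sample standard deviation of a valence time series."""
    return float(np.std(np.asarray(series, dtype=float), ddof=1))


def inertia(series):
    """Lag-1 autocorrelation of a valence time series (0.0 if the series is constant)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = float(np.sum(x * x))
    if denom == 0.0:
        return 0.0
    return float(np.sum(x[:-1] * x[1:]) / denom)


if __name__ == "__main__":
    # Three made-up records for one person (each row sums to 1).
    records = [
        [0.70, 0.20, 0.05, 0.00, 0.00, 0.00, 0.05],
        [0.50, 0.40, 0.05, 0.05, 0.00, 0.00, 0.00],
        [0.60, 0.10, 0.20, 0.00, 0.05, 0.00, 0.05],
    ]
    v = valence(records)
    print("valence:", v)
    print("volatility:", volatility(v))
    print("inertia:", inertia(v))
```

In practice these quantities would presumably be computed per employee over a fixed time window (for example, per day or per week) before being compared across people or periods; the windowing choice is left open here.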