Privacy-Preserving Empathy Detection in Video Interactions

Detecting empathy from video interactions has a growing range of applications, yet raw videos suitable for training AI models are rarely available because of privacy and ethical constraints. Consequently, public benchmarks are released only as pre-extracted features, creating a privacy-constrained learning regime whose privacy-utility trade-off is poorly characterised. We formalise three levels of privacy for video-based behavioural prediction, namely no privacy (raw video), partial privacy (temporal visual features such as facial landmarks, action units and eye gaze) and strong privacy (summary statistics of those features), and ask whether strong, subject-generalisable empathy detection is achievable at the strong-privacy level. We propose TFMPathy, instantiated with two recent tabular foundation models (TFMs), TabPFN v2 and TabICL, under both in-context learning and fine-tuning paradigms. On a public human-robot interaction benchmark, TFMPathy achieves strong utility under strong privacy, outperforming established baselines by a substantial margin. To assess robustness and facilitate fair, safe deployment, we introduce a cross-subject evaluation protocol that this benchmark previously lacked. Under this protocol, TFM fine-tuning substantially improves generalisation (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). Aggregating temporal features into summary statistics also suppresses subject-specific and demographic cues, aligning TFMPathy with data-minimisation principles. TFMPathy therefore offers a practical route to building AI systems that depend on human-centred video when governance, consent or institutional policies restrict the sharing of raw video. Code will be released upon acceptance at https://github.com/hasan-rakibul/TFMPathy.
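Two mechanics in the abstract lend themselves to a short illustration: moving from partial to strong privacy by collapsing a per-frame feature matrix (frames x features, e.g. action-unit intensities) into fixed-length summary statistics, and a cross-subject protocol in which no subject contributes videos to both the training and the test side of a fold. The sketch below is a minimal, hypothetical illustration and not the TFMPathy code: the function names `summarise_video` and `cross_subject_folds`, the choice of statistics (mean, population standard deviation, minimum, maximum) and the round-robin assignment of subjects to folds are all assumptions; the benchmark's actual feature set and fold scheme may differ.

```python
import random
import statistics
from typing import Iterator, List, Sequence, Tuple


def summarise_video(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (T, D) per-frame feature matrix into one 4*D-length vector.

    Partial privacy would keep the whole time series; this strong-privacy view
    keeps only per-feature statistics, discarding frame-level dynamics.
    The statistics chosen here are an illustrative assumption.
    """
    if not frames:
        raise ValueError("expected at least one frame")
    columns = list(zip(*frames))  # transpose: T rows of D values -> D columns
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]  # population std (ddof=0)
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    return means + stds + mins + maxs


def cross_subject_folds(
    subject_ids: Sequence[str], n_folds: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where no subject is on both sides.

    Subjects (not videos) are shuffled and assigned round-robin to folds, so
    every video appears in exactly one test fold.
    """
    subjects = sorted(set(subject_ids))
    if not 2 <= n_folds <= len(subjects):
        raise ValueError("n_folds must be between 2 and the number of subjects")
    random.Random(seed).shuffle(subjects)
    for k in range(n_folds):
        held_out = set(subjects[k::n_folds])  # subjects reserved for testing
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        yield train_idx, test_idx


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: 6 videos from 3 subjects, 100 frames x 5 features each.
    videos = [[[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)] for _ in range(6)]
    subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
    X = [summarise_video(v) for v in videos]  # 6 rows of 20 values each
    for train_idx, test_idx in cross_subject_folds(subjects, n_folds=3):
        print(len(train_idx), len(test_idx))  # prints "4 2" for every fold
```

Each row of `X` is a small tabular record, which is the input format a TFM consumes; for example, the TabPFN Python package exposes a scikit-learn-style `TabPFNClassifier` with `fit` and `predict_proba`. Splitting by subject rather than by video is what makes the reported scores a measure of generalisation to unseen people rather than to unseen clips of familiar people.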
