Socially intelligent AI systems must reason across diverse human behavioral tasks and generalize to new social contexts. However, behavioral data is inherently heterogeneous: it spans diverse modalities and prediction targets, which produce uneven training signals across samples and create imbalanced learning dynamics that challenge existing AI models. To address this, we develop OmniSapiens-7B 2.0, a foundation model for social behavior processing designed explicitly for learning from heterogeneous behavioral data. This is enabled by Heterogeneity-Aware Relative Policy Optimization, a new RL method that rebalances learning signals across samples: it approximates each sample's contribution to the policy update and uses these estimates to drive geometrically centered, inertially smoothed advantage modulation for stable training. OmniSapiens-7B 2.0 achieves the best and most consistent performance across 10 behavioral tasks and the best performance on all five held-out benchmarks, with gains of up to +12.02% and +9.37%, respectively. It also produces more consistent and interpretable reasoning traces, supporting reliable real-world behavioral applications. Our model is available at https://github.com/MIT-MI/human_behavior_atlas.
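The rebalancing idea can be illustrated with a minimal sketch. The abstract does not say how contributions are estimated or how the modulation is parameterized, so every choice below is an assumption. The sketch uses GRPO-style group-standardized advantages and takes the mean absolute advantage of a prompt's rollouts as a hypothetical proxy for its contribution to the policy update. It reads "geometrically centered" as normalizing the batch weights to a geometric mean of 1, and "inertially smoothed" as an exponential moving average of log-scales keyed by task. The class `HeterogeneityModulator`, the `momentum` parameter, and the task keys are hypothetical. This is not the paper's Heterogeneity-Aware Relative Policy Optimization implementation.

```python
import math
from collections import defaultdict


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: standardize rewards within one prompt's rollouts."""
    mu = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (std + eps) for r in rewards]


class HeterogeneityModulator:
    """Hypothetical sketch of centered, smoothed advantage rescaling (not the paper's code)."""

    def __init__(self, momentum=0.9, eps=1e-6):
        self.momentum = momentum  # inertia: weight kept on the previous smoothed log-scale
        self.eps = eps
        self.log_scale = {}       # smoothed log-scale per key (keyed by task: an assumption)

    def modulate(self, batch):
        """batch: list of (key, rewards) pairs, one per prompt. Returns rescaled advantages."""
        advs = [group_advantages(rewards, self.eps) for _, rewards in batch]

        # 1) Contribution proxy (assumption): mean |advantage|, since a policy-gradient
        #    term scales linearly with its advantage. Aggregate per key within this batch.
        per_key = defaultdict(list)
        for (key, _), adv in zip(batch, advs):
            per_key[key].append(sum(abs(a) for a in adv) / len(adv))

        # 2) Inertial smoothing: EMA of the raw log-scale -log(contribution) per key.
        active = []
        for key, contribs in per_key.items():
            c = sum(contribs) / len(contribs)
            if c < self.eps:  # no learning signal this step: leave the key's state untouched
                continue
            raw = -math.log(c)  # weaker contributors receive larger scales
            prev = self.log_scale.get(key, raw)
            self.log_scale[key] = self.momentum * prev + (1 - self.momentum) * raw
            active.append(key)

        if not active:
            return advs

        # 3) Geometric centering: subtract the mean log-scale over active keys,
        #    so the active weights have a geometric mean of exactly 1.
        center = sum(self.log_scale[k] for k in active) / len(active)
        weight = {k: math.exp(self.log_scale[k] - center) for k in active}

        return [[a * weight.get(key, 1.0) for a in adv] for (key, _), adv in zip(batch, advs)]


if __name__ == "__main__":
    mod = HeterogeneityModulator(momentum=0.9)
    # Step 1: the "emotion" prompt yields a weaker signal, so its advantages are scaled up.
    print(mod.modulate([("emotion", [1.0, 0.0, 0.0, 0.0]), ("sentiment", [1.0, 1.0, 0.0, 0.0])]))
    # Step 2: the roles swap, but inertia keeps "emotion" upweighted for now.
    print(mod.modulate([("emotion", [1.0, 1.0, 0.0, 0.0]), ("sentiment", [1.0, 0.0, 0.0, 0.0])]))
```

Centering in log space keeps the product of the active weights fixed at 1. Rebalancing therefore shifts signal between tasks without inflating the overall update size. The moving average stops a single noisy batch from flipping which tasks are upweighted.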