Agent-Based Simulation of Trust Development in Human-Robot Teams: An Empirically-Validated Framework

This paper presents an empirically grounded agent-based model capturing trust dynamics, workload distribution, and collaborative performance in human-robot teams. The model, implemented in NetLogo 6.4.0, simulates teams of 2--10 agents performing tasks of varying complexity. We validate against Hancock et al.'s (2021) meta-analysis, achieving interval validity for 4 of 8 trust antecedent categories and strong ordinal validity (Spearman $\rho = 0.833$). Sensitivity analysis using one-factor-at-a-time (OFAT) and full factorial designs ($n = 50$ replications per condition) reveals that robot reliability exhibits the strongest effect on trust ($\eta^2 = 0.35$) and dominates task success ($\eta^2 = 0.93$) and productivity ($\eta^2 = 0.89$), consistent with meta-analytic findings. Trust asymmetry ratios ranged from 0.07 to 0.55 -- below the meta-analytic benchmark of 1.50 -- revealing that per-event asymmetry does not guarantee cumulative asymmetry when trust repair mechanisms remain active. Scenario analysis uncovered trust-performance decoupling: the Trust Recovery scenario achieved the highest productivity (4.29) despite the lowest trust (38.2), while the Unreliable Robot scenario produced the highest trust (73.2) despite the lowest task success (33.4\%), establishing calibration error as a critical diagnostic distinct from trust magnitude. Factorial ANOVA confirmed significant main effects for reliability, transparency, communication, and collaboration ($p < .001$), explaining 45.4\% of trust variance. The open-source implementation provides an evidence-based tool for identifying overtrust and undertrust conditions prior to deployment.
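For readers unfamiliar with the effect-size measure reported above, the following is a minimal sketch of how $\eta^2$ can be computed for a single factor as the ratio of between-group to total sum of squares. It is illustrative only: the function name `eta_squared` and the trust scores are hypothetical, not taken from the model or its output, and the abstract does not state whether classical or partial $\eta^2$ was used, so the classical one-way form is assumed here.

```python
from statistics import mean


def eta_squared(groups):
    """Classical eta-squared for a one-way design: SS_between / SS_total.

    `groups` is a list of lists, one inner list of observations per
    factor level (e.g., one list of trust scores per reliability level).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Total variability of every observation around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    if ss_total == 0:
        raise ValueError("eta-squared is undefined when all values are equal")

    # Variability of the group means around the grand mean,
    # weighted by the number of observations in each group.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical trust scores (0-100 scale) at three robot-reliability levels.
# These numbers are made up for illustration and are NOT simulation results.
trust_by_reliability = [
    [40.0, 42.0, 44.0],  # low reliability
    [50.0, 52.0, 54.0],  # medium reliability
    [60.0, 62.0, 64.0],  # high reliability
]

eta2 = eta_squared(trust_by_reliability)
print(f"eta^2 = {eta2:.3f}")
```

In a factorial design the same ratio would be taken per factor, using that factor's sum of squares from the ANOVA table in the numerator.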
