X4Val: Learning Neural Surrogates for Variance-Reduced Policy Evaluation

Rigorous evaluation of learning-based robotic systems is an essential prerequisite for deployment. However, real-world test data are expensive to gather, and in a typical iterative development cycle the data available from the latest policy are necessarily limited in scale. This motivates evaluation methods that draw on heterogeneous data sources, including simulation, historical policy logs, and data collected on related platforms or in related environments. Although such auxiliary data are abundant and inexpensive, they are generally not directly representative of real-world outcomes (for example, performance in simulation may differ substantially from performance in the real world), which makes it challenging to use them in a principled way for high-confidence performance estimation. In this paper, we introduce X4Val, a general framework for variance-reduced estimation of real-world metrics from non-paired, multi-domain data. X4Val embeds samples from the real and auxiliary domains into a shared representation space and learns a transferable predictor of real-world metrics; this learned predictor is then incorporated into a control-variates estimator, enabling variance reduction even when paired samples are unavailable. We provide a theoretical analysis and empirical evaluations on autonomous driving and real-world robot manipulation tasks, where X4Val achieves up to 38.4% variance reduction and consistently improves over strong baselines. These results show that non-paired, heterogeneous data can be leveraged to substantially improve the sample efficiency of rigorous robotic system validation.