Machine learning systems appear stochastic but are deterministic given their seeds: a seeded pseudorandom number generator produces identical realisations across repeated executions. Standard evaluation practice typically treats runs of competing alternatives as independent and does not exploit shared sources of randomness. This paper analyses the statistical structure of comparative evaluation under shared random seeds. Under this design, competing systems are evaluated using identical seeds, which induces matched stochastic realisations and yields a strict reduction in the variance of estimated differences whenever outcomes are positively correlated at the seed level. We demonstrate these effects using an extended learning-based multi-agent economic simulator, where paired evaluation exposes systematic differences in aggregate and distributional outcomes that remain statistically inconclusive under independent evaluation at fixed budgets.
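The variance reduction follows from the identity Var(A − B) = Var(A) + Var(B) − 2 Cov(A, B): when two systems share a seed and their outcomes are positively correlated, the covariance term is subtracted from the variance of the estimated difference. The following is a minimal sketch of that effect, not the simulator used in the paper. The function `run_system`, its `effect` and `tilt` parameters, and the Gaussian noise model are hypothetical, chosen only so that a common shock enters both systems identically while a second shock affects them in opposite directions.

```python
import random
import statistics


def run_system(effect, tilt, seed):
    """Hypothetical system: a fixed effect plus seed-driven noise.

    Assumption for illustration: the first draw is a shock common to all
    systems, and the second draw is weighted by a system-specific `tilt`.
    """
    rng = random.Random(seed)      # same seed -> same realisations
    common = rng.gauss(0.0, 1.0)   # shock shared by both systems
    specific = rng.gauss(0.0, 1.0) # shock each system weights differently
    return effect + common + tilt * specific


def difference_variances(n_seeds=2000):
    """Sample variance of A - B under paired and independent seeding."""
    a = {"effect": 0.2, "tilt": 0.5}
    b = {"effect": 0.0, "tilt": -0.5}

    # Paired design: both systems are evaluated on the same seed.
    paired = [
        run_system(seed=s, **a) - run_system(seed=s, **b)
        for s in range(n_seeds)
    ]
    # Independent design: system B is evaluated on a disjoint set of seeds.
    independent = [
        run_system(seed=s, **a) - run_system(seed=n_seeds + s, **b)
        for s in range(n_seeds)
    ]
    return statistics.variance(paired), statistics.variance(independent)


paired_var, independent_var = difference_variances()
print(f"paired variance:      {paired_var:.3f}")
print(f"independent variance: {independent_var:.3f}")
```

Under these assumed noise weights the common shock cancels exactly in the paired difference, so its theoretical variance is 1.0 against 2.5 for the independent design. In a real system the cancellation is only partial, and the gain depends on how strongly outcomes are correlated at the seed level.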