Machine learning systems appear stochastic but are deterministic once seeded: a seeded pseudorandom number generator produces identical realisations across repeated executions. Standard evaluation practice typically treats runs of competing systems as independent and does not exploit their shared sources of randomness. This paper analyses the statistical structure of comparative evaluation under shared random seeds. Under this design, competing systems are evaluated on identical seeds, which induces matched stochastic realisations and strictly reduces the variance of the estimated difference whenever outcomes are positively correlated at the seed level. We demonstrate these effects using an extended learning-based multi-agent economic simulator, where paired evaluation exposes systematic differences in aggregate and distributional outcomes that remain statistically inconclusive under independent evaluation at fixed budgets.
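The variance claim can be illustrated with a minimal sketch. It is a toy model and not the paper's simulator: the function `evaluate`, the effect sizes, and the noise scales are all hypothetical, chosen so that a seed-level environment component is shared across systems and makes their outcomes positively correlated. For two systems with outcomes A and B, Var(A − B) = Var(A) + Var(B) − 2·Cov(A, B), so evaluating both on the same seed lowers the variance of the difference whenever Cov(A, B) > 0.

```python
import random
import statistics


def evaluate(system_id: int, effect: float, seed: int) -> float:
    """Toy stand-in for one evaluation run (hypothetical, not the paper's simulator).

    The outcome is the system's true effect plus a seed-level environment
    component shared by every system run on that seed, plus a smaller
    system-specific component.
    """
    env_rng = random.Random(f"env-{seed}")               # same stream for all systems on this seed
    own_rng = random.Random(f"sys{system_id}-{seed}")    # stream specific to this system and seed
    env_noise = env_rng.gauss(0.0, 1.0)
    own_noise = own_rng.gauss(0.0, 0.3)
    return effect + env_noise + own_noise


def paired_differences(n: int) -> list[float]:
    # Both systems are evaluated on the same seed s, so env_noise cancels in the difference.
    return [evaluate(1, 0.2, s) - evaluate(2, 0.0, s) for s in range(n)]


def independent_differences(n: int) -> list[float]:
    # The second system uses a disjoint set of seeds, so nothing cancels.
    return [evaluate(1, 0.2, s) - evaluate(2, 0.0, n + s) for s in range(n)]


n = 200  # assumed evaluation budget per system
var_paired = statistics.variance(paired_differences(n))
var_indep = statistics.variance(independent_differences(n))

print(f"variance of difference, shared seeds:      {var_paired:.3f}")
print(f"variance of difference, independent seeds: {var_indep:.3f}")
```

In this toy setup the shared environment component dominates, so the paired differences vary far less than the independent ones at the same budget. How large the gain is in a real system depends on how strongly outcomes are correlated at the seed level.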