In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions are sampled for each gradient update. Although critical to the learning process, this value is typically left unadjusted when new algorithms are proposed. In this work, we present a broad empirical study suggesting that {\em reducing} the batch size can yield a number of significant performance gains. This is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses aimed at better understanding this phenomenon.
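To make the role of the batch size parameter concrete, the following is a minimal sketch of a uniform replay memory in which `batch_size` controls how many stored transitions are drawn for a single gradient update. The class and field names (`ReplayBuffer`, `Transition`) are hypothetical and chosen for illustration only. This is not the implementation used in this study. Implementations differ on whether they sample with or without replacement; this sketch assumes sampling without replacement.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    # Hypothetical transition record: (s, a, r, s', done).
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class ReplayBuffer:
    """Illustrative uniform replay memory (assumed design, not the paper's code)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # Once capacity is reached, appending evicts the oldest transition.
        self.memory: deque = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> list:
        # Draw batch_size distinct transitions uniformly at random
        # (without replacement in this sketch); the result is the
        # minibatch consumed by one gradient update.
        return self.rng.sample(list(self.memory), batch_size)


buffer = ReplayBuffer(capacity=1000)
for t in range(100):
    buffer.add(Transition(state=t, action=t % 4, reward=1.0, next_state=t + 1, done=False))

# A smaller batch_size means each update averages the loss over fewer transitions.
for batch_size in (8, 32):
    batch = buffer.sample(batch_size)
    print(batch_size, len(batch))
```

Here, reducing the batch size changes only the number of transitions per update. The replay memory and the sampling rule stay the same.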