The Mersenne Twister MT19937 pseudorandom number generator, introduced by the last two authors in 1998, is still widely used. It passes all existing statistical tests except the linear complexity test, which in effect examines the parity (even or odd) of the number of 1's among specific bits (and hence should not matter for most applications). Harase reported that MT19937 is rejected by some birthday-spacing tests, which are rather artificially designed. In this paper, we report that MT19937 fails a natural test based on the distribution of run-lengths until an identical value appears among the output 32-bit integers. The number of observations of run-length 623 is some 40 times larger than expected (and than the numbers of observations of run-lengths 622, 624, etc.), which implies that the corresponding p-value is almost 0. We analyze this phenomenon mathematically and obtain a theorem that explains these failures. This does not seem to be a serious defect of MT19937, because detecting it requires astronomical computational effort. Still, the phenomenon should be reported to the research community working on pseudorandom number generation.
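The abstract does not spell out the exact test procedure. The sketch below assumes one plausible reading, which is not necessarily the paper's: a "run" consists of consecutive 32-bit outputs drawn until a value repeats, and its run-length is the number of distinct values seen before the repeat. It tallies run-lengths from CPython's `random.Random` (an MT19937 implementation, where `getrandbits(32)` returns one 32-bit output) and compares them with the probability under an ideal uniform generator, P(L = k) = (k/N) * prod_{j<k} (1 - j/N) with N = 2^32. The helper names `run_length_until_repeat`, `run_length_histogram` and `ideal_run_length_prob` are hypothetical.

```python
import functools
import random
from collections import Counter
from typing import Callable


def run_length_until_repeat(draw: Callable[[], int]) -> int:
    """Draw values until one repeats; return the number of distinct values
    seen before the repeat (the assumed definition of run-length)."""
    seen = set()
    while True:
        x = draw()
        if x in seen:
            return len(seen)
        seen.add(x)


def run_length_histogram(draw: Callable[[], int], n_runs: int) -> Counter:
    """Tally run-lengths over n_runs consecutive runs of one output stream."""
    return Counter(run_length_until_repeat(draw) for _ in range(n_runs))


def ideal_run_length_prob(k: int, n: int = 2**32) -> float:
    """P(L = k) for an ideal uniform generator over n values: the first k
    draws are distinct and draw k + 1 matches one of those k values."""
    p_distinct = 1.0
    for j in range(k):
        p_distinct *= 1.0 - j / n
    return p_distinct * k / n


if __name__ == "__main__":
    rng = random.Random(12345)                      # CPython's Random uses MT19937
    draw32 = functools.partial(rng.getrandbits, 32)  # one 32-bit output per call
    n_runs = 10                                      # each run averages ~82,000 draws
    hist = run_length_histogram(draw32, n_runs)
    for k in (622, 623, 624):
        # observed count vs. expected count under the ideal model
        print(k, hist[k], n_runs * ideal_run_length_prob(k))
```

With only ten runs the expected count at each of 622, 623 and 624 is on the order of 10^-6, so this toy run cannot reveal the excess at 623; it only illustrates the counting. Observing a 40-fold excess would require an enormous number of runs, in line with the abstract's remark about astronomical effort.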