Integer sequences in the OEIS span values from single-digit constants to astronomical factorials and exponentials, which makes prediction difficult for standard tokenised models: they cannot represent out-of-vocabulary values or exploit periodic arithmetic structure. We present IntSeqBERT, a dual-stream Transformer encoder for masked integer-sequence modelling on OEIS. Each sequence element is encoded along two complementary axes, a continuous log-scale magnitude embedding and sin/cos embeddings of its residues for 100 moduli ($2$--$101$), which are fused via FiLM. Three prediction heads (magnitude regression, sign classification, and residue prediction for the 100 moduli) are trained jointly on 274,705 OEIS sequences. At the Large scale (91.5M parameters), IntSeqBERT achieves 95.85% magnitude accuracy and 50.38% Mean Modulo Accuracy (MMA) on the test set, outperforming a standard tokenised Transformer baseline by $+8.9$ pt and $+4.5$ pt, respectively. An ablation that removes the modulo stream confirms that it accounts for $+15.2$ pt of the MMA gain and contributes an additional $+6.2$ pt to magnitude accuracy. A probabilistic Chinese Remainder Theorem (CRT)-based Solver converts the model's predictions into concrete integers, yielding a 7.4-fold improvement in next-term prediction over the tokenised-Transformer baseline (Top-1: 19.09% vs. 2.59%). Modulo spectrum analysis reveals a strong negative correlation between Normalised Information Gain (NIG) and Euler's totient ratio $\varphi(m)/m$ ($r = -0.851$, $p < 10^{-28}$), providing empirical evidence that composite moduli capture OEIS arithmetic structure more efficiently via CRT aggregation.
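To make the two encoding axes concrete, the following is a minimal sketch of how a single integer could be turned into a sign, a log-scale magnitude, and sin/cos residue features for the moduli $2$--$101$. The function name `encode_term`, the transform $\log_{10}(1 + |n|)$, and the phase convention $2\pi r/m$ are illustrative assumptions; the abstract does not specify these details, and the FiLM fusion step is not shown.

```python
import math

# The 100 moduli 2..101 named in the abstract.
MODULI = range(2, 102)

def encode_term(n: int):
    """Hypothetical per-term featurisation (not the authors' exact code)."""
    # Sign as -1, 0 or +1.
    sign = (n > 0) - (n < 0)
    # Log-scale magnitude; log10(1 + |n|) is an assumed transform.
    # math.log10 accepts arbitrarily large Python ints.
    magnitude = math.log10(1 + abs(n))
    # One (sin, cos) pair per modulus, placing the residue on the unit circle.
    # Python's % returns a non-negative residue for a positive modulus.
    residue_features = []
    for m in MODULI:
        angle = 2 * math.pi * (n % m) / m
        residue_features.append((math.sin(angle), math.cos(angle)))
    return sign, magnitude, residue_features

sign, magnitude, residue_features = encode_term(math.factorial(50))
print(sign, round(magnitude, 2), len(residue_features))
```

Under this sketch a value far outside any fixed vocabulary, such as $50!$, still maps to a bounded feature vector, which is the property the abstract attributes to the dual-stream design.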
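The step from predicted residues to a concrete integer rests on the Chinese Remainder Theorem. The sketch below shows only the classical, deterministic reconstruction for pairwise coprime moduli; the probabilistic Solver described above is not reproduced here, and the function name `crt` and the example moduli are assumptions chosen for illustration.

```python
from math import gcd

def crt(residues, moduli):
    """Classical CRT: return (x, M) with x = r_i (mod m_i) and 0 <= x < M.

    Assumes the moduli are pairwise coprime; M is their product.
    """
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        if gcd(M, m) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # Find t such that x + M*t = r (mod m), using the inverse of M mod m.
        t = ((r - x) * pow(M, -1, m)) % m
        x += M * t
        M *= m
    return x, M

# Recover a hidden term from its residues (example moduli are illustrative).
moduli = [5, 7, 9, 11, 13]
target = 1234
residues = [target % m for m in moduli]
print(crt(residues, moduli))  # (1234, 45045)
```

The result is unique only modulo the product $M$ of the moduli, so any full solver needs additional information to choose among the candidates $x + kM$.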