We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories that provide both goal states and explicit paths to reach them. A permutation-equivariant, transformer-based policy network is then trained step-wise on this data to predict the oracle action for a given input expression. We demonstrate this approach on two problems in high-energy physics: dilogarithm reduction and spinor-helicity scattering amplitude simplification. In both cases, our trained policy network achieves near-perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, our model achieves a 100\% full simplification rate on a representative selection of 5-point tree-level gluon amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.
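The data-generation idea described above (scramble a simple expression, record the inverse of each scrambling move) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy domain (sums of integers stored as tuples), the `split` and `merge` moves, and the function names are hypothetical and are not the action sets used for dilogarithms or spinor-helicity amplitudes.

```python
import random

# Toy domain (an assumption for illustration): an expression is a tuple of
# integer terms, read as a sum. The "simple" form is a single term.

def split(expr, i, a):
    """Scrambling move: replace term expr[i] by the two terms a and expr[i] - a."""
    return expr[:i] + (a, expr[i] - a) + expr[i + 1:]

def merge(expr, i):
    """Inverse move: replace adjacent terms expr[i], expr[i + 1] by their sum."""
    return expr[:i] + (expr[i] + expr[i + 1],) + expr[i + 2:]

def oracle_trajectory(simple, n_steps, rng):
    """Scramble `simple` n_steps times and return (state, inverse_action) pairs,
    ordered from the most scrambled state back toward `simple`."""
    expr = simple
    pairs = []
    for _ in range(n_steps):
        i = rng.randrange(len(expr))
        a = rng.randint(-5, 5)
        scrambled = split(expr, i, a)
        # Undoing this split means merging at position i of the scrambled state.
        pairs.append((scrambled, ("merge", i)))
        expr = scrambled
    pairs.reverse()
    return pairs

rng = random.Random(0)
trajectory = oracle_trajectory((7,), n_steps=4, rng=rng)

# Replaying the recorded actions from the scrambled start recovers the goal.
state = trajectory[0][0]
for recorded_state, (_, i) in trajectory:
    assert recorded_state == state
    state = merge(state, i)
print(state)  # (7,)
```

Each `(state, action)` pair in `trajectory` would serve as one supervised example for a policy that predicts the oracle action from the current expression. The policy network, contrastive grouping, and beam search are not sketched here.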