We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, which yields oracle trajectories that provide both goal states and explicit paths to reach them. A permutation-equivariant, transformer-based policy network is then trained stepwise on this data to predict the oracle action for a given input expression. We demonstrate the approach on two problems in high-energy physics: dilogarithm reduction and simplification of spinor-helicity scattering amplitudes. In both cases, the trained policy network achieves near-perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, the model fully simplifies 100% of a representative selection of five-point tree-level gluon amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.
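The data-generation idea described in the abstract (scramble a simple expression, record the inverse of each scrambling move, then read the record backwards as an oracle trajectory) can be illustrated with a minimal sketch. The toy domain here is an assumption made purely for illustration: an "expression" is a tuple of integer terms, `split` is a hypothetical scrambling move, and `merge` is its hypothetical inverse. None of these names come from the paper, and the actual work uses dilogarithm and spinor-helicity identities rather than integer arithmetic.

```python
import random


def split(state, i, k):
    """Toy scrambling move: replace term state[i] by the two terms k and state[i] - k."""
    return state[:i] + (k, state[i] - k) + state[i + 1:]


def merge(state, i):
    """Toy simplifying move (inverse of split): replace adjacent terms i, i+1 by their sum."""
    return state[:i] + (state[i] + state[i + 1],) + state[i + 2:]


def oracle_trajectory(simple, n_steps, rng):
    """Scramble `simple` for n_steps and return (state, oracle_action) training pairs.

    Each pair holds a scrambled state and the index to pass to `merge`
    that undoes the most recent scrambling move applied to reach it.
    """
    state = tuple(simple)
    inverse_actions = []
    for _ in range(n_steps):
        i = rng.randrange(len(state))   # which term to scramble
        k = rng.randint(-5, 5)          # how to split it
        state = split(state, i, k)
        inverse_actions.append(i)       # merge(., i) undoes this split

    # Walk back from the fully scrambled state, undoing the newest move first.
    pairs = []
    for i in reversed(inverse_actions):
        pairs.append((state, i))
        state = merge(state, i)

    # Sanity check: the oracle path ends at the original simple expression.
    assert state == tuple(simple)
    return pairs


if __name__ == "__main__":
    for scrambled, action in oracle_trajectory((7, 3), 4, random.Random(0)):
        print(scrambled, "-> merge at index", action)
```

Each `(state, action)` pair is one supervised example of the kind a stepwise policy could be trained on: given the state, predict the recorded action. The network architecture, contrastive grouping, and beam search mentioned in the abstract are not modeled in this sketch.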