Unlike genetic programming, neural network approaches to symbolic regression scale well to high input dimensions and can leverage gradient methods for faster equation search. Expression complexity is commonly constrained through multistage pruning with fine-tuning, but this often leads to significant performance loss. In this work, we propose SymbolNet, a neural network approach to symbolic regression in a novel framework that enables dynamic pruning of model weights, input features, and mathematical operators in a single training run, where training loss and expression complexity are optimized simultaneously. We introduce one sparsity regularization term per pruning type, each of which adaptively adjusts its own strength and drives convergence to a target sparsity level. Whereas most existing symbolic regression methods cannot efficiently handle datasets with more than $O(10)$ inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).
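The abstract does not give the form of the sparsity regularization term, so the following is only a toy illustration of the general idea of a penalty whose strength adapts until a target sparsity is reached. It is a generic stand-in, not SymbolNet's actual regularizer: it assumes an L1 penalty applied through soft-thresholding, a simple feedback rule on the penalty strength, and a separable quadratic in place of a real training loss. All names (`train_with_adaptive_penalty`, `eta`, `lam`) are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def train_with_adaptive_penalty(targets, target_sparsity,
                                steps=1000, lr=0.5, eta=0.01):
    """Toy joint optimization of a loss and a sparsity penalty.

    Assumed stand-in loss: sum_i 0.5 * (w_i - a_i)^2, where a_i = targets[i].
    Assumed penalty: lam * sum_i |w_i|, with lam adjusted by feedback.
    """
    weights = list(targets)  # start at the unpenalized optimum
    lam = 0.0                # penalty strength, adapted during training
    for _ in range(steps):
        # Proximal gradient step: gradient of the loss, then L1 shrinkage.
        weights = [soft_threshold(w - lr * (w - a), lr * lam)
                   for w, a in zip(weights, targets)]
        # Feedback rule: strengthen the penalty while the model is less
        # sparse than the target; stop changing it once the target is met.
        lam = max(0.0, lam + eta * (target_sparsity - sparsity(weights)))
    return weights, lam


# Usage: ten weights of increasing importance, asking for half to be pruned.
importance = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
final_weights, final_lam = train_with_adaptive_penalty(importance, 0.5)
print(sparsity(final_weights), final_lam)
```

In this sketch the penalty strength stops changing once the measured sparsity equals the target, so the remaining weights are fitted to the loss under a fixed penalty. The least important weights are the ones pruned, all within a single training loop rather than in separate prune and fine-tune stages.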