Symbolic Regression (SR) is a powerful technique for automatically discovering mathematical expressions from input data. Mainstream SR algorithms search for the optimal symbolic tree in a vast function space, but the growing complexity of the tree structure limits their performance. Inspired by neural networks, symbolic networks have emerged as a promising new paradigm. However, most existing symbolic networks still face two challenges: the binary nonlinear operators $\{\times, \div\}$ cannot be naturally extended to multivariate operators, and training with a fixed architecture often leads to higher complexity and overfitting. In this work, we propose UniSymNet, a Unified Symbolic Network that unifies binary nonlinear operators into nested unary operators, and we define the conditions under which UniSymNet can reduce complexity. Moreover, we pre-train a Transformer model with a novel label encoding method to guide structural selection, and adopt objective-specific optimization strategies to learn the parameters of the symbolic network. UniSymNet achieves high fitting accuracy, a strong symbolic solution rate, and relatively low expression complexity, delivering competitive performance on low-dimensional Standard Benchmarks and high-dimensional SRBench.
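The claim that binary nonlinear operators can be unified into nested unary operators can be illustrated with the standard exp-log identity; the sketch below is an assumption about the general idea, not the paper's actual construction (the function names `mul_via_unary` and `div_via_unary` are hypothetical):

```python
import math

# Illustrative sketch: rewriting binary {*, /} as nested unary {exp, log},
# so a symbolic network needs only unary nonlinear nodes plus affine layers.
# Valid for positive inputs; handling signs/zeros requires extra machinery.

def mul_via_unary(x, y):
    # x * y = exp(log(x) + log(y)), for x, y > 0
    return math.exp(math.log(x) + math.log(y))

def div_via_unary(x, y):
    # x / y = exp(log(x) - log(y)), for x, y > 0
    return math.exp(math.log(x) - math.log(y))

print(mul_via_unary(3.0, 4.0))
print(div_via_unary(3.0, 4.0))
```

Note that the same identity extends naturally to the multivariate case, e.g. $\prod_i x_i = \exp(\sum_i \log x_i)$, which is exactly the extension that plain binary tree operators lack.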