ViSymRe: Vision Multimodal Symbolic Regression

Extracting interpretable equations from observational datasets to describe complex natural phenomena is one of the core goals of artificial intelligence. This task is known as symbolic regression (SR). In recent years, Transformer-based paradigms have become a new trend in SR, addressing the well-known problem of inefficient search. However, the modal heterogeneity between datasets and equations often hinders the convergence and generalization of these models. In this paper, we propose ViSymRe, a Vision Symbolic Regression framework, to explore how the visual modality can improve the performance of Transformer-based SR paradigms. Because visual SR models are otherwise untrainable in high-dimensional scenarios, we present Multi-View Random Slicing (MVRS). By projecting multivariate equations into 2-D space using random affine transformations, MVRS avoids common defects of high-dimensional visualization, such as variable degradation, missing non-linear interactions, and exponentially increasing sampling complexity, so that ViSymRe can be trained at low computational cost. To support dataset-only deployment of ViSymRe, we design a dual-vision pipeline architecture based on generative techniques. It reconstructs visual features directly from the datasets via an auxiliary Visual Decoder and automatically suppresses the attention weights assigned to reconstruction noise through a proposed Biased Cross-Attention feature fusion module, ensuring that subsequent processing is not affected by noisy modalities. Ablation studies demonstrate that the visual modality improves model convergence and various SR metrics. Furthermore, evaluation results on mainstream benchmarks indicate that ViSymRe achieves competitive performance compared to baselines, particularly in low-complexity and rapid-inference scenarios.
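To make the idea behind MVRS concrete, the following is a minimal sketch of one plausible reading of "projecting multivariate equations into 2-D space using random affine transformations". It is not the authors' implementation. The sketch assumes that each view restricts the equation to a random line x(t) = a·t + b in input space and records the resulting curve (t, f(x(t))). The function names `random_slices` and `example_equation`, the Gaussian sampling of a and b, and all default parameters are hypothetical choices made for illustration.

```python
import numpy as np

def random_slices(f, dim, n_views=4, n_points=64, t_range=(-1.0, 1.0), seed=0):
    """Return n_views 2-D curves (t, f(a*t + b)) of a dim-variate function f.

    Assumption: each view is a random affine map t -> a*t + b from the
    1-D slice parameter t into the dim-dimensional input space.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(t_range[0], t_range[1], n_points)
    views = []
    for _ in range(n_views):
        a = rng.normal(size=dim)  # random slope: every variable varies along the slice
        b = rng.normal(size=dim)  # random offset: the slice need not pass through the origin
        x = t[:, None] * a[None, :] + b[None, :]  # points on the slice, shape (n_points, dim)
        y = f(x)                                  # equation evaluated along the slice
        views.append(np.stack([t, y], axis=1))    # one 2-D curve, shape (n_points, 2)
    return np.stack(views)                        # shape (n_views, n_points, 2)

def example_equation(x):
    # Hypothetical 3-variable equation with a non-linear interaction term.
    return x[:, 0] * x[:, 1] + np.sin(x[:, 2])

curves = random_slices(example_equation, dim=3)
print(curves.shape)
```

Under this assumption, the sampling cost grows linearly with the number of views rather than exponentially with the number of variables, and because every variable changes along each slice, interaction terms such as the product above still shape the plotted curve.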
