Symbolic regression (SR) searches for analytical expressions that describe the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset drawn from a single experiment. Researchers, however, are frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression because the parameters can differ from one experiment to the next. In this work we present Multi-View Symbolic Regression (MvSR), which takes multiple datasets into account simultaneously, mimicking experimental environments, and outputs a general parametric solution. The approach fits each evaluated expression to every dataset independently and returns a parametric family of functions f(x; \theta) capable of accurately fitting all datasets at once. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economics, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to changes in hyperparameters. On real-world data, it captures the group behaviour, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.
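The core idea of fitting one candidate expression to each dataset independently and then judging it by how well it fits all of them can be illustrated with a small sketch. This is not the authors' implementation. It assumes a fixed, hand-written list of candidate structures instead of an evolutionary search, restricts candidates to the form θ0 + θ1·g(x) so that each per-dataset fit has a closed-form least-squares solution, and aggregates the per-dataset errors with a maximum. The aggregation rule and all names (`fit_view`, `multi_view_score`, `candidates`) are assumptions made for illustration.

```python
import math


def fit_view(g, xs, ys):
    """Least-squares fit of y ~ t0 + t1 * g(x) on ONE dataset.

    Returns (t0, t1, mse). Closed form: simple linear regression on u = g(x).
    """
    us = [g(x) for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    cov_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    t1 = cov_uy / var_u
    t0 = mean_y - t1 * mean_u
    mse = sum((t0 + t1 * u - y) ** 2 for u, y in zip(us, ys)) / n
    return t0, t1, mse


def multi_view_score(g, views):
    """Fit the same structure to every view with its own parameters.

    The score is the worst per-view MSE (an assumed aggregation rule), so a
    structure only scores well if it can fit ALL datasets.
    """
    fits = [fit_view(g, xs, ys) for xs, ys in views]
    worst_mse = max(mse for _, _, mse in fits)
    params = [(t0, t1) for t0, t1, _ in fits]
    return worst_mse, params


# Three synthetic "experiments": same structure t0 + t1*log(x), different parameters.
xs = [0.5 * i for i in range(1, 11)]
true_params = [(1.0, 2.0), (-3.0, 0.5), (0.0, 4.0)]
views = [(xs, [t0 + t1 * math.log(x) for x in xs]) for t0, t1 in true_params]

# Hypothetical candidate structures standing in for the SR search space.
candidates = {"log(x)": math.log, "x": lambda x: x, "sqrt(x)": math.sqrt}

scores = {name: multi_view_score(g, views)[0] for name, g in candidates.items()}
best = min(scores, key=scores.get)
best_params = multi_view_score(candidates[best], views)[1]

print("best structure:", best)
for (t0, t1) in best_params:
    print(f"  theta = ({t0:.3f}, {t1:.3f})")
```

The output of such a procedure is a single structure together with one parameter vector per dataset, which matches the notion of a parametric family f(x; θ) described above. Expressions that are nonlinear in their parameters would need an iterative optimizer for the per-dataset fit in place of the closed-form step used here.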