In recent years, there has been an increasing number of applications of machine learning to the field of Computer Algebra, including the prominent sub-field of Symbolic Integration. However, machine learning models require an abundance of data to be successful, and few benchmarks exist at the scale required. While methods to generate new data already exist, they are flawed in several ways that may lead to bias in machine learning models trained on them. In this paper, we describe how to use the Risch algorithm for symbolic integration to create a dataset of elementary integrable expressions. Further, we show that data generated this way alleviates some of the flaws found in earlier methods.