Symbolic computation algorithms, and their implementations in computer algebra systems, often contain choices that do not affect the correctness of the output but can significantly affect the resources required. Such choices can benefit from being made separately for each problem by a machine learning model. This study reports lessons on this use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect the lessons learned to be applicable to other decisions in symbolic computation. We use an existing dataset of examples derived from applications, which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us to balance and further augment the dataset, improving the machine learning results by 28% and 38% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem (classification) might be recast into the regression paradigm. While this does not radically change performance, it does widen the scope in which the methodology can be applied to make choices.
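The abstract does not spell out how the augmentation works. One natural reading, assumed here purely for illustration, is that the variables of a polynomial system are renamed and the preferred variable ordering is relabelled to match; renaming variables does not change the underlying problem, so the best ordering simply follows the renaming. The sketch below is a minimal Python illustration under that assumption. The sparse-dictionary polynomial representation and the names `rename_poly` and `augment` are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import permutations


def rename_poly(poly, perm):
    """Rename variable i to perm[i].

    A polynomial is stored (hypothetically) as {exponent_tuple: coefficient},
    e.g. x0^2 - x1*x2 is {(2, 0, 0): 1, (0, 1, 1): -1}.
    """
    renamed = {}
    for exps, coeff in poly.items():
        new_exps = [0] * len(exps)
        for i, e in enumerate(exps):
            new_exps[perm[i]] = e  # exponent of old variable i moves to perm[i]
        renamed[tuple(new_exps)] = coeff
    return renamed


def augment(system, best_order):
    """Yield one renamed copy of (system, best_order) per variable permutation.

    best_order is a tuple of variable indices, e.g. (0, 2, 1).
    The label is renamed with the same permutation as the polynomials.
    """
    n = len(best_order)
    for perm in permutations(range(n)):
        new_system = [rename_poly(p, perm) for p in system]
        new_order = tuple(perm[v] for v in best_order)
        yield new_system, new_order


# Illustrative instance (made up): one polynomial in three variables,
# with an assumed best ordering (0, 1, 2).
system = [{(2, 0, 0): 1, (0, 1, 1): -1}]
samples = list(augment(system, (0, 1, 2)))

# Count how often each ordering appears as a label after augmentation.
label_counts = Counter(label for _, label in samples)
```

Under this assumed scheme, each original instance produces one copy per possible ordering, so every label appears equally often no matter how skewed the original labels were. Generating only some of the renamed copies per instance would be one way to balance a dataset without enlarging it as much.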