Symbolic computation algorithms and their implementations in computer algebra systems often contain choices that do not affect the correctness of the output but can significantly impact the resources required; such choices can benefit from being made separately for each problem by a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation. We utilise an existing dataset of examples derived from applications, which we found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us first to balance and then to further augment the dataset, improving the machine learning results on average by 28\% and 38\%, respectively. We then demonstrate how the existing machine learning methodology for this problem (classification) might be recast into the regression paradigm. While this does not radically change performance, it does widen the scope in which the methodology can be applied to make choices.
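As a rough illustration only, the sketch below assumes that augmentation works by consistently renaming the variables of each problem. This is our assumption, not a description of the authors' exact procedure. The idea rests on the premise that renaming the variables together with the ordering leaves the CAD computation unchanged up to names. If that holds, the optimal-ordering label can be transferred to each renamed copy without re-running CAD. The polynomial representation (dicts from exponent tuples to coefficients), the round-robin `balance` scheme and every function name are hypothetical. The last function shows the regression view: a model (not shown) predicts a cost for every ordering, and the cheapest ordering is chosen.

```python
from itertools import permutations

# Hypothetical representation: a polynomial in n variables is a dict mapping
# exponent tuples to coefficients; a problem is a list of polynomials; a label
# is the optimal variable ordering, given as a tuple of variable indices.

def rename_variables(poly, sigma):
    """Return poly rewritten so that new variable i stands for old variable sigma[i]."""
    return {tuple(exp[s] for s in sigma): c for exp, c in poly.items()}

def relabel(ordering, sigma):
    """Express an ordering of the old variables in terms of the renamed ones."""
    inv = {old: new for new, old in enumerate(sigma)}  # old index -> new index
    return tuple(inv[j] for j in ordering)

def permuted_copy(problem, label, sigma):
    """Rename every polynomial and carry the label along (assumes CAD cost is
    invariant under a consistent renaming of variables and ordering)."""
    return [rename_variables(p, sigma) for p in problem], relabel(label, sigma)

def augment(dataset, n):
    """Emit all n! renamings of every problem; each label then occurs equally often."""
    return [permuted_copy(prob, lab, sigma)
            for prob, lab in dataset
            for sigma in permutations(range(n))]

def balance(dataset, n):
    """Hypothetical scheme: keep one renaming per problem, cycling target labels."""
    targets = list(permutations(range(n)))
    out = []
    for k, (prob, lab) in enumerate(dataset):
        target = targets[k % len(targets)]
        # Exactly one renaming maps lab onto target, so next() always succeeds.
        sigma = next(s for s in targets if relabel(lab, s) == target)
        out.append(permuted_copy(prob, lab, sigma))
    return out

def choose_ordering(predicted_time):
    """Regression view: given a predicted cost per ordering, return the argmin and its cost."""
    best = min(predicted_time, key=predicted_time.get)
    return best, predicted_time[best]
```

Under these assumptions, `augment` multiplies the dataset size by n! and balances the labels by construction: for a fixed label, the n! renamings produce each ordering exactly once. `balance` keeps the dataset size fixed instead. Returning the predicted cost alongside the chosen ordering hints at how the regression view could widen scope. For example, the predicted cost could inform whether a CAD call is worth attempting at all.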