Judging whether an integer is divisible by a small prime such as 2 or 3 may appear trivial to humans, but it can be less straightforward for computers. Here, we tested multiple deep learning architectures and feature engineering approaches on classifying integers by their residues modulo small primes. We found that classification performance depends critically on the feature space. We also evaluated Automated Machine Learning (AutoML) platforms from Amazon, Google and Microsoft, and found that they failed on this task unless given appropriately engineered features. Furthermore, we introduced a method that applies linear regression to Fourier series basis vectors, and demonstrated its effectiveness. Finally, we evaluated Large Language Models (LLMs) such as GPT-4, GPT-J, LLaMA and Falcon, and found that they also failed on this task. In conclusion, feature engineering remains important for improving the performance and interpretability of machine-learning models, even in the era of AutoML and LLMs.
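To illustrate the idea of linear regression on a Fourier series basis, the following is a minimal sketch and not the authors' implementation. It assumes sine and cosine features with frequencies k/p for a known prime p, one-hot residue targets, and an ordinary least-squares fit with NumPy; the function names `fourier_features`, `fit_residue_classifier` and `predict_residue` are hypothetical.

```python
import numpy as np


def fourier_features(n, p):
    """Map integers n to [1, cos(2*pi*k*n/p), sin(2*pi*k*n/p)] for k = 1..p//2.

    Assumption: the period p is known in advance. The integers are NOT
    reduced modulo p before the features are computed.
    """
    n = np.asarray(n, dtype=np.float64).reshape(-1, 1)
    k = np.arange(1, p // 2 + 1, dtype=np.float64).reshape(1, -1)
    angle = 2.0 * np.pi * k * n / p
    return np.hstack([np.ones_like(n), np.cos(angle), np.sin(angle)])


def fit_residue_classifier(n_train, p):
    """Least-squares fit from Fourier features to one-hot residue labels."""
    X = fourier_features(n_train, p)
    Y = np.eye(p)[np.asarray(n_train) % p]  # one-hot encoding of n mod p
    # lstsq returns the minimum-norm solution, so a redundant column
    # (e.g. sin(pi*n) for p = 2, which is numerically zero) is harmless.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def predict_residue(n, W, p):
    """Predict the residue as the index of the largest regression output."""
    return np.argmax(fourier_features(n, p) @ W, axis=1)


if __name__ == "__main__":
    train = np.arange(0, 1000)
    test = np.arange(1000, 2000)
    for p in (2, 3, 5, 7):
        W = fit_residue_classifier(train, p)
        accuracy = np.mean(predict_residue(test, W, p) == test % p)
        print(f"p={p}: held-out accuracy = {accuracy:.3f}")
```

The sketch works because the indicator of each residue class modulo p is an exact linear combination of these basis functions, so a linear model suffices once the features are chosen this way. For much larger integers, floating-point error in the angle would eventually degrade the features.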