Although database systems perform well in data access and manipulation, their relational model makes it difficult for data scientists to formulate machine learning algorithms in SQL. Nevertheless, we argue that modern database systems perform well for machine learning algorithms expressed in relational algebra. To overcome the barrier of the relational model, this paper shows how to transform data into a relational representation for training neural networks in SQL. We first describe building blocks for data transformation, model training, and inference in SQL-92, together with their counterparts that use an extended array data type. We then compare the implementation of model training and inference using array data types with the one using a relational representation in SQL-92 only. The evaluation of runtime and memory consumption demonstrates the suitability of modern database systems for matrix algebra, although specialised array data types perform better than matrices in relational representation.
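To make the idea of a matrix in relational representation concrete, the following is a minimal sketch, not the paper's implementation. It assumes each matrix is stored as a table of (row index, column index, value) triples and computes a matrix product with a join and a grouped sum, using only SQL-92 constructs. The table names `a` and `b`, the column names `i`, `j`, `v`, the helper functions, and the choice of SQLite as the engine are all illustrative assumptions.

```python
import sqlite3


def to_relation(matrix):
    """Flatten a nested list into (i, j, v) triples with 1-based indices."""
    return [
        (i, j, float(v))
        for i, row in enumerate(matrix, start=1)
        for j, v in enumerate(row, start=1)
    ]


def matmul_sql(left, right):
    """Multiply two matrices held as relations (hypothetical schema: i, j, v)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (i INTEGER, j INTEGER, v REAL)")
    conn.execute("CREATE TABLE b (i INTEGER, j INTEGER, v REAL)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", to_relation(left))
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", to_relation(right))
    # Join on the shared inner dimension, then sum the products per output cell.
    rows = conn.execute(
        "SELECT a.i, b.j, SUM(a.v * b.v) "
        "FROM a INNER JOIN b ON a.j = b.i "
        "GROUP BY a.i, b.j "
        "ORDER BY a.i, b.j"
    ).fetchall()
    conn.close()
    return rows


product = matmul_sql([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)
```

Each returned tuple is one cell of the product matrix in the same (i, j, v) form, so the result could be fed into a further query, for example the next layer of a forward pass.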