The popularity of data science as a discipline, and its importance to the emerging economy and to industrial progress, dictate that machine learning be democratized. The current practice of training the workforce on machine learning tools, which demands familiarity with low-level statistical and algorithmic details, is therefore a barrier that needs to be addressed. Like data management with languages such as SQL, machine learning needs to be practiced at a conceptual level if it is to become a staple tool for general users. In particular, the technical sophistication demanded by existing machine learning frameworks is prohibitive for many scientists who are not computationally savvy or well versed in machine learning techniques, and the learning curve of these tools is too steep for them to take advantage of these powerful platforms to rapidly advance science. In this paper, we introduce a new declarative machine learning query language, called {\em MQL}, for novice users. We discuss its merits and possible ways of implementing it over a traditional relational database system. We also discuss two materials science experiments implemented using MQL on a materials science workflow system called MatFlow.