Predictive modeling on relational data aims to predict future or missing values in a relational database, such as a user's future purchases, a patient's risk of readmission, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples (prediction entities and target labels) from the database, which is slow, laborious, and error-prone. Here, we present the Predictive Query Language (PQL), an SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows a predictive task to be specified in a single declarative query, enabling the automatic computation of training labels for a wide variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommender systems. PQL is already successfully integrated and used in a collection of use cases as part of a predictive AI platform. Its versatility is demonstrated by its many ongoing use cases, including financial fraud detection, item recommendation, and workload prediction. We further demonstrate its versatile design through two implementations: one for small-scale, low-latency use and one that can handle large-scale databases.
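To make the manual label-extraction step concrete, the following is a minimal sketch of the kind of hand-written code the abstract describes as slow and error-prone. It is not PQL syntax and not the authors' implementation; the table layout, the function name `build_training_table`, the 30-day horizon, and the sample rows are all hypothetical, chosen only for illustration. For each prediction entity (a user) it computes a binary target label: whether that user made a purchase within the horizon after an anchor time.

```python
from datetime import datetime, timedelta


def build_training_table(users, purchases, anchor_time, horizon_days=30):
    """Return (user_id, label) pairs for a hypothetical 'will purchase' task.

    label is 1 if the user has at least one purchase in the window
    (anchor_time, anchor_time + horizon_days], otherwise 0.
    """
    horizon_end = anchor_time + timedelta(days=horizon_days)
    # Collect users with a purchase strictly after the anchor time and
    # no later than the end of the horizon. Earlier purchases are ignored
    # so that labels only describe the future relative to the anchor.
    buyers = {
        p["user_id"]
        for p in purchases
        if anchor_time < p["timestamp"] <= horizon_end
    }
    return [(user_id, int(user_id in buyers)) for user_id in users]


# Hypothetical toy data standing in for two relational tables.
users = ["u1", "u2", "u3"]
purchases = [
    {"user_id": "u1", "timestamp": datetime(2024, 1, 10)},   # inside window
    {"user_id": "u2", "timestamp": datetime(2024, 3, 5)},    # after window
    {"user_id": "u1", "timestamp": datetime(2023, 12, 20)},  # before anchor
]

anchor = datetime(2024, 1, 1)
training_table = build_training_table(users, purchases, anchor)
print(training_table)
```

Even in this toy form, the window boundaries, the handling of users with no purchases, and the separation of past from future data all have to be coded by hand, which is the work a single declarative query is meant to replace.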