The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, a user's future purchases, a patient's risk of readmission, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendation, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples - prediction entities and target labels - from the database, which is slow, laborious, and prone to mistakes. Here, we present the Predictive Query Language (PQL), a SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows a predictive task to be specified in a single declarative query, enabling the automatic computation of training labels for a large variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommender systems. PQL is already integrated and used in a collection of use cases as part of a predictive AI platform. The versatility of the language is demonstrated through its many ongoing use cases, including financial fraud detection, item recommendation, and workload prediction. We demonstrate its versatile design through two implementations: one for small-scale, low-latency use and one that can handle large-scale databases.
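To make the manual extraction step concrete, the following is a minimal Python sketch of the hand-written label computation that a declarative query is meant to replace. It is not PQL and does not reflect either implementation described here; the `PURCHASES` table, the `build_training_labels` helper, the anchor date, and the 30-day horizon are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta

# Hypothetical toy table of purchases: (user_id, purchase_date, amount).
PURCHASES = [
    ("u1", date(2024, 1, 5), 20.0),
    ("u1", date(2024, 1, 20), 35.0),
    ("u2", date(2024, 1, 8), 12.5),
    ("u3", date(2024, 2, 2), 50.0),
]


def build_training_labels(purchases, anchor, horizon_days):
    """Return {user_id: total spend in (anchor, anchor + horizon_days]}.

    Prediction entities are the users already seen on or before the anchor
    date. Target labels use only rows strictly after the anchor, so the
    label window never overlaps the history available at prediction time.
    """
    window_end = anchor + timedelta(days=horizon_days)
    # Entities: users with at least one purchase up to the anchor date.
    entities = {user for user, day, _ in purchases if day <= anchor}
    labels = {user: 0.0 for user in entities}
    # Labels: sum of purchase amounts that fall inside the future window.
    for user, day, amount in purchases:
        if user in labels and anchor < day <= window_end:
            labels[user] += amount
    return labels


labels = build_training_labels(PURCHASES, date(2024, 1, 10), 30)
```

Even in this toy regression task, the entity set, the time window boundaries, and the aggregation each have to be coded and kept consistent by hand, which is where mistakes such as temporal leakage tend to enter.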