Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries that specify and check integrity constraints over machine learning models and datasets. It integrates relational algebra with functional programming, allowing highly expressive queries to be written with only eight intuitive operators. We evaluate TorchQL on diverse use cases: finding critical temporal inconsistencies in objects detected across video frames in autonomous driving, finding data imputation errors in time-series medical records, finding data labeling errors in real-world images, and evaluating biases and constraining outputs of language models. Our experiments show that TorchQL enables up to 13x faster query execution than baselines such as Pandas and MongoDB, and up to 40% shorter queries than native Python. We also conduct a user study and find that TorchQL is natural enough for developers familiar with Python to specify complex integrity constraints.
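To make the notion of an integrity constraint concrete, the following is a minimal sketch in plain Python of the temporal-consistency use case mentioned above: an object that is detected in the previous and next video frames but missing from the current one is flagged as a violation. It deliberately does not use TorchQL's operators or API, which are not described in this abstract; the `Detection` record, its `track_id` field, and both function names are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One detected object (hypothetical schema, for illustration only)."""
    frame: int     # index of the video frame
    track_id: int  # assumed identifier linking the same object across frames
    label: str     # predicted class


def group_by_frame(detections):
    """Group track ids by frame index, similar to a relational group-by."""
    frames = {}
    for d in detections:
        frames.setdefault(d.frame, set()).add(d.track_id)
    return frames


def flickering_objects(detections):
    """Return (frame, track_id) pairs that violate temporal consistency.

    A violation is an object present in frames t-1 and t+1 but absent
    from frame t.
    """
    frames = group_by_frame(detections)
    if not frames:
        return []
    violations = []
    # Only interior frames have both a predecessor and a successor.
    for t in range(min(frames) + 1, max(frames)):
        current = frames.get(t, set())
        in_both_neighbors = frames.get(t - 1, set()) & frames.get(t + 1, set())
        missing = in_both_neighbors - current
        violations.extend((t, track_id) for track_id in sorted(missing))
    return violations


detections = [
    Detection(0, 7, "car"), Detection(0, 9, "pedestrian"),
    Detection(1, 7, "car"), Detection(1, 9, "pedestrian"),
    Detection(2, 9, "pedestrian"),  # track 7 is missing in frame 2
    Detection(3, 7, "car"), Detection(3, 9, "pedestrian"),
]
print(flickering_objects(detections))
```

Even this small check combines a grouping step, a join between neighboring frames, and a user-defined predicate, which is the kind of relational and functional composition the abstract describes a query framework as expressing more compactly.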