Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries to specify and check integrity constraints over machine learning models and datasets. It seamlessly integrates relational algebra with functional programming to enable highly expressive queries using only eight intuitive operators. We evaluate TorchQL on diverse use cases, including finding critical temporal inconsistencies in objects detected across video frames in autonomous driving, finding data imputation errors in time-series medical records, finding data labeling errors in real-world images, and evaluating biases and constraining outputs of language models. Our experiments show that TorchQL executes queries up to 13x faster than baselines like Pandas and MongoDB, with queries up to 40% shorter than equivalent native Python code. We also conduct a user study and find that TorchQL is natural enough for developers familiar with Python to specify complex integrity constraints.