In machine learning (ML), Python serves as a convenient abstraction for working with key libraries such as PyTorch and scikit-learn. Unlike database management systems (DBMSs), however, Python applications can lose important data, such as trained models and extracted features, to machine failures or human errors, wasting time and resources. Specifically, they lack four essential properties that could make ML more reliable and user-friendly: durability, atomicity, replicability, and time-versioning (DART). This paper presents our vision of Transactional Python, which provides DART without any modifications to user programs or the Python kernel. It does so by non-intrusively monitoring application states at the object level and determining the minimal information sufficient to reconstruct a whole application. Our evaluation of a proof-of-concept implementation on public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging from 1.5% to 15.6%.
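To make object-level state monitoring more concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the application state is a plain dictionary of named variables (as in a notebook's global namespace) and that every object is picklable; the class `ToyCheckpointer` and its methods are hypothetical. Each commit stores only the variables whose pickled bytes changed since the previous commit, which gives a crude form of durability, all-or-nothing commits, and restore to any earlier version.

```python
import hashlib
import pickle
from typing import Any, Dict, List


class ToyCheckpointer:
    """Hypothetical object-level checkpointer, for illustration only."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}          # last committed digest per variable
        self._commits: List[Dict[str, bytes]] = []  # changed objects per commit
        self._deleted: List[List[str]] = []         # variables removed in each commit

    def commit(self, namespace: Dict[str, Any]) -> int:
        # Serialize everything first: if any object fails to pickle, an exception
        # is raised before any internal state changes, so the commit is all-or-nothing.
        blobs = {name: pickle.dumps(obj) for name, obj in namespace.items()}
        delta: Dict[str, bytes] = {}
        new_digests: Dict[str, str] = {}
        for name, blob in blobs.items():
            digest = hashlib.sha256(blob).hexdigest()
            new_digests[name] = digest
            if self._digests.get(name) != digest:
                delta[name] = blob  # store only new or changed objects
        deleted = [name for name in self._digests if name not in blobs]
        self._commits.append(delta)
        self._deleted.append(deleted)
        self._digests = new_digests
        return len(self._commits) - 1  # version number of this commit

    def restore(self, version: int) -> Dict[str, Any]:
        # Rebuild the namespace by replaying deltas and deletions up to `version`.
        state: Dict[str, bytes] = {}
        for delta, deleted in zip(self._commits[: version + 1],
                                  self._deleted[: version + 1]):
            for name in deleted:
                state.pop(name, None)
            state.update(delta)
        return {name: pickle.loads(blob) for name, blob in state.items()}


# Usage: only "epoch" is stored again in the second commit.
ns = {"weights": [0.1, 0.2], "epoch": 1}
ck = ToyCheckpointer()
v0 = ck.commit(ns)
ns["epoch"] = 2
v1 = ck.commit(ns)
print(ck.restore(v0))
```

Comparing pickled bytes is a cheap change detector, but it can report spurious changes. Also, because each variable is pickled independently, an object shared by two variables is duplicated on restore. A practical system would need to detect and preserve such sharing.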