Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning creates task-specific models from "pre-trained" models through fine-tuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, these model derivatives are hard to manage: the overhead of storing all derived models quickly becomes onerous, prompting users to discard intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose MGit, a model versioning and management system that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, and abstractions over this lineage graph that facilitate relevant testing, updating, and collaboration functionality. MGit reduces the lineage graph's storage footprint by up to 7x and can automatically update downstream models in response to updates to upstream models.
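To make the idea of a lineage graph more concrete, the following is a minimal sketch in Python, not MGit's actual API or implementation. All names here (`ParameterStore`, `ModelNode`, `LineageGraph`, `add_model`, `descendants`) are hypothetical. The abstract does not specify which storage optimizations MGit uses, so this sketch substitutes a simple stand-in: exact-match, content-addressed deduplication of per-layer parameter blobs, under the assumption that a fine-tuned model often leaves some layers unchanged.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


class ParameterStore:
    """Content-addressed store: identical parameter blobs are kept only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        digest = hashlib.sha256(blob).hexdigest()
        self._blobs.setdefault(digest, blob)  # no-op if already stored
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


@dataclass
class ModelNode:
    name: str
    layers: dict                                  # layer name -> blob digest
    parents: list = field(default_factory=list)   # provenance edges
    previous_version: Optional[str] = None        # versioning edge


class LineageGraph:
    """Hypothetical lineage graph: nodes are models, edges record derivation."""

    def __init__(self):
        self.store = ParameterStore()
        self.nodes = {}

    def add_model(self, name, params, parents=(), previous_version=None):
        for parent in parents:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent model: {parent}")
        layers = {layer: self.store.put(blob) for layer, blob in params.items()}
        self.nodes[name] = ModelNode(name, layers, list(parents), previous_version)

    def load(self, name):
        node = self.nodes[name]
        return {layer: self.store.get(d) for layer, d in node.layers.items()}

    def descendants(self, name):
        """Models derived directly or transitively from `name`."""
        found, frontier = [], [name]
        while frontier:
            current = frontier.pop()
            for node in self.nodes.values():
                if current in node.parents and node.name not in found:
                    found.append(node.name)
                    frontier.append(node.name)
        return found


# Toy example: byte strings stand in for serialized layer weights.
graph = LineageGraph()
base = {"encoder": b"\x01" * 1000, "head": b"\x02" * 10}
graph.add_model("base", base)

# A fine-tuned model that only changes the head shares the encoder blob.
finetuned = {"encoder": base["encoder"], "head": b"\x03" * 10}
graph.add_model("task-a", finetuned, parents=["base"])

naive_bytes = sum(len(b) for model in (base, finetuned) for b in model.values())
dedup_bytes = graph.store.stored_bytes()
downstream = graph.descendants("base")
```

In this toy setup the shared encoder is stored once, so `dedup_bytes` is smaller than `naive_bytes`, and `descendants("base")` identifies the models that would need re-testing or updating if the upstream model changed. Exact-match deduplication only helps when layers are byte-identical; the savings reported in the abstract presumably rely on more capable techniques than this sketch.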