MGit: A Model Versioning and Management System

Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating and collaboration functionality. MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
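The lineage-graph idea can be made concrete with a small sketch. What follows is not MGit's actual API or storage format: the class `LineageGraph`, its methods, and the choice of content hashing for deduplication are all assumptions made for illustration, since the abstract does not specify which storage optimizations MGit uses. The sketch records one parent per model, stores each distinct parameter tensor once, and can list the downstream models that an upstream update would affect.

```python
import hashlib
import struct
from dataclasses import dataclass, field


def tensor_hash(values):
    """Content hash of a parameter tensor, modeled here as a flat list of floats."""
    packed = struct.pack(f"{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


@dataclass
class LineageGraph:
    # Hypothetical structure for illustration only; not MGit's real interface.
    blobs: dict = field(default_factory=dict)    # content hash -> tensor values
    models: dict = field(default_factory=dict)   # model name -> {layer name: hash}
    parents: dict = field(default_factory=dict)  # model name -> parent name or None

    def add_model(self, name, params, parent=None):
        """Register a model and its provenance edge to an optional parent."""
        if parent is not None and parent not in self.models:
            raise KeyError(f"unknown parent: {parent}")
        manifest = {}
        for layer, values in params.items():
            digest = tensor_hash(values)
            # Identical tensors shared across models are stored only once.
            self.blobs.setdefault(digest, list(values))
            manifest[layer] = digest
        self.models[name] = manifest
        self.parents[name] = parent

    def load(self, name):
        """Rebuild a model's full parameter dict from the shared blob store."""
        return {layer: list(self.blobs[d]) for layer, d in self.models[name].items()}

    def ancestors(self, name):
        """Upstream models, nearest first (useful when tracing an inherited bug)."""
        chain = []
        current = self.parents[name]
        while current is not None:
            chain.append(current)
            current = self.parents[current]
        return chain

    def descendants(self, name):
        """Downstream models that would need revisiting if `name` is updated."""
        found = []
        frontier = [name]
        while frontier:
            node = frontier.pop()
            for child, parent in self.parents.items():
                if parent == node:
                    found.append(child)
                    frontier.append(child)
        return found


graph = LineageGraph()
graph.add_model("base", {"encoder": [0.1, 0.2], "head": [0.0]})
graph.add_model("finetuned", {"encoder": [0.1, 0.2], "head": [0.5]}, parent="base")
graph.add_model("finetuned_v2", {"encoder": [0.1, 0.2], "head": [0.7]}, parent="finetuned")
```

In this toy example the three models hold six tensors in total, but the shared `encoder` is stored once, so the blob store contains four entries. Exact-match hashing only helps when tensors are bit-identical (for example, frozen layers); finetuned layers that differ slightly would need a different technique, such as storing deltas, which this sketch does not attempt.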

