The learnware paradigm proposed by Zhou [2016] aims to enable users to reuse numerous existing well-trained models instead of building machine learning models from scratch, with the hope of solving new user tasks even beyond the models' original purposes. In this paradigm, developers worldwide can spontaneously submit their high-performing models to the learnware dock system (formerly known as the learnware market) without revealing their training data. Once the dock system accepts a model, it assigns the model a specification and accommodates it. This specification allows the model to be adequately identified and assembled for reuse according to future users' needs, even if those users have no prior knowledge of the model. This paradigm differs greatly from the current big-model direction. A learnware dock system housing millions or more high-performing models is expected to offer excellent capabilities both for planned tasks where big models are applicable and for unplanned, specialized, data-sensitive scenarios where big models are absent or inapplicable. This paper describes Beimingwu, the first open-source learnware dock system, which provides foundational support for future research on the learnware paradigm. The system significantly streamlines model development for new user tasks, thanks to its integrated architecture and engine design, extensive engineering implementations and optimizations, and the integration of various algorithms for learnware identification and reuse. Notably, this is possible even for users with limited data and minimal expertise in machine learning, without compromising the security of their raw data. Beimingwu supports the entire process of the learnware paradigm. It lays the foundation for future research on learnware-related algorithms and systems, and prepares the ground for hosting a vast array of learnwares and establishing a learnware ecosystem.
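The submit, specify, and identify workflow described above can be illustrated with a toy sketch. This is not Beimingwu's API: `ToyDock`, `Learnware`, `make_specification`, and the use of a per-feature mean as the specification are all hypothetical stand-ins chosen for brevity. The summary is also computed on the developer's side here, an assumption made only so that raw rows never reach the dock.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class Learnware:
    """A submitted model together with its specification (hypothetical type)."""
    name: str
    model: Callable[[Vector], float]
    specification: List[float]  # toy stand-in: per-feature mean of the data


def make_specification(rows: Sequence[Vector]) -> List[float]:
    """Summarize a dataset as a per-feature mean; the raw rows are not kept."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


class ToyDock:
    """Toy dock: stores models with specifications and matches them to users."""

    def __init__(self) -> None:
        self._learnwares: List[Learnware] = []

    def submit(self, name: str, model: Callable[[Vector], float],
               specification: Vector) -> None:
        # Only the model and its summary are stored, never the training data.
        self._learnwares.append(Learnware(name, model, list(specification)))

    def search(self, user_requirement: Vector) -> Learnware:
        # Identify the learnware whose specification is closest (squared
        # Euclidean distance) to the summary of the user's own data.
        def squared_distance(lw: Learnware) -> float:
            return sum((a - b) ** 2
                       for a, b in zip(lw.specification, user_requirement))
        return min(self._learnwares, key=squared_distance)


dock = ToyDock()
dock.submit("small_values", lambda x: 0.0,
            make_specification([[0.0, 1.0], [2.0, 3.0]]))
dock.submit("large_values", lambda x: 1.0,
            make_specification([[10.0, 11.0], [12.0, 13.0]]))

# The user also shares only a summary of their data, not the data itself.
user_requirement = make_specification([[9.0, 10.0], [11.0, 12.0]])
best = dock.search(user_requirement)
```

In this sketch the user retrieves a suitable model without knowing beforehand that it exists, and neither side exchanges raw data. A real dock system would need a far richer specification and would also support assembling several models for reuse.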