Current machine learning techniques draw several complaints: they require huge amounts of training data and proficient training skills, continual learning is difficult, models risk catastrophic forgetting, and private or proprietary data may leak. Most research efforts address one of these issues in isolation, paying less attention to the fact that most of them are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while it has become a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which attempts to free users from building machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes. The key ingredient is the specification, which enables a trained model to be adequately identified for reuse according to the requirements of future users who know nothing about the model in advance.