Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how they differ from one another. Currently, practitioners rely on manually written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, the problems of finding, differentiating, and understanding models are becoming more pressing. Inspired by research on data lakes, we introduce and define the concept of model lakes. We discuss fundamental research challenges in the management of large models. We also discuss which principled data management techniques can be brought to bear on the study of large model management.