With the growing applications of Deep Learning (DL), especially the recent spectacular achievements of Large Language Models (LLMs) such as ChatGPT and LLaMA, the commercial significance of these remarkable models has soared. However, acquiring well-trained models is costly and resource-intensive: it requires a large, high-quality dataset, substantial investment in dedicated architecture design, expensive computational resources, and effort to develop technical expertise. Consequently, safeguarding the Intellectual Property (IP) of well-trained models is attracting increasing attention. In contrast to existing surveys, which overwhelmingly focus on model-level IP Protection (IPP), this survey covers the protection not only of model-level intelligence but also of valuable dataset intelligence. First, guided by the requirements for effective IPP design, this work systematically summarizes both general and scheme-specific performance evaluation metrics. Second, from the perspectives of proactive IP infringement prevention and reactive IP ownership verification, it comprehensively investigates and analyzes existing IPP methods for both dataset and model intelligence. Additionally, from the standpoint of training settings, it delves into the unique challenges that distributed settings pose to IPP compared with centralized settings. Furthermore, this work examines the various attacks faced by deep IPP techniques. Finally, we outline promising future directions that may serve as a guide for innovative research.