With the growing applications of Deep Learning (DL), especially the recent spectacular achievements of Large Language Models (LLMs) such as ChatGPT and LLaMA, the commercial significance of these remarkable models has soared. However, acquiring well-trained models is costly and resource-intensive: it requires a large, high-quality dataset, substantial investment in dedicated architecture design, expensive computational resources, and effort to develop technical expertise. Consequently, safeguarding the Intellectual Property (IP) of well-trained models is attracting increasing attention. In contrast to existing surveys, which overwhelmingly focus on model-level IP Protection (IPP), this survey covers the protection not only of model-level intelligence but also of valuable dataset intelligence. First, guided by the requirements for effective IPP design, this work systematically summarizes both general and scheme-specific performance evaluation metrics. Second, from the perspectives of proactive IP infringement prevention and reactive IP ownership verification, it comprehensively investigates and analyzes existing IPP methods for both dataset and model intelligence. Additionally, from the standpoint of training settings, it delves into the unique challenges that distributed settings pose to IPP compared with centralized settings. Furthermore, this work examines the various attacks faced by deep IPP techniques. Finally, we outline promising future directions that may serve as a guide for innovative research.