As usage of generative AI tools skyrockets, the amount of sensitive information being exposed to these models and centralized model providers is alarming. For example, confidential Samsung source code was leaked after employees submitted it as a text prompt to ChatGPT. An increasing number of companies (Apple, Verizon, JPMorgan Chase, etc.) are restricting the use of LLMs due to data leakage and confidentiality concerns. At the same time, an increasing number of centralized generative model providers are restricting, filtering, aligning, or censoring the prompts their systems accept. Midjourney and RunwayML, two of the major image generation platforms, restrict prompts to their systems via prompt filtering: prompts naming certain political figures are blocked, as are words associated with women's health care, rights, and abortion. In our research, we present a secure and private methodology for generative artificial intelligence that does not expose sensitive data or models to third-party AI providers. Our work modifies the key building block of modern generative AI algorithms, e.g. the transformer, and introduces confidential and verifiable multiparty computation in a decentralized network to 1) maintain the privacy of the user input and obfuscate the output of the model, and 2) preserve the privacy of the model itself. Additionally, the sharding process reduces the computational burden on any one node, enabling the resources of a large generative AI workload to be distributed across multiple, smaller nodes. We show that as long as at least one node in the decentralized computation is honest, security is maintained. We also show that inference still succeeds as long as a majority of the nodes in the computation complete successfully. Thus, our method offers both secure and verifiable computation in a decentralized network.
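The "one honest node suffices" privacy guarantee described above is characteristic of additive secret sharing, a standard multiparty-computation primitive. As a minimal illustrative sketch (not the paper's actual protocol), the following splits a sensitive value into random shares that sum to the original modulo a prime, so any proper subset of shares reveals nothing; the modulus and share count here are illustrative choices:

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus for the additive scheme


def share(x: int, n_nodes: int) -> list[int]:
    """Split x into n_nodes additive shares mod PRIME.

    Each of the first n_nodes-1 shares is uniformly random, so any
    n_nodes-1 shares together are statistically independent of x:
    one honest node withholding its share keeps x hidden.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((x - sum(shares)) % PRIME)  # last share fixes the sum
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the original value."""
    return sum(shares) % PRIME


x = 123456789  # stand-in for a sensitive input value (e.g. an embedding entry)
shares = share(x, n_nodes=3)
assert reconstruct(shares) == x
```

Linear operations (such as the matrix multiplications inside a transformer layer) can be computed share-wise in schemes of this family, which is what lets each node operate on its shard without ever seeing the plaintext input.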