As usage of generative AI tools skyrockets, the amount of sensitive information being exposed to these models and centralized model providers is alarming. For example, confidential Samsung source code was leaked after employees submitted it as a text prompt to ChatGPT. An increasing number of companies (Apple, Verizon, JPMorgan Chase, etc.) are restricting the use of LLMs due to data leakage and confidentiality concerns. At the same time, an increasing number of centralized generative model providers are restricting, filtering, aligning, or censoring the prompts their systems accept. Midjourney and RunwayML, two of the major image generation platforms, restrict prompts to their systems via prompt filtering: prompts naming certain political figures are blocked, as are words associated with women's health care, rights, and abortion. In our research, we present a secure and private methodology for generative artificial intelligence that does not expose sensitive data or models to third-party AI providers. Our work modifies the key building block of modern generative AI algorithms, e.g. the transformer, and introduces confidential and verifiable multiparty computation in a decentralized network to 1) maintain the privacy of the user input and obfuscate the output of the model, and 2) preserve the privacy of the model itself. Additionally, the sharding process reduces the computational burden on any one node, enabling the resources of a large generative AI workload to be distributed across multiple, smaller nodes. We show that as long as at least one node in the decentralized computation is honest, security is maintained. We also show that inference still succeeds as long as a majority of the nodes in the computation complete successfully. Thus, our method offers both secure and verifiable computation in a decentralized network.
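The "one honest node suffices" privacy guarantee described above is characteristic of additive secret sharing, a standard multiparty-computation primitive. As a minimal illustrative sketch (not the paper's actual protocol), the following splits a sensitive value into random shares that sum to the original modulo a prime, so any proper subset of shares reveals nothing; the modulus and share count here are illustrative choices:

```python
import secrets

PRIME = 2**61 - 1  # illustrative field modulus for the additive scheme


def share(x: int, n_nodes: int) -> list[int]:
    """Split x into n_nodes additive shares mod PRIME.

    Each of the first n_nodes-1 shares is uniformly random, so any
    n_nodes-1 shares together are statistically independent of x:
    one honest node withholding its share keeps x hidden.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((x - sum(shares)) % PRIME)  # last share fixes the sum
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the original value."""
    return sum(shares) % PRIME


x = 123456789  # stand-in for a sensitive input value (e.g. an embedding entry)
shares = share(x, n_nodes=3)
assert reconstruct(shares) == x
```

Linear operations (such as the matrix multiplications inside a transformer layer) can be computed share-wise in schemes of this family, which is what lets each node operate on its shard without ever seeing the plaintext input.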