The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet data security and scalability requirements, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decentralized AI inference. Our framework uses blockchain-based sequential deep neural network sharding to distribute computational tasks across a diverse network of nodes, guided by a personalized heuristic and routing mechanism. This enables efficient distributed training and inference of recent large-scale models, even on consumer-grade hardware. We employ compression techniques such as dynamic blockwise quantization and mixed matrix decomposition to reduce data-transfer and memory requirements. We also integrate robust security measures, including hardware-based trusted execution environments, to ensure data integrity and confidentiality. Evaluations across a range of natural language processing and vision tasks show that these compression strategies do not compromise model accuracy. Our results highlight the potential to democratize access to cutting-edge AI technologies by enabling secure, efficient inference on a decentralized network.
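To make the compression idea concrete, the following is a minimal sketch of the per-block absmax scheme underlying dynamic blockwise quantization: each block of values is scaled by its own maximum absolute value before rounding to int8, so an outlier in one block cannot degrade precision elsewhere. The function names and the block size are illustrative, not part of Nesa's implementation.

```python
import numpy as np

def blockwise_quantize(x, block_size=64):
    """Quantize a 1-D float array to int8 in fixed-size blocks,
    storing one absmax scale per block (illustrative sketch)."""
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0  # avoid division by zero in empty blocks
    q = np.round(blocks / scales * 127).astype(np.int8)
    return q, scales, len(x)

def blockwise_dequantize(q, scales, n):
    """Invert the quantization; the reconstruction error per value
    is bounded by roughly scale / 254 for that value's block."""
    return (q.astype(np.float32) / 127 * scales).reshape(-1)[:n]
```

Transmitting `q` (one byte per value) plus one scale per block in place of float32 activations is what cuts inter-node data transfer by close to 4x in schemes of this kind.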