Distributed machine learning enables parallel training on large datasets by distributing computing tasks across multiple workers. Although this reduces cost, disseminating the final model weights often leads to disputes over model ownership, because workers struggle to prove their involvement in the training computation. To address these ownership issues and to guard against accidental failures and malicious attacks, it is crucial to verify the computational integrity and effectiveness of workers in distributed machine learning. In this paper, we propose a novel binary linear tree commitment-based ownership protection model that ensures computational integrity with limited overhead and concise proofs. Because parameters are updated frequently during training, our commitment scheme introduces a maintainable tree structure to reduce the cost of updating proofs. Unlike SNARK-based verifiable computation, our model achieves efficient proof aggregation by leveraging inner product arguments. Furthermore, proofs of model weights are watermarked with worker identity keys to prevent commitments from being forged or duplicated. The performance analysis and a comparison with SNARK-based hash commitments validate the efficacy of our model in preserving computational integrity within distributed machine learning.
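To illustrate why a tree-structured commitment keeps proof updates cheap when parameters change often, the following is a minimal sketch only. It assumes a plain SHA-256 hash tree (a Merkle tree), which is a stand-in and not the binary linear tree commitment or the inner product arguments proposed in the paper. The names `MerkleTree`, `update`, `prove`, and `verify` are hypothetical and chosen for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest; assumed here as the hash, not taken from the paper."""
    return hashlib.sha256(data).digest()


class MerkleTree:
    """Binary hash tree in a flat array: node i has children 2i and 2i+1.

    Hypothetical illustration of a maintainable commitment tree.
    """

    def __init__(self, leaves):
        n = len(leaves)
        if n == 0 or n & (n - 1):
            raise ValueError("number of leaves must be a power of two")
        self.n = n
        self.nodes = [b""] * (2 * n)
        # Leaves occupy positions n .. 2n-1; the prefixes separate leaf and inner hashes.
        for i, leaf in enumerate(leaves):
            self.nodes[n + i] = _h(b"\x00" + leaf)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = _h(b"\x01" + self.nodes[2 * i] + self.nodes[2 * i + 1])

    @property
    def root(self) -> bytes:
        """The commitment to all leaves."""
        return self.nodes[1]

    def update(self, index: int, leaf: bytes) -> int:
        """Replace one leaf and rehash only its ancestors; return how many were rehashed."""
        i = self.n + index
        self.nodes[i] = _h(b"\x00" + leaf)
        rehashed = 0
        i //= 2
        while i >= 1:
            self.nodes[i] = _h(b"\x01" + self.nodes[2 * i] + self.nodes[2 * i + 1])
            rehashed += 1
            i //= 2
        return rehashed

    def prove(self, index: int):
        """Return the sibling hashes on the path from leaf `index` to the root."""
        i = self.n + index
        path = []
        while i > 1:
            path.append(self.nodes[i ^ 1])
            i //= 2
        return path


def verify(root: bytes, index: int, leaf: bytes, path) -> bool:
    """Recompute the root from a leaf and its sibling path, then compare."""
    node = _h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root
```

With 8 leaves (say, 8 chunks of serialized weights), `update` rehashes only the 3 ancestors of the changed leaf rather than rebuilding the whole tree, and a membership proof consists of 3 sibling hashes. This logarithmic update cost is the property a maintainable tree structure is meant to provide; the scheme described in the abstract additionally aggregates proofs and binds them to worker identity keys, which this sketch does not attempt.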