Privacy-Preserving Machine Learning as a Service (PP-MLaaS) enables secure neural network inference by integrating cryptographic primitives such as homomorphic encryption (HE) and multi-party computation (MPC), protecting both client data and server models. Recent mixed-primitive frameworks have significantly improved inference efficiency, yet they process batched inputs strictly in order, offering little flexibility for prioritizing urgent requests. Naïve queue jumping introduces considerable computational and communication overhead, adding non-negligible latency for inputs already in the queue. We initiate the study of privacy-preserving queue jumping in batched inference and propose PrivQJ, a novel framework that enables efficient priority handling without degrading overall system performance. PrivQJ exploits shared computation across inputs via in-processing slot recycling, allowing priority inputs to be piggybacked onto ongoing batch computation at almost no additional cryptographic cost. Both theoretical analysis and experimental results demonstrate over an order-of-magnitude reduction in overhead compared to state-of-the-art PP-MLaaS systems.
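The slot-recycling idea can be illustrated with a minimal, purely hypothetical sketch (plaintext scheduling only; PrivQJ's actual mechanism operates on encrypted batch slots, and all names below are illustrative, not from the paper): when a slot in an in-flight batch frees up, a priority request is placed into it immediately, piggybacking on the batch's ongoing computation instead of waiting for a fresh batch and its cryptographic setup.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Batch:
    """A fixed-size batch of inference slots; None marks a slot freed mid-batch."""
    slots: list


def recycle_slots(batch, priority_queue, normal_queue):
    """Fill slots freed during processing, preferring priority (queue-jumping) inputs.

    Hypothetical sketch: a freed slot lets a priority input join the
    ongoing batch computation rather than wait for the next batch.
    """
    for i, slot in enumerate(batch.slots):
        if slot is None:
            if priority_queue:
                batch.slots[i] = priority_queue.popleft()
            elif normal_queue:
                batch.slots[i] = normal_queue.popleft()
    return batch


# Usage: slots 1 and 3 freed mid-processing; the urgent request jumps the queue.
batch = Batch(slots=["req_a", None, "req_c", None])
prio = deque(["urgent_1"])
norm = deque(["req_d", "req_e"])
recycle_slots(batch, prio, norm)
print(batch.slots)  # ['req_a', 'urgent_1', 'req_c', 'req_d']
```

In the real system the recycled slot would carry ciphertext state, so reuse avoids re-running the HE/MPC setup that a naïvely preempted batch would incur.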