Model inference systems are essential for implementing end-to-end data analytics pipelines that deliver the benefits of machine learning models to users. Existing cloud-based model inference systems are costly, difficult to scale, and must be trusted to handle the models and user request data. Serverless computing presents a new opportunity, as it provides elasticity and fine-grained pricing. Our goal is to design a serverless model inference system that protects models and user request data from untrusted cloud providers. It offers high performance and low cost, while requiring no intrusive changes to current serverless platforms. To realize our goal, we leverage trusted hardware. We identify and address three challenges in using trusted hardware for serverless model inference. These challenges arise from the high-level abstraction of serverless computing, the performance overhead of trusted hardware, and the characteristics of model inference workloads. We present SeSeMI, a secure, efficient, and cost-effective serverless model inference system. It adds three novel features non-intrusively to the existing serverless infrastructure and nothing else. The first feature is a key service that establishes secure channels between the user and the serverless instances, and also provides access control over models and users' data. The second is an enclave runtime that allows one enclave to process multiple concurrent requests. The final feature is a model packer that allows multiple models to be executed by one serverless instance. We build SeSeMI on top of Apache OpenWhisk and conduct extensive experiments with three popular machine learning models. The results show that SeSeMI achieves low latency and low cost at scale for realistic workloads.