With the increasing popularity of recommendation systems (RecSys), the demand for compute resources in datacenters has surged. However, current RecSys model serving architectures allocate resources at the granularity of an entire model, which utilizes those resources poorly and leads to a sub-optimal total cost of ownership. We propose ElasticRec, a model serving architecture for RecSys that provides resource elasticity and high memory efficiency. ElasticRec builds on a microservice software architecture to allocate resources at a fine granularity, tailored to the heterogeneous resource demands of RecSys. Additionally, ElasticRec achieves high memory efficiency through our utility-based resource allocation. Overall, ElasticRec achieves an average 3.3x reduction in memory allocation size and an 8.1x increase in memory utility, resulting in an average 1.6x reduction in deployment cost compared to a state-of-the-art RecSys inference serving system.