Machine learning models are widely integrated into modern mobile apps to analyze user behavior and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining a high-quality user experience. While prior research has primarily focused on accelerating model inference for given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction of feature extraction optimization by analyzing and eliminating redundant extraction operations across different model features and across consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate on-device feature extraction without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction, which formulates the extraction workflows of different input features as a single directed acyclic graph; (2) graph optimization, which identifies and fuses redundant operation nodes across different features within the graph; and (3) efficient caching, which minimizes operations on the raw data that overlaps between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning the search, video, and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x to 3.93x during the daytime and by 1.43x to 4.53x at night.
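The graph abstraction and graph optimization designs can be illustrated with a minimal sketch. It is not AutoFeature's implementation: the `Node` type, the `OPS` table, the `run_features` helper, and the log row layout are all hypothetical names chosen for illustration. The sketch assumes that each feature is a chain of operations over a raw log, and that two operation nodes are redundant when they apply the same operation with the same argument to the same input. The caching design is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical raw log row: (timestamp, event_type, value).
Row = Tuple[int, str, float]


@dataclass(frozen=True)
class Node:
    """One extraction operation; `parent` is its input (None means the raw log).

    Frozen dataclasses compare and hash by value, so two nodes built
    independently with the same op, arg, and parent count as the same node.
    """
    op: str
    arg: str = ""
    parent: Optional["Node"] = None


# Hypothetical operation table: each operation takes (rows, arg).
OPS: Dict[str, Callable] = {
    "filter": lambda rows, arg: [r for r in rows if r[1] == arg],
    "count": lambda rows, arg: len(rows),
    "sum": lambda rows, arg: sum(r[2] for r in rows),
}


def run_features(features: Dict[str, Node], log: List[Row], fuse: bool = True):
    """Evaluate every feature and return (values, number of executed operations).

    With fuse=True, structurally equal nodes are executed once and their
    result is shared, which mimics fusing redundant nodes in the graph.
    """
    memo: Dict[Node, object] = {}
    executed = 0

    def evaluate(node: Optional[Node]):
        nonlocal executed
        if node is None:
            return log
        if fuse and node in memo:
            return memo[node]
        result = OPS[node.op](evaluate(node.parent), node.arg)
        executed += 1
        memo[node] = result
        return result

    values = {name: evaluate(root) for name, root in features.items()}
    return values, executed


log = [(1, "click", 2.0), (2, "view", 1.0), (3, "click", 3.0)]
features = {
    # Both features start from the same "filter click events" operation.
    "click_count": Node("count", parent=Node("filter", "click")),
    "click_value_sum": Node("sum", parent=Node("filter", "click")),
}

fused_values, fused_ops = run_features(features, log)
naive_values, naive_ops = run_features(features, log, fuse=False)
print(fused_values, fused_ops, naive_ops)
```

In this toy case both runs produce the same feature values, but the fused run executes the shared filter once (3 operations in total) while the unfused run executes it once per feature (4 operations).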