Large-scale ranking systems depend on thousands of features derived from user behavior across multiple time horizons. Feature efficiency rollouts typically require model retraining, resulting in long iteration cycles (3--6 months), substantial GPU resource consumption, and limited rollout throughput. We introduce Intelligent Elastic Feature Fading (IEFF), a production infrastructure system that enables retrain-free feature efficiency rollouts by elastically controlling feature coverage and distribution at serving time. IEFF supports incremental feature coverage adjustments while models adapt through recurring training, eliminating the dependency on explicit retraining cycles. The system incorporates strict safety guardrails, reversibility mechanisms, and comprehensive monitoring to ensure stability at scale. Across multiple production use cases, IEFF accelerates efficiency-related rollouts by 5$\times$, eliminates retraining-related GPU overhead, and enables faster capacity recycling. Extensive offline and online experiments demonstrate that gradual feature fading prevents 50--55\% of the online performance degradation observed with abrupt feature removal, while maintaining stable model behavior. These results establish elastic, system-level feature fading as a practical and scalable approach to managing feature efficiency in modern industrial ranking systems.
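As a rough illustration of what controlling feature coverage at serving time could look like, the following is a minimal sketch, not the IEFF implementation. It assumes a linear fade schedule and a deterministic hash-based gate. The function names (`coverage_at`, `bucket`, `serve_feature`), the hashing scheme, and the default-value fallback are all hypothetical choices made for this example and do not come from the abstract.

```python
import hashlib


def coverage_at(step: int, total_steps: int,
                start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical linear fade: coverage moves from `start` to `end`."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + (end - start) * frac


def bucket(feature: str, request_id: str) -> float:
    """Map (feature, request) to a stable value in [0, 1)."""
    digest = hashlib.sha256(f"{feature}:{request_id}".encode()).digest()
    # Keep the top 53 bits so the quotient is an exact float below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def serve_feature(feature: str, request_id: str, value: float,
                  coverage: float, default: float = 0.0) -> float:
    """Return the real value for the covered share of traffic, else a default."""
    return value if bucket(feature, request_id) < coverage else default


if __name__ == "__main__":
    # Fade a hypothetical feature over 4 steps and report how much traffic keeps it.
    requests = [f"req-{i}" for i in range(10_000)]
    for step in range(5):
        c = coverage_at(step, 4)
        kept = sum(bucket("clicks_30d", r) < c for r in requests)
        print(f"step={step} coverage={c:.2f} kept={kept / len(requests):.3f}")
```

Because each request's bucket is fixed, the set of requests that keeps the feature at a lower coverage is always a subset of the set at a higher coverage. Under these assumptions, lowering coverage is incremental and raising it again restores exactly the same traffic, which is one simple way to obtain the reversibility the abstract describes.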