Fortress: A Case Study in Stabilizing Search Recommendations via Temporal Data Augmentation and Feature Pruning

In search and recommendation systems, predictive models often suffer from temporal instability when certain input features introduce volatility in output scores. This instability can degrade model reliability and user experience, especially in multi-stage systems where consistent predictions are critical for downstream decision-making. We introduce Fortress, a general framework for enhancing model stability and accuracy by identifying and pruning features that contribute to inconsistent prediction scores over time. Fortress leverages historical snapshots (temporally partitioned datasets that capture score fluctuations for the same entity across periods) and follows a four-step process: (1) collect historical snapshots, (2) identify samples with unstable predictions, (3) isolate and remove instability-inducing features, and (4) retrain models using only stable features. While semantic features from LLMs and BERT-based models improve generalization, they often lack full query or entity coverage. Engagement-based features offer strong predictive power but tend to introduce temporal instability. Fortress mitigates this trade-off by suppressing the volatility of engagement signals while retaining their predictive value, leading to more stable and accurate models. We validate Fortress on a query-to-app relevance model in a large-scale app marketplace. Offline experiments demonstrate notable improvements in prediction stability (measured by Coefficient of Variation) and classification performance (measured by PR-AUC).
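
As a rough illustration of steps (1) and (2), the sketch below groups prediction scores for the same (query, app) pair across snapshots and flags pairs whose Coefficient of Variation (standard deviation divided by mean) exceeds a cutoff. The abstract does not specify how Fortress defines an unstable sample, so the snapshot layout, the function names, the use of the population standard deviation, and the 0.2 threshold are all assumptions made for this example, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """Return std / |mean| for a list of scores (0.0 if the mean is zero)."""
    mu = mean(scores)
    if mu == 0:
        return 0.0
    # Population standard deviation is an assumption; the abstract
    # does not say which estimator is used.
    return pstdev(scores) / abs(mu)


def scores_by_entity(snapshots):
    """Group scores by entity key across a sequence of snapshots.

    Each snapshot is assumed to be a dict mapping a (query, app)
    tuple to the model score observed in that period.
    """
    grouped = defaultdict(list)
    for snapshot in snapshots:
        for key, score in snapshot.items():
            grouped[key].append(score)
    return grouped


def unstable_entities(snapshots, cv_threshold=0.2):
    """Return {entity: CV} for entities whose CV exceeds the threshold.

    The threshold value is hypothetical and would need tuning.
    Entities seen in fewer than two snapshots are skipped.
    """
    result = {}
    for key, scores in scores_by_entity(snapshots).items():
        if len(scores) < 2:
            continue
        cv = coefficient_of_variation(scores)
        if cv > cv_threshold:
            result[key] = cv
    return result


# Three hypothetical snapshots for two query-app pairs.
example_snapshots = [
    {("weather", "app_a"): 0.80, ("weather", "app_b"): 0.30},
    {("weather", "app_a"): 0.82, ("weather", "app_b"): 0.60},
    {("weather", "app_a"): 0.78, ("weather", "app_b"): 0.10},
]
flagged = unstable_entities(example_snapshots)
```

In this toy data the scores for app_a barely move while those for app_b swing widely, so only the app_b pair is flagged. Steps (3) and (4), attributing the instability to specific features and retraining without them, depend on details the abstract does not give and are left out of the sketch.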

