The core challenge in many real-world applications is to match a query to the best document from a mutable, finite set of candidates. Existing industry solutions, especially latency-constrained services, often rely on similarity algorithms that sacrifice quality for speed. In this paper we introduce a generic semantic learning-to-rank framework, Self-training Semantic Cross-attention Ranking (sRank). This transformer-based framework uses a linear pairwise loss with mutable training batch sizes, achieving quality gains with high efficiency. It has been applied effectively to two industry tasks at Microsoft on real-world, large-scale datasets: Smart Reply (SR) and Ambient Clinical Intelligence (ACI). In Smart Reply, sRank assists live customers with technical support by selecting the best reply from predefined solutions based on customer and support-agent messages. It achieves an 11.7% gain in offline top-one accuracy on the SR task over the previous system, and telemetry recorded since its general release in January 2021 shows a 38.7% reduction in message-composition time. In the ACI task, sRank selects relevant historical physician templates that guide a text summarization model to generate higher-quality medical notes. It achieves a 35.5% gain in top-one accuracy, along with a 46% relative ROUGE-L gain in the generated notes.
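The abstract does not spell out the form of the linear pairwise loss. As a rough illustration only, and assuming a standard hinge-style margin formulation over per-query candidate sets of varying size (the function name and margin default are illustrative, not from the paper), a pairwise loss with mutable batch sizes might look like:

```python
def pairwise_linear_loss(pos_score, neg_scores, margin=1.0):
    """Hinge-style pairwise ranking loss (illustrative sketch).

    Penalizes each negative candidate whose score falls within
    `margin` of the positive candidate's score. The loss is linear
    in the score gap once inside the margin. `neg_scores` may have
    any length, reflecting a mutable per-query batch size.
    """
    return sum(max(0.0, margin - (pos_score - s)) for s in neg_scores)


# Example: the positive candidate scores 2.0; one negative (0.5) is
# already separated by more than the margin, the other (1.5) is not.
loss = pairwise_linear_loss(2.0, [0.5, 1.5])
```

In this sketch, only negatives scored too close to the positive contribute to the loss, so well-separated pairs incur no gradient, which is the usual motivation for margin-based pairwise objectives in ranking.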