Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. Current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which separately encodes each context-candidate pair and ranks the candidates according to their fitness scores. However, Cross-Encoder repeatedly encodes the same lengthy context for each candidate, resulting in high computational costs. Poly-Encoder addresses this problem by reducing the interaction between the context and the candidates, but at the price of a performance drop. In this work, we develop a new paradigm called Uni-Encoder, which keeps the full attention over each pair as in Cross-Encoder while encoding the context only once, as in Poly-Encoder. Uni-Encoder encodes all the candidates together with the context in a single forward pass. We use the same positional embeddings for all candidates to ensure they are treated equally, and we design a new attention mechanism to avoid confusion among the candidates that share those positions. Our Uni-Encoder can simulate other ranking paradigms by using different attention and response concatenation methods. Extensive experiments show that our proposed paradigm achieves new state-of-the-art results on four benchmark datasets with high computational efficiency. For instance, on the Ubuntu V2 dataset it improves R10@1 by 2.9% with approximately 4× faster inference.
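The single-pass input layout can be illustrated with a short Python sketch. This is not the authors' implementation. It is a minimal sketch built on several assumptions: the helper name `build_uni_encoder_inputs` is hypothetical, and so is the specific mask. In this mask, context tokens attend only to the context, and each candidate token attends to the context and to its own candidate but never to another candidate. The sketch shows how restarting every candidate at position C (the context length) gives all candidates identical positional embeddings. It also shows how a mask can keep those position-sharing candidates from mixing.

```python
from typing import List, Tuple


def build_uni_encoder_inputs(
    context_len: int, candidate_lens: List[int]
) -> Tuple[List[int], List[int], List[List[bool]]]:
    """Hypothetical helper: lay out one context followed by all candidates.

    Returns (position_ids, segment_ids, mask), where segment_ids is -1 for
    context tokens and k for tokens of candidate k, and mask[i][j] is True
    when token i is allowed to attend to token j.
    """
    if context_len <= 0 or not candidate_lens or min(candidate_lens) <= 0:
        raise ValueError("need a non-empty context and non-empty candidates")

    # Context positions are 0..C-1; every candidate restarts at position C,
    # so all candidates reuse the same positional embeddings.
    position_ids = list(range(context_len))
    segment_ids = [-1] * context_len
    for k, length in enumerate(candidate_lens):
        position_ids.extend(range(context_len, context_len + length))
        segment_ids.extend([k] * length)

    total = len(position_ids)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if segment_ids[j] == -1:
                # Assumption: every token may attend to the context tokens.
                mask[i][j] = True
            elif segment_ids[i] == segment_ids[j]:
                # A candidate token attends only within its own candidate;
                # context tokens never attend to any candidate.
                mask[i][j] = True
    return position_ids, segment_ids, mask


if __name__ == "__main__":
    # One 3-token context followed by two 2-token candidates.
    pos, seg, mask = build_uni_encoder_inputs(3, [2, 2])
    print(pos)  # [0, 1, 2, 3, 4, 3, 4]
    print(seg)  # [-1, -1, -1, 0, 0, 1, 1]
    print(mask[3][5], mask[5][0])  # False True
```

In this layout, a context of length C and n candidates of length l occupy one sequence of C + n·l tokens. A Cross-Encoder would instead process n sequences of C + l tokens each, or n·(C + l) tokens in total.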