Sample-and-rank is a key decoding strategy for modern generation-based dialogue systems. It helps achieve diverse and high-quality responses by selecting an answer from a small pool of generated candidates. The current state-of-the-art ranking methods mainly use an encoding paradigm called Cross-Encoder, which encodes each context-candidate pair separately and ranks the candidates according to their fitness scores. However, Cross-Encoder re-encodes the same lengthy context for every candidate, resulting in high computational cost. Poly-Encoder addresses this problem by reducing the interaction between context and candidates, but at the cost of a performance drop. In this work, we develop a new paradigm called Uni-Encoder that keeps full attention over each pair, as in Cross-Encoder, while encoding the context only once, as in Poly-Encoder. Uni-Encoder encodes all the candidates together with the context in one forward pass. We use the same positional embedding for all candidates to ensure they are treated equally, and we design a new attention mechanism to avoid confusion among them. Uni-Encoder can also simulate other ranking paradigms by using different attention and response concatenation methods. Extensive experiments show that the proposed paradigm achieves new state-of-the-art results on four benchmark datasets with high computational efficiency. For instance, on the Ubuntu V2 dataset it improves R10@1 by 2.9% with approximately 4x faster inference.
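The single-pass encoding described in the abstract can be illustrated with a small sketch of the two ingredients it names: position indices shared by all candidates, and an attention mask that keeps candidates from attending to one another. The abstract does not specify the exact mask layout, so everything here is an assumption made for illustration: the function name `build_uni_encoder_inputs`, the `context_sees_candidates` flag, and the choice to restart every candidate's positions right after the context.

```python
import numpy as np


def build_uni_encoder_inputs(context_len, candidate_lens, context_sees_candidates=True):
    """Hypothetical helper: build an attention mask and position ids for one
    sequence laid out as [context | candidate_1 | candidate_2 | ...].

    mask[i, j] is True when token i may attend to token j.
    """
    total = context_len + sum(candidate_lens)
    mask = np.zeros((total, total), dtype=bool)
    position_ids = np.empty(total, dtype=np.int64)

    # Context tokens attend to each other and use positions 0..context_len-1.
    mask[:context_len, :context_len] = True
    position_ids[:context_len] = np.arange(context_len)

    start = context_len
    for length in candidate_lens:
        end = start + length
        mask[start:end, :context_len] = True  # candidate -> context
        mask[start:end, start:end] = True     # candidate -> its own tokens
        if context_sees_candidates:
            # Assumption: the context may also attend to every candidate.
            mask[:context_len, start:end] = True
        # Every candidate restarts at the same position, so none is favored
        # by where it happens to sit in the concatenated sequence.
        position_ids[start:end] = context_len + np.arange(length)
        start = end

    return mask, position_ids


# Example: a 3-token context followed by two 2-token candidates.
mask, position_ids = build_uni_encoder_inputs(context_len=3, candidate_lens=[2, 2])
```

In this layout the context appears once in the sequence, while each candidate still sees the whole context and itself. Tokens of different candidates never attend to each other directly, which is one plausible reading of the "avoid confusion" requirement.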