Autoregressive decoding is a commonly used method for text generation with pre-trained language models, and early exiting is an effective way to speed up inference. In this work, we propose Hierarchical Skip Decoding (HSD), a novel decoding strategy for efficient autoregressive text generation. Unlike existing methods that require additional trainable components, HSD is a plug-and-play method applicable to autoregressive text generation models. It adaptively skips decoding layers in a hierarchical manner based on the current sequence length, thereby reducing the computational workload and allocating computational resources adaptively. Comprehensive experiments on five text generation datasets with pre-trained language models demonstrate HSD's advantages in balancing efficiency and text quality. With almost half of the layers skipped, HSD retains 90% of the text quality of vanilla autoregressive decoding, outperforming competitive approaches.
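To make the idea of length-dependent, hierarchical layer skipping concrete, the following is a minimal sketch and not the schedule used in this work. The abstract does not specify the schedule, so everything here is an assumption made for illustration: the function names `skip_count`, `layers_to_run`, and `forward_with_skips`, the parameters `num_levels` and `max_skip_ratio`, the choice to skip more layers as the sequence grows, and the even spacing of the layers that are kept.

```python
from typing import Callable, List, Sequence


def skip_count(seq_len: int, max_len: int, num_layers: int,
               num_levels: int = 4, max_skip_ratio: float = 0.5) -> int:
    """Hypothetical schedule: how many layers to skip at this sequence length."""
    if max_len <= 0 or num_levels < 1 or num_layers < 1:
        raise ValueError("max_len, num_levels and num_layers must be positive")
    if not 0.0 <= max_skip_ratio < 1.0:
        raise ValueError("max_skip_ratio must be in [0, 1)")
    if num_levels == 1:
        return 0
    # Map the current length to a hierarchy level in [0, num_levels - 1].
    level = min(num_levels - 1, (max(seq_len, 0) * num_levels) // max_len)
    # Assumption: the skip budget grows linearly with the level.
    max_skip = int(num_layers * max_skip_ratio)
    return (level * max_skip) // (num_levels - 1)


def layers_to_run(seq_len: int, max_len: int, num_layers: int,
                  num_levels: int = 4, max_skip_ratio: float = 0.5) -> List[int]:
    """Indices of the layers that are executed; the rest are skipped."""
    n_keep = num_layers - skip_count(seq_len, max_len, num_layers,
                                     num_levels, max_skip_ratio)
    if n_keep == 1:
        return [0]
    # Assumption: spread kept layers evenly, always keeping first and last.
    return [(i * (num_layers - 1)) // (n_keep - 1) for i in range(n_keep)]


def forward_with_skips(hidden: float,
                       layers: Sequence[Callable[[float], float]],
                       seq_len: int, max_len: int) -> float:
    """Apply only the selected layers to the hidden state."""
    for idx in layers_to_run(seq_len, max_len, len(layers)):
        hidden = layers[idx](hidden)
    return hidden


if __name__ == "__main__":
    # Toy "layers" that each add 1, so the output counts executed layers.
    toy_layers = [lambda h: h + 1.0 for _ in range(12)]
    for t in (0, 25, 50, 75):
        print(t, layers_to_run(t, 100, 12), forward_with_skips(0.0, toy_layers, t, 100))
```

Because the selection depends only on the current sequence length and the layer count, a schedule of this kind needs no trained parameters, which is consistent with the plug-and-play property described above. In a real model the toy callables would be replaced by the decoder's transformer blocks.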