Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

Sequence-to-sequence (seq2seq) tasks transcribe an input sequence into a target sequence. The Connectionist Temporal Classification (CTC) criterion is widely used across seq2seq tasks. Besides predicting the target sequence, CTC also predicts an alignment as a by-product: the most probable input-length sequence, which specifies a hard alignment between the input and target units. Because the CTC formulation treats all candidate alignment sequences (called paths) equally, which path ends up most probable and becomes the predicted alignment is left uncertain. In addition, the alignment predicted by vanilla CTC is usually observed to drift from its reference and rarely provides practical functionality. This work therefore aims to make CTC alignment prediction controllable and so equip CTC with extra functionality. We propose the Bayes risk CTC (BRCTC) criterion, in which a customizable Bayes risk function enforces the desired characteristics of the predicted alignment. With the risk function, BRCTC is a general framework for imposing a customizable preference over the paths so as to concentrate the posterior on a particular subset of them. In applications, we explore one preference that yields models with down-sampling ability and reduced inference cost. With another preference, for early emissions, BRCTC obtains an improved performance-latency trade-off for online models. Experimentally, the proposed BRCTC reduces the inference cost of offline models by up to 47% without performance degradation and cuts the overall latency of online systems to a previously unseen level.
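
The idea of weighting paths with a risk function can be illustrated on a toy example. The sketch below is not the authors' implementation. It assumes the objective takes the form of a sum over all paths that collapse to the target, with each path probability multiplied by a risk value. The abstract does not state this formula, so treat it as an assumption. The names `objective`, `early_risk`, and the decay parameter `lam` are hypothetical and chosen for illustration. Paths are enumerated by brute force, which is only feasible for tiny inputs; practical CTC implementations use forward-backward dynamic programming instead.

```python
import itertools
import math

BLANK = 0  # index of the CTC blank symbol (an assumption of this sketch)


def collapse(path):
    """Standard CTC collapse: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)


def path_prob(path, probs):
    """Probability of one path under frame-wise independent posteriors."""
    p = 1.0
    for t, s in enumerate(path):
        p *= probs[t][s]
    return p


def last_emission_frame(path):
    """0-based index of the last non-blank frame, or -1 if all blank."""
    idx = -1
    for t, s in enumerate(path):
        if s != BLANK:
            idx = t
    return idx


def early_risk(path, lam=1.0):
    """Hypothetical risk value: larger for paths that finish emitting early."""
    return math.exp(-lam * last_emission_frame(path))


def objective(target, probs, risk=None):
    """Sum of (risk-weighted) path probabilities over paths matching target.

    With risk=None every path has weight 1, which is the vanilla CTC
    likelihood of the target.
    """
    n_frames, n_symbols = len(probs), len(probs[0])
    total = 0.0
    for path in itertools.product(range(n_symbols), repeat=n_frames):
        if collapse(path) == tuple(target):
            weight = 1.0 if risk is None else risk(path)
            total += weight * path_prob(path, probs)
    return total


# Toy input: 3 frames, 2 symbols (blank and one label), uniform posteriors.
uniform = [[0.5, 0.5] for _ in range(3)]
vanilla = objective((1,), uniform)
weighted = objective((1,), uniform, risk=early_risk)
```

With uniform posteriors, vanilla CTC scores all six valid paths equally, so nothing determines which one becomes the predicted alignment. Under the hypothetical `early_risk`, the path that emits in the first frame and then stays blank receives the largest weight, which is the kind of preference the abstract describes for early emissions.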
