We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that prefix under an MT-side AlignAtt policy. To our knowledge, this is the first application of AlignAtt to a decoder-only LLM, which lacks the encoder-decoder cross-attention that earlier AlignAtt systems rely on. We recover a usable policy by combining four components: (1) an explicit source span in the prompt, (2) offline selection of translation-specific alignment heads, (3) selective qk-fast replay of the draft-to-source attention block, and (4) runtime query/key capture that leaves model outputs bit-identical. On the IWSLT 2026 development set, AlignAtt4LLM outperforms the supplied baselines for the European target languages, German and Italian, in both the low-latency regime (around 2 seconds) and the high-latency regime (below 4 seconds), with latency measured in CU-LongYAAL. Results for English to Chinese are more mixed. The method is not tied to Gemma-4, however: because AlignAtt4LLM requires only a deterministic prompt layout, calibrated attention heads, and query/key capture, the same policy can be reapplied to stronger, translation-focused decoder-only backbones for non-European target languages.
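As a rough illustration of the kind of decision an AlignAtt-style policy makes, the sketch below commits draft tokens until one of them attends most strongly to the most recent part of the source. It is a minimal sketch under stated assumptions, not the system's implementation: the function name `alignatt_commit_length`, the `guard` parameter, and the assumption that attention has already been averaged over the selected alignment heads and restricted to the explicit source span are all hypothetical.

```python
from typing import Sequence


def alignatt_commit_length(attn: Sequence[Sequence[float]], guard: int) -> int:
    """Return how many leading draft tokens may be committed.

    attn[t][s] is assumed to be the attention weight from draft token t to
    source token s, already averaged over the chosen alignment heads and
    restricted to the source span of the prompt (an assumption of this sketch).
    guard is the number of trailing source positions treated as unstable.
    """
    if not attn or not attn[0]:
        return 0

    src_len = len(attn[0])
    # Source positions at or beyond this index count as "too recent".
    frontier = src_len - guard

    for t, row in enumerate(attn):
        # Index of the source token this draft token attends to most.
        most_attended = max(range(src_len), key=row.__getitem__)
        if most_attended >= frontier:
            # This token leans on unstable source context: stop before it.
            return t

    # No draft token depends on the unstable tail, so commit all of them.
    return len(attn)


# Hypothetical attention rows: 3 draft tokens over 4 source tokens.
example_attn = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.2, 0.7],
]
committed = alignatt_commit_length(example_attn, guard=1)
```

With `guard=1`, the third draft token attends most to the final source position, so only the first two tokens are committed and the rest wait for more source context. A larger `guard` trades latency for stability.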