Pathological speech from patients with neurodegenerative and neuromotor disorders is often acoustically distorted and linguistically fragmented, which motivates pathological speech reconstruction: recovering the intended textual content from distorted and incomplete recordings. Crucially, such recordings are rarely uniformly degraded: some words or short phrases remain reliable and can serve as audible anchors for reconstructing the corrupted surrounding content. We introduce Anchor-gated Phonetic Group Relative Policy Optimization (AP-GRPO), a GRPO framework with phonetic rewards that aligns speech language models (SLMs) with the original speech signal through audible-anchor preservation and inter-anchor phonetic compatibility. AP-GRPO consists of (i) an anchor-gated reward that matches reliable audible anchors in clear regions, and (ii) an inter-anchor phonetic alignment reward that evaluates whether recovered content is phonetically supported by the corresponding corrupted inter-anchor speech span. Across four disease conditions, AP-GRPO improves the faithfulness of speech reconstruction. The learned anchor constraint adapts automatically to each condition and thereby reveals interpretable disease-specific profiles: conditions with severe articulatory degradation require stronger anchor enforcement, whereas conditions with milder impairment or with linguistic impairment rely more on phonetic alignment for inter-anchor recovery.
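As a rough illustration of how the two reward components described above could feed a GRPO-style update, the following minimal sketch scores a group of candidate transcripts and converts their rewards into group-relative advantages. It is not the authors' implementation. Every function name, the in-order anchor matching, the edit-distance phonetic score, the hard gate threshold `gate`, and the fixed mixing weight `lam` are assumptions made for illustration; the abstract states that the anchor constraint is learned per condition and does not give the exact reward formulas.

```python
from statistics import mean, pstdev


def anchor_reward(hyp_words, anchors):
    """Fraction of audible anchors found in the hypothesis, in order (assumed form)."""
    if not anchors:
        return 1.0
    pos, hits = 0, 0
    for anchor in anchors:
        try:
            pos = hyp_words.index(anchor, pos) + 1
            hits += 1
        except ValueError:
            pass  # anchor missing; keep scanning from the same position
    return hits / len(anchors)


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def phonetic_reward(hyp_phones, span_phones):
    """Normalized phoneme similarity for one inter-anchor span (assumed form)."""
    longest = max(len(hyp_phones), len(span_phones))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(hyp_phones, span_phones) / longest


def combined_reward(r_anchor, r_phone, lam=0.5, gate=1.0):
    """Assumed gating: the phonetic term counts only if anchors are preserved."""
    gated = r_phone if r_anchor >= gate else 0.0
    return lam * r_anchor + (1.0 - lam) * gated


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

In this sketch, a candidate that drops an anchor loses the phonetic term entirely, so within a sampled group it receives a lower advantage than a candidate that keeps all anchors and fills the gaps with phonetically plausible content. A soft gate or a learned `lam` per condition would be natural variations.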