Latency-Configurable Streaming Speech Enhancement via Asymmetric Temporal Padding - 专知论文

会员服务 ·

0

流 · 语音增强 · MS · Buffer（公司） · binary ·

Latency-Configurable Streaming Speech Enhancement via Asymmetric Temporal Padding

翻译：暂无翻译

Yunsik Kim,Yoonyoung Chung

from arxiv, 5 pages, 3 figures. Accepted for presentation at Interspeech 2026

Streaming speech enhancement requires balancing algorithmic latency against quality, yet existing approaches largely treat this as a binary causal versus non-causal choice. LaCo-SENet addresses this issue with two mechanisms parameterized by a single training-time hyperparameter. First, asymmetric temporal padding redistributes past and future context in convolutions, enabling systematic latency configuration. Second, dual-buffer streaming combines state buffers for past context with lookahead buffers that supply future context at both the input and feature levels. Selective state updates also prevent future-frame leakage into the streaming state, ensuring training-inference consistency. On VoiceBank+DEMAND, a fixed-budget (1.37M parameters) backbone yields a family of models spanning 12.5-75.0 ms, with PESQ rising from 3.35 to 3.43. At just 12.5 ms (fully causal), a PESQ of 3.35 matches or exceeds the prior causal state-of-the-art (3.27 at 46.5 ms).

翻译：暂无翻译

0

相关内容

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR2022】高分辨率和多样化的视频-文本预训练模型

【CVPR2022】高分辨率和多样化的视频-文本预训练模型

专知会员服务

10+阅读 · 2022年3月6日

多语言语音识别声学模型建模方法最新进展

多语言语音识别声学模型建模方法最新进展

专知会员服务

36+阅读 · 2022年2月7日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

语音处理中的深度表示学习综述论文:挑战、最新进展和未来趋势，25页pdf

语音处理中的深度表示学习综述论文:挑战、最新进展和未来趋势，25页pdf

专知会员服务

32+阅读 · 2020年1月2日

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

白话attention综述（上）

白话attention综述（上）

AINLP

12+阅读 · 2019年12月14日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Word2Vec —— 深度学习的一小步，自然语言处理的一大步

Word2Vec —— 深度学习的一小步，自然语言处理的一大步

AI研习社

21+阅读 · 2018年6月14日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

大尺度柔性平流层飞艇流固耦合动力学建模与定点保持控制

国家自然科学基金

1+阅读 · 2017年12月31日

有理小波理论在多途信号解析与水声网络设计中的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

磁流体选择填充微结构光纤的设计、制备及性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于柔性直流的多微网互联网络结构与运行集成的综合协调优化

国家自然科学基金

0+阅读 · 2015年12月31日

自适应快速模拟细节丰富的流体技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

可压缩多介质流体的真正多维高保真算法

国家自然科学基金

0+阅读 · 2015年12月31日

动态自适应的可伸缩视频流媒体组播编码-传输联合优化

国家自然科学基金

0+阅读 · 2015年12月31日

声-电-力耦合协同表面强化方法与机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

透明介质主冲击绝热线的高压声速连续测量

国家自然科学基金

0+阅读 · 2014年12月31日

集成化声表面结构固体板的耦合与相互作用特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation

Arxiv

0+阅读 · 6月18日

ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference

Arxiv

0+阅读 · 6月18日

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Arxiv

0+阅读 · 6月18日

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Arxiv

0+阅读 · 6月17日

ALAS: An Automatic Latent Alignment Score for Audio Language Models

Arxiv

0+阅读 · 6月16日

Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models

Arxiv

0+阅读 · 6月12日

Feature-Aligned Speech Watermarking for Robustness to Reconstruction Distortions

Arxiv

0+阅读 · 6月10日

Continuous Audio Thinking for Large Audio Language Models

Arxiv

0+阅读 · 6月5日

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Arxiv

0+阅读 · 5月28日

Latency in Real-Time 3D Volumetric Streaming: A Comprehensive Study

Arxiv

0+阅读 · 5月21日

VIP会员

文章信息

相关主题

Buffer（公司）

最新内容

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

专知会员服务

1+阅读 · 今天15:02

综述 | 3D场景图：开放挑战与未来方向

综述 | 3D场景图：开放挑战与未来方向

专知会员服务

1+阅读 · 今天15:00

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

专知会员服务

2+阅读 · 今天14:30

21世纪的无人机战争

21世纪的无人机战争

专知会员服务

2+阅读 · 今天14:05

《伊朗与以色列-美国热战及其对数字技术的影响》

《伊朗与以色列-美国热战及其对数字技术的影响》

专知会员服务

2+阅读 · 今天13:55

《量子技术的军事任务技术适配与利用》

《量子技术的军事任务技术适配与利用》

专知会员服务

2+阅读 · 今天13:51

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

《美国陆军军官学校（西点军校）本科生科研中生成式人工智能的使用》

专知会员服务

2+阅读 · 今天13:48

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

7+阅读 · 6月21日

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

5+阅读 · 6月21日

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

7+阅读 · 6月21日

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

20+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

5+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

8+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

7+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

9+阅读 · 6月18日

相关VIP内容

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

【CVPR 2022】面向无噪声对象轮廓的弱监督语义分割，Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation

专知会员服务

10+阅读 · 2022年3月12日

【CVPR2022】高分辨率和多样化的视频-文本预训练模型

【CVPR2022】高分辨率和多样化的视频-文本预训练模型

专知会员服务

10+阅读 · 2022年3月6日

多语言语音识别声学模型建模方法最新进展

多语言语音识别声学模型建模方法最新进展

专知会员服务

36+阅读 · 2022年2月7日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

语音处理中的深度表示学习综述论文:挑战、最新进展和未来趋势，25页pdf

语音处理中的深度表示学习综述论文:挑战、最新进展和未来趋势，25页pdf

专知会员服务

32+阅读 · 2020年1月2日

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

Influence Maximization: Integrating and Expanding Classical Algorithms into the Social Network Context [陈卫微软亚洲研究院] 2019年中国计算机大会机器学习与数据挖掘论坛

专知会员服务

10+阅读 · 2019年10月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

综述 | 3D场景图：开放挑战与未来方向

21世纪的无人机战争

ICML 2026 | 边界嵌入塑形：用自适应对比学习破解图结构纠缠

《国防工业6.0：全自主作战系统、量子-人工智能融合与新一代战略威慑》

相关资讯

白话attention综述（上）

白话attention综述（上）

AINLP

12+阅读 · 2019年12月14日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

近期语音类前沿论文

近期语音类前沿论文

深度学习每日摘要

14+阅读 · 2019年3月17日

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

10+阅读 · 2019年2月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Word2Vec —— 深度学习的一小步，自然语言处理的一大步

Word2Vec —— 深度学习的一小步，自然语言处理的一大步

AI研习社

21+阅读 · 2018年6月14日

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

模型汇总24 - 深度学习中Attention Mechanism详细介绍：原理、分类及应用

深度学习与NLP

12+阅读 · 2017年11月30日

相关论文

When Does Streaming Tool Use Help? Characterizing Tool-Intent Stabilization in Streaming Retrieval-Augmented Generation

Arxiv

0+阅读 · 6月18日

ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference

Arxiv

0+阅读 · 6月18日

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Arxiv

0+阅读 · 6月18日

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Arxiv

0+阅读 · 6月17日

ALAS: An Automatic Latent Alignment Score for Audio Language Models

Arxiv

0+阅读 · 6月16日

Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models

Arxiv

0+阅读 · 6月12日

Feature-Aligned Speech Watermarking for Robustness to Reconstruction Distortions

Arxiv

0+阅读 · 6月10日

Continuous Audio Thinking for Large Audio Language Models

Arxiv

0+阅读 · 6月5日

Unlocking the Working Memory of Large Language Models for Latent Reasoning

Arxiv

0+阅读 · 5月28日

Latency in Real-Time 3D Volumetric Streaming: A Comprehensive Study

Arxiv

0+阅读 · 5月21日

相关基金

大尺度柔性平流层飞艇流固耦合动力学建模与定点保持控制

国家自然科学基金

1+阅读 · 2017年12月31日

有理小波理论在多途信号解析与水声网络设计中的应用研究

国家自然科学基金

0+阅读 · 2015年12月31日

磁流体选择填充微结构光纤的设计、制备及性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于柔性直流的多微网互联网络结构与运行集成的综合协调优化

国家自然科学基金

0+阅读 · 2015年12月31日

自适应快速模拟细节丰富的流体技术研究

国家自然科学基金

0+阅读 · 2015年12月31日

可压缩多介质流体的真正多维高保真算法

国家自然科学基金

0+阅读 · 2015年12月31日

动态自适应的可伸缩视频流媒体组播编码-传输联合优化

国家自然科学基金

0+阅读 · 2015年12月31日

声-电-力耦合协同表面强化方法与机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

透明介质主冲击绝热线的高压声速连续测量

国家自然科学基金

0+阅读 · 2014年12月31日

集成化声表面结构固体板的耦合与相互作用特性研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员