Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model - 专知论文

会员服务 ·

0

INFORMS · 语音识别 · 可理解性 · 模型评估 · MoDELS ·

2023 年 2 月 27 日

Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

翻译：利用视觉-语言预训练模型提升医疗语音转文字准确率

Jaeyoung Huh,Sangjoon Park,Jeong Eun Lee,Jong Chul Ye

Automatic Speech Recognition (ASR) is a technology that converts spoken words into text, facilitating interaction between humans and machines. One of the most common applications of ASR is Speech-To-Text (STT) technology, which simplifies user workflows by transcribing spoken words into text. In the medical field, STT has the potential to significantly reduce the workload of clinicians who rely on typists to transcribe their voice recordings. However, developing an STT model for the medical domain is challenging due to the lack of sufficient speech and text datasets. To address this issue, we propose a medical-domain text correction method that modifies the output text of a general STT system using the Vision Language Pre-training (VLP) method. VLP combines textual and visual information to correct text based on image knowledge. Our extensive experiments demonstrate that the proposed method offers quantitatively and clinically significant improvements in STT performance in the medical field. We further show that multi-modal understanding of image and text information outperforms single-modal understanding using only text information.

翻译：自动语音识别（ASR）是一种将语音转换为文本的技术，能够促进人机交互。ASR最常见的应用之一就是语音转文字（STT）技术，它通过将语音转录为文本来简化用户工作流程。在医疗领域，STT有望显著减轻依赖打字员转录语音记录的临床医生的工作负担。然而，由于缺乏足够的语音和文本数据集，开发面向医疗领域的STT模型具有挑战性。为了解决这一问题，我们提出了一种医疗领域的文本校正方法，该方法利用视觉-语言预训练（VLP）技术对通用STT系统的输出文本进行修改。VLP结合文本与视觉信息，基于图像知识对文本进行校正。大量实验表明，所提出的方法在定量指标和临床意义上均显著提升了医疗领域的STT性能。我们进一步证明，基于图像和文本信息的多模态理解优于仅基于文本信息的单模态理解。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

94+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新十二篇情感分析相关论文—自然语言推理框架、网络事件、多任务学习、实时情感变化检测、多因素分析、深度语境词表示

【论文推荐】最新十二篇情感分析相关论文—自然语言推理框架、网络事件、多任务学习、实时情感变化检测、多因素分析、深度语境词表示

专知

22+阅读 · 2018年5月7日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

牛磺酸抑制AS肉鸡右心肥大过程中calpains介导细胞凋亡作用的研究

国家自然科学基金

0+阅读 · 2015年12月31日

Ago2磷酸化在DNA损伤修复中的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

“核HO-1”调控miRNA-125a-5p影响血脊髓屏障结构和功能的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

AMPK-Beclin-1/Vps34通路在维生素D3（Vit D)诱导足细胞自噬中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

衰老小鼠线粒体促凋亡蛋白Omi/HtrA2表达增加在加重帕金森病中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LDH-A能量代谢对T细胞急性淋巴细胞白血病的抗白血病效应及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

缺氧诱导NHE1参与calpain介导ABCA1降解及胆固醇逆转运障碍

国家自然科学基金

0+阅读 · 2012年12月31日

生殖干细胞专有基因PIWIL2调节miR-10a抑制肿瘤及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Fbxw8对上皮间质转化（EMT）的调控及其在前列腺癌转移中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

PI-IBS中TMEM16A介导IL-4对Cajal细胞损伤的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Enhancing object detection robustness: A synthetic and natural perturbation approach

Arxiv

0+阅读 · 2023年4月20日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Learning Conceptual-Contextual Embeddings for Medical Text

Arxiv

15+阅读 · 2020年3月12日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

Learning Conceptual-Contexual Embeddings for Medical Text

Arxiv

28+阅读 · 2019年8月16日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

最新内容

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

4+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

3+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

6+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

10+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

14+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

94+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于强化学习的自动化红队测试》

“天降毒雾”：无人机如何使化学战重返乌克兰战场

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新十二篇情感分析相关论文—自然语言推理框架、网络事件、多任务学习、实时情感变化检测、多因素分析、深度语境词表示

【论文推荐】最新十二篇情感分析相关论文—自然语言推理框架、网络事件、多任务学习、实时情感变化检测、多因素分析、深度语境词表示

专知

22+阅读 · 2018年5月7日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Enhancing object detection robustness: A synthetic and natural perturbation approach

Arxiv

0+阅读 · 2023年4月20日

Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

Arxiv

11+阅读 · 2023年3月10日

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

Learning Conceptual-Contextual Embeddings for Medical Text

Arxiv

15+阅读 · 2020年3月12日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

Learning Conceptual-Contexual Embeddings for Medical Text

Arxiv

28+阅读 · 2019年8月16日

Text Generation from Knowledge Graphs with Graph Transformers

Arxiv

35+阅读 · 2019年4月4日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

相关基金

牛磺酸抑制AS肉鸡右心肥大过程中calpains介导细胞凋亡作用的研究

国家自然科学基金

0+阅读 · 2015年12月31日

Ago2磷酸化在DNA损伤修复中的功能分析

国家自然科学基金

0+阅读 · 2014年12月31日

“核HO-1”调控miRNA-125a-5p影响血脊髓屏障结构和功能的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

AMPK-Beclin-1/Vps34通路在维生素D3（Vit D)诱导足细胞自噬中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

衰老小鼠线粒体促凋亡蛋白Omi/HtrA2表达增加在加重帕金森病中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

靶向LDH-A能量代谢对T细胞急性淋巴细胞白血病的抗白血病效应及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

缺氧诱导NHE1参与calpain介导ABCA1降解及胆固醇逆转运障碍

国家自然科学基金

0+阅读 · 2012年12月31日

生殖干细胞专有基因PIWIL2调节miR-10a抑制肿瘤及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Fbxw8对上皮间质转化（EMT）的调控及其在前列腺癌转移中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

PI-IBS中TMEM16A介导IL-4对Cajal细胞损伤的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员