Imagine sitting in a presentation, trying to follow the speaker while simultaneously scanning the slides for relevant information. Although the entire slide is visible, identifying the relevant regions can be challenging. As you focus on one part of the slide, the speaker moves on to a new sentence, leaving you scrambling to catch up visually. This constant back-and-forth creates a disconnect between what is being said and the most important visual elements, making it hard to absorb key details, especially in fast-paced or content-heavy presentations such as conference talks. Addressing this problem requires an understanding of slide content, including text, graphics, and layout. We introduce a method that automatically identifies and highlights the most relevant slide regions based on the speaker's narrative. By analyzing spoken content and matching it with textual or graphical elements in the slides, our approach improves synchronization between what listeners hear and what they need to attend to. We explore different ways of solving this problem and assess their success and failure cases. Analyzing multimedia documents is emerging as a key requirement for seamless understanding of content-rich videos, such as educational videos and conference talks, by reducing cognitive strain and improving comprehension. Code and dataset are available at: https://github.com/meghamariamkm2002/Slide_Highlight
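The core idea of matching spoken content to slide regions can be illustrated with a minimal sketch. This is not the paper's actual method; it assumes slide regions have already been localized and transcribed (e.g., via OCR), and uses a simple bag-of-words cosine similarity as a stand-in for whatever alignment model the approach employs. The region names and example texts below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_region(spoken_sentence: str, regions: list[tuple[str, str]]) -> tuple[str, float]:
    """Return the (region_id, score) of the slide region whose text
    best matches the current spoken sentence."""
    scored = [(rid, cosine(tokenize(spoken_sentence), tokenize(txt)))
              for rid, txt in regions]
    return max(scored, key=lambda s: s[1])


# Hypothetical OCR output: (region_id, extracted_text) per slide region.
regions = [
    ("title", "Slide Region Highlighting from Speech"),
    ("bullet1", "the model aligns the transcript with the slide layout"),
    ("figure", "attention heatmap over slide regions"),
]

spoken = "here we align the transcript with the layout of the slide"
print(best_region(spoken, regions))  # picks "bullet1"
```

A real system would replace the lexical similarity with a learned text or vision-language embedding, since spoken narration rarely repeats slide text verbatim.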