Style-Aware Contrastive Learning for Multi-Style Image Captioning - 专知论文

会员服务 ·

0

图像字幕 · contrastive · Learning · 对比学习 · Extensibility ·

2023 年 1 月 26 日

Style-Aware Contrastive Learning for Multi-Style Image Captioning

翻译：面向多风格图像描述的样式感知对比学习

Yucheng Zhou,Guodong Long

from arxiv, Findings of EACL 2023

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.

翻译：现有面向多风格图像描述的方法在生成兼具准确视觉内容与期望语言风格的描述方面已取得显著成效。然而，现有方法忽视了语言风格与视觉内容之间的关联性。为解决这一缺陷，我们提出面向多风格图像描述的样式感知对比学习方法。首先，设计基于对比学习的样式感知视觉编码器，用于挖掘与风格相关的潜在视觉内容。此外，提出样式感知三元组对比目标，以判别图像、风格与描述三者是否匹配。为提供对比学习所需的正负样本，我们提出三种检索方案：基于目标的检索、基于RoI的检索及基于三元组的检索，并设计动态权衡函数计算检索得分。实验结果表明，本方法实现了最先进的性能。同时，通过广泛分析验证了所提方法的有效性。

0

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

多智能体系统有限时间一致性的自适应控制研究

国家自然科学基金

3+阅读 · 2015年12月31日

地球化学特征对太湖底泥中多氯联苯微生物厌氧脱氯的影响研究

国家自然科学基金

0+阅读 · 2013年12月31日

lncRNAs和miR-592的相互作用对mESC向神经元分化的影响

国家自然科学基金

0+阅读 · 2012年12月31日

lncRNA BC098786和BC088259调控坐骨神经再生的机制

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

藏药“甲嘎松汤”调节Nrf2/Bach1-ARE-抗氧化酶通路以抗氧化保肝的研究

国家自然科学基金

0+阅读 · 2012年12月31日

抑癌基因PDCD4调控miR-184和miR-374a抑制鼻咽癌生长及促进凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

先天性巨结肠发生新机制：miR-218-1介导肠神经嵴细胞迁移和增殖障碍

国家自然科学基金

0+阅读 · 2012年12月31日

miR-140在肿瘤转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

神经元凋亡时Egr1对BH3-only蛋白Bim的转录调控

国家自然科学基金

0+阅读 · 2009年12月31日

Multi-modal reward for visual relationships-based image captioning

Arxiv

0+阅读 · 2023年3月19日

Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video

Arxiv

0+阅读 · 2023年3月16日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Image Segmentation Using Deep Learning: A Survey

Image Segmentation Using Deep Learning: A Survey

Arxiv

47+阅读 · 2020年1月15日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

最新内容

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

专知会员服务

1+阅读 · 今天8:18

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

专知会员服务

1+阅读 · 今天7:39

《通用大语言模型：无人机指挥与控制接口》最新40页

《通用大语言模型：无人机指挥与控制接口》最新40页

专知会员服务

4+阅读 · 今天7:33

《通过小型无人机系统将情报能力“作战化”》

《通过小型无人机系统将情报能力“作战化”》

专知会员服务

1+阅读 · 今天7:28

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

《神经安全型有人–无人协同：面向认知自适应作战能力的参考架构》

专知会员服务

1+阅读 · 今天7:14

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

《在指挥链中通过多准则决策分析传达指挥官意图：空战实验》

专知会员服务

17+阅读 · 6月15日

消耗优势：美军的“精确规模化”概念

消耗优势：美军的“精确规模化”概念

专知会员服务

7+阅读 · 6月15日

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

五角大楼的AI优先战略及其对现代战争的启示：来自与伊朗冲突的经验教训

专知会员服务

8+阅读 · 6月15日

《网络空间兵棋推演：挑战、局限性与混合路径》报告

《网络空间兵棋推演：挑战、局限性与混合路径》报告

专知会员服务

8+阅读 · 6月15日

《离线语言支持系统：面向空战战术决策》

《离线语言支持系统：面向空战战术决策》

专知会员服务

8+阅读 · 6月15日

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

《以通信为中心的6G–LLM架构：面向可扩展的战术自主防御车辆网络》

专知会员服务

6+阅读 · 6月15日

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

ICML 2026｜ECA：面向开放式图文生成的高效持续对齐

专知会员服务

6+阅读 · 6月14日

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

可信智能体AI综述：安全、鲁棒性、隐私与系统安全

专知会员服务

6+阅读 · 6月14日

俄乌战场地面机器人如何改写战争规则

俄乌战场地面机器人如何改写战争规则

专知会员服务

9+阅读 · 6月14日

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

美国海军研究生院第23届年度采购研究研讨会与创新峰会：主题“加速作战能力”，附会议报告论文集1300页

专知会员服务

13+阅读 · 6月14日

相关VIP内容

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《面向导弹有效发射时机的监督机器学习方法：基于超视距空战仿真》

《通过小型无人机系统将情报能力“作战化”》

美国马六甲“三重网”概念：安全网、威慑网与杀伤网

《通用大语言模型：无人机指挥与控制接口》最新40页

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Multi-modal reward for visual relationships-based image captioning

Arxiv

0+阅读 · 2023年3月19日

Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video

Arxiv

0+阅读 · 2023年3月16日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Deep Image Retrieval: A Survey

Arxiv

16+阅读 · 2021年1月27日

Contrastive learning of global and local features for medical image segmentation with limited annotations

Arxiv

19+阅读 · 2020年6月18日

A survey on deep hashing for image retrieval

A survey on deep hashing for image retrieval

Arxiv

15+阅读 · 2020年6月10日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Image Segmentation Using Deep Learning: A Survey

Image Segmentation Using Deep Learning: A Survey

Arxiv

47+阅读 · 2020年1月15日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

相关基金

多智能体系统有限时间一致性的自适应控制研究

国家自然科学基金

3+阅读 · 2015年12月31日

地球化学特征对太湖底泥中多氯联苯微生物厌氧脱氯的影响研究

国家自然科学基金

0+阅读 · 2013年12月31日

lncRNAs和miR-592的相互作用对mESC向神经元分化的影响

国家自然科学基金

0+阅读 · 2012年12月31日

lncRNA BC098786和BC088259调控坐骨神经再生的机制

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

藏药“甲嘎松汤”调节Nrf2/Bach1-ARE-抗氧化酶通路以抗氧化保肝的研究

国家自然科学基金

0+阅读 · 2012年12月31日

抑癌基因PDCD4调控miR-184和miR-374a抑制鼻咽癌生长及促进凋亡

国家自然科学基金

0+阅读 · 2012年12月31日

先天性巨结肠发生新机制：miR-218-1介导肠神经嵴细胞迁移和增殖障碍

国家自然科学基金

0+阅读 · 2012年12月31日

miR-140在肿瘤转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

神经元凋亡时Egr1对BH3-only蛋白Bim的转录调控

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员