DocEnTr: An End-to-End Document Image Enhancement Transformer - 专知论文

会员服务 ·

0

端到端 · 变换 · INFORMS · MoDELS · 去噪 ·

2022 年 1 月 25 日

DocEnTr: An End-to-End Document Image Enhancement Transformer

翻译：DocEnTr: 文档端到端的文档图像增强变换器

Mohamed Ali Souibgui,Sanket Biswas,Sana Khamekhem Jemni,Yousri Kessentini,Alicia Fornés,Josep Lladós,Umapada Pal

from arxiv, submitted to ICPR 2022

Document images can be affected by many degradation scenarios, which cause recognition and processing difficulties. In this age of digitization, it is important to denoise them for proper usage. To address this challenge, we present a new encoder-decoder architecture based on vision transformers to enhance both machine-printed and handwritten document images, in an end-to-end fashion. The encoder operates directly on the pixel patches with their positional information without the use of any convolutional layers, while the decoder reconstructs a clean image from the encoded patches. Conducted experiments show a superiority of the proposed model compared to the state-of the-art methods on several DIBCO benchmarks. Code and models will be publicly available at: \url{https://github.com/dali92002/DocEnTR}.

翻译：文档图像可能会受到许多降解情景的影响,这会造成识别和处理困难。在这个数字化时代,必须将其密封起来,以便加以适当使用。为了应对这一挑战,我们以视觉变压器为基础,推出一个新的编码器解码器结构,以端到端的方式加强机器打印和手写文件图像。编码器直接在像素补丁上操作,不使用任何卷发层,而解码器则从编码补丁中重建干净的图像。进行实验显示,提议的模型优于DIBCO若干基准的最新方法。代码和模型将在以下网址公开提供:\url{https://github.comdali92002/DocEnTR}。

0

相关内容

端到端

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

326+阅读 · 2020年11月26日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

47+阅读 · 2020年10月31日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

13+阅读 · 2019年4月17日

超强合集：OCR文本检测干货汇总（含论文、源码、demo等资源）

超强合集：OCR文本检测干货汇总（含论文、源码、demo等资源）

极市平台

33+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

心脏的多形态耦合与层级级联计算可视化方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向中文文本的事件时空语义解析方法研究

国家自然科学基金

3+阅读 · 2013年12月31日

嵌入式软件可组合信息流安全验证机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Erdos-Sos猜想及几个相关的极值组合问题

国家自然科学基金

0+阅读 · 2012年12月31日

智能手机安全漏洞挖掘技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

新型荧光微结构光纤二氧化碳传感器

国家自然科学基金

0+阅读 · 2012年12月31日

基于Top-hat变换的多尺度多梯度多结构元素图像处理技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

气候变化下流域水文极端事件的响应机制及预测－以东江为例

国家自然科学基金

0+阅读 · 2009年12月31日

面向GIS的文本空间关系解析机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于多层次语言粒度的文本情感分类研究

国家自然科学基金

1+阅读 · 2008年12月31日

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Arxiv

0+阅读 · 2022年4月19日

Multimodal Token Fusion for Vision Transformers

Arxiv

3+阅读 · 2022年4月19日

UniGDD: A Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue

Arxiv

0+阅读 · 2022年4月16日

Poolingformer: Long Document Modeling with Pooling Attention

Arxiv

14+阅读 · 2021年5月10日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Arxiv

15+阅读 · 2018年12月4日

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Arxiv

10+阅读 · 2018年3月29日

Distance-based Self-Attention Network for Natural Language Inference

Arxiv

10+阅读 · 2017年12月6日

Attention Is All You Need

Arxiv

29+阅读 · 2017年12月6日

VIP会员

文章信息

相关主题

最新内容

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

专知会员服务

4+阅读 · 7月23日

《基于强化学习的自动化红队测试》

《基于强化学习的自动化红队测试》

专知会员服务

3+阅读 · 7月23日

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

专知会员服务

6+阅读 · 7月23日

“天降毒雾”：无人机如何使化学战重返乌克兰战场

“天降毒雾”：无人机如何使化学战重返乌克兰战场

专知会员服务

2+阅读 · 7月23日

伊朗不对称防空战略的演进

伊朗不对称防空战略的演进

专知会员服务

4+阅读 · 7月23日

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

10+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

4+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

8+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

10+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

15+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

326+阅读 · 2020年11月26日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

47+阅读 · 2020年10月31日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

61+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

开源书：PyTorch深度学习起步

开源书：PyTorch深度学习起步

专知会员服务

51+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于强化学习的自动化红队测试》

“天降毒雾”：无人机如何使化学战重返乌克兰战场

《远程自主系统可扩展态势感知的解决方案》32页2026最新报告

《下一代无人机-卫星通信：人工智能创新与未来展望》32页长综述

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

已删除

将门创投

13+阅读 · 2019年4月17日

超强合集：OCR文本检测干货汇总（含论文、源码、demo等资源）

超强合集：OCR文本检测干货汇总（含论文、源码、demo等资源）

极市平台

33+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

相关论文

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

Arxiv

0+阅读 · 2022年4月19日

Multimodal Token Fusion for Vision Transformers

Arxiv

3+阅读 · 2022年4月19日

UniGDD: A Unified Generative Framework for Goal-Oriented Document-Grounded Dialogue

Arxiv

0+阅读 · 2022年4月16日

Poolingformer: Long Document Modeling with Pooling Attention

Arxiv

14+阅读 · 2021年5月10日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Arxiv

15+阅读 · 2018年12月4日

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Arxiv

10+阅读 · 2018年3月29日

Distance-based Self-Attention Network for Natural Language Inference

Arxiv

10+阅读 · 2017年12月6日

Attention Is All You Need

Arxiv

29+阅读 · 2017年12月6日

相关基金

心脏的多形态耦合与层级级联计算可视化方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

面向中文文本的事件时空语义解析方法研究

国家自然科学基金

3+阅读 · 2013年12月31日

嵌入式软件可组合信息流安全验证机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Erdos-Sos猜想及几个相关的极值组合问题

国家自然科学基金

0+阅读 · 2012年12月31日

智能手机安全漏洞挖掘技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

新型荧光微结构光纤二氧化碳传感器

国家自然科学基金

0+阅读 · 2012年12月31日

基于Top-hat变换的多尺度多梯度多结构元素图像处理技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

气候变化下流域水文极端事件的响应机制及预测－以东江为例

国家自然科学基金

0+阅读 · 2009年12月31日

面向GIS的文本空间关系解析机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于多层次语言粒度的文本情感分类研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员