WordVIS: A Color Worth A Thousand Words

Document classification is a critical element of automated document processing systems. In recent years, multi-modal approaches have become increasingly popular for this task. Despite their improved accuracy, these approaches remain underutilized in industry because they require large volumes of training data and extensive computational power. In this paper, we attempt to address these issues by embedding textual features directly into the visual space, allowing lightweight image-based classifiers to achieve state-of-the-art results in document classification using small-scale datasets. To evaluate the efficacy of the visual features generated by our approach on limited data, we tested it on the standard Tobacco-3482 dataset. Our experiments show a substantial improvement for image-based classifiers, with ResNet50 achieving an improvement of 4.64% with no document pre-training. The approach also sets a new record for accuracy on Tobacco-3482, scoring 91.14% with the image-based DocXClassifier, again with no document pre-training. The simplicity of the approach, its modest resource requirements, and these results make it a good prospect for industrial use cases.
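
The abstract does not spell out how textual features are placed into the visual space, so the following is only an illustrative sketch of one way the idea could look, not the authors' method. It assumes word tokens and pixel bounding boxes are already available (for example from an OCR engine) and blends a deterministic per-token color into each box, so that an image-only classifier would see some textual signal. The helper names `token_color` and `paint_words`, the hash-based color scheme, and the `alpha` blending weight are all hypothetical.

```python
import hashlib

import numpy as np


def token_color(token: str) -> tuple:
    """Map a token to a deterministic RGB triple (hypothetical scheme)."""
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return digest[0], digest[1], digest[2]


def paint_words(page: np.ndarray, words, alpha: float = 0.5) -> np.ndarray:
    """Blend a per-token color into each word's bounding box.

    page  : H x W x 3 uint8 image of the document page
    words : iterable of (token, (x0, y0, x1, y1)) pairs in pixel coordinates
    alpha : blending weight of the token color (assumed, not from the paper)
    """
    out = page.astype(np.float32)
    for token, (x0, y0, x1, y1) in words:
        color = np.array(token_color(token), dtype=np.float32)
        region = out[y0:y1, x0:x1]
        out[y0:y1, x0:x1] = (1.0 - alpha) * region + alpha * color
    return np.clip(out, 0, 255).astype(np.uint8)


# Toy example: a blank white page with two made-up word boxes.
page = np.full((64, 128, 3), 255, dtype=np.uint8)
words = [("invoice", (4, 4, 60, 20)), ("total", (4, 30, 40, 46))]
colored = paint_words(page, words)
```

The resulting array keeps the original page size, so under these assumptions it could be passed to an ordinary image classifier such as ResNet50 without changing the model's input pipeline.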

