Text in images often carries important information and directly conveys high-level semantics, which makes it an important source of information and a very active research topic. Many studies have shown that CNN-based neural networks are effective and accurate for image classification, which is the basis of text recognition. Their performance can be further improved by transfer learning, using the weights of a model pre-trained on the ImageNet dataset as initial weights. In this research, the recognition model is trained on the Chars74K dataset, and the best resulting model is then tested on samples from the IIIT-5K-Dataset. The results show that the most accurate model uses the VGG-16 architecture trained with image transformations consisting of a 15° rotation, an image scale of 0.9, and a Gaussian blur effect. This model achieves an accuracy of 97.94% on the validation data, 98.16% on the test data, and 95.62% on the test data from the IIIT-5K-Dataset. Based on these results, we conclude that pre-trained CNNs can achieve good accuracy for text recognition, and that the model architecture used in this study can serve as a reference for developing future text detection systems.
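The abstract names the best augmentation setting only as a 15° rotation, an image scale of 0.9, and a Gaussian blur. The NumPy code below is a minimal sketch of one way such a pipeline could be written. It is not the authors' implementation. It assumes grayscale character images stored as 2-D arrays, a rotation angle drawn uniformly from [-15°, 15°], a fixed 0.9 zoom-out, and a Gaussian sigma of 1.0 (the abstract does not give sigma). It also assumes nearest-neighbour sampling with zero fill. All function names are hypothetical.

```python
import numpy as np


def rotate_and_scale(img: np.ndarray, angle_deg: float, scale: float) -> np.ndarray:
    """Rotate a 2-D image about its centre and scale it, keeping the output size.

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the input image are set to 0 (assumed fill value).
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Coordinates of every output pixel, relative to the image centre.
    ys, xs = np.indices((h, w), dtype=float)
    yc, xc = ys - cy, xs - cx
    # Inverse transform: rotate by -theta and undo the scale to find the source pixel.
    src_x = (cos_t * xc + sin_t * yc) / scale + cx
    src_y = (-sin_t * xc + cos_t * yc) / scale + cy
    src_xi = np.rint(src_x).astype(int)
    src_yi = np.rint(src_y).astype(int)
    valid = (src_xi >= 0) & (src_xi < w) & (src_yi >= 0) & (src_yi < h)
    out = np.zeros_like(img)
    out[valid] = img[src_yi[valid], src_xi[valid]]
    return out


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img.astype(float), r, mode="edge")
    # Blur along rows, then along columns ("valid" mode removes the padding).
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)


def augment(img: np.ndarray, max_angle: float = 15.0, scale: float = 0.9,
            sigma: float = 1.0, rng=None) -> np.ndarray:
    """Hypothetical augmentation: random rotation in [-max_angle, max_angle],
    fixed scale, then Gaussian blur (sigma is an assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    angle = rng.uniform(-max_angle, max_angle)
    out = rotate_and_scale(img.astype(float), angle, scale)
    return gaussian_blur(out, sigma)


if __name__ == "__main__":
    # Toy 32x32 "character": a vertical bar.
    char = np.zeros((32, 32))
    char[8:24, 14:18] = 1.0
    aug = augment(char, rng=np.random.default_rng(0))
    print(aug.shape, round(float(aug.max()), 3))
```

In practice, a framework's built-in transforms (for example torchvision's `RandomAffine` and `GaussianBlur`) would normally be applied on the fly during training. The plain NumPy version only makes each step of the transformation explicit.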