Detecting and extracting text from natural scene images requires Scene Text Detection (STD) algorithms. Instance-segmentation-based STD algorithms usually use Fully Convolutional Neural Networks (FCNs) as the backbone model for feature extraction, and FCNs are computationally expensive by nature. Furthermore, flexible architectures are needed to keep up with the growing variety of models. To accelerate various STD algorithms efficiently, we propose a hardware architecture that balances versatility and performance, together with a simple but efficient configuration method. The architecture can compute different FCN models without hardware redesign. Optimization effort is concentrated in the hardware, with finely designed computing modules, while versatility across network configurations is achieved through microcode rather than a laboriously designed compiler. Multiple parallelization techniques at different levels and several complexity-reduction methods are explored to speed up FCN computation. Implementation results show that, given the same tasks, the proposed system achieves higher throughput than the GPU studied. In particular, our system reduces the comprehensive Operating Expense (OpEx) by 46% relative to the GPU and improves power efficiency by 32%. This work has been deployed in commercial applications, where it provides stable consumer text detection services.