BERT-based neural architectures have established themselves as popular state-of-the-art baselines for many downstream NLP tasks. However, these architectures are data-hungry and consume a lot of memory and energy, which often hinders their deployment in real-time, resource-constrained applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT) often do not perform well on complex NLP tasks. More importantly, from a designer's perspective, it is unclear which BERT-based architecture is the "right" one for a given NLP task, that is, which one strikes the best trade-off between the resources available and the minimum accuracy desired by the end user. System engineers have to spend a lot of time conducting trial-and-error experiments to answer this question. This paper presents an exploratory study of BERT-based models under different resource constraints and accuracy budgets to derive empirical observations about these resource/accuracy trade-offs. Our findings can help designers make informed choices among alternative BERT-based architectures for embedded systems, thus saving significant development time and effort.