In the rapidly evolving landscape of enterprise natural language processing (NLP), the demand for efficient, lightweight models capable of handling multi-domain text automation tasks has intensified. This study conducts a comparative analysis of three prominent lightweight Transformer models (DistilBERT, MiniLM, and ALBERT) across three distinct domains: customer sentiment classification, news topic classification, and toxicity and hate speech detection. Using datasets from IMDB, AG News, and the Measuring Hate Speech corpus, we evaluate performance with classification metrics (accuracy, precision, recall, and F1-score) and efficiency metrics (model size, inference time, throughput, and memory usage). Key findings reveal that no single model dominates on all performance dimensions: ALBERT achieves the highest task-specific accuracy in multiple domains, MiniLM excels in inference speed and throughput, and DistilBERT delivers the most consistent accuracy across tasks while maintaining competitive efficiency. All results reflect controlled fine-tuning under fixed, enterprise-oriented constraints rather than exhaustive hyperparameter optimization. These results highlight the trade-off between accuracy and efficiency, and we recommend MiniLM for latency-sensitive enterprise applications, DistilBERT for balanced performance, and ALBERT for resource-constrained environments.