Large language models (LLMs) can produce text that closely resembles human writing. This capability raises concerns about misuse, including disinformation and content manipulation. Detecting AI-generated text is therefore essential for maintaining authenticity and preventing malicious applications. Existing research has addressed detection in multiple languages, but the Bengali language remains largely unexplored. Bengali's rich vocabulary and complex structure make distinguishing human-written from AI-generated text particularly challenging. This study investigates five transformer-based models: XLM-RoBERTa-Large, mDeBERTa-V3-Base, BanglaBERT-Base, IndicBERT-Base and MultilingualBERT-Base. Zero-shot evaluation shows that all models perform at near-chance levels (around 50% accuracy), highlighting the need for task-specific fine-tuning. Fine-tuning significantly improves performance, with XLM-RoBERTa, mDeBERTa and MultilingualBERT achieving around 91% in both accuracy and F1-score. IndicBERT performs comparatively worse, indicating that fine-tuning is less effective for it on this task. This work advances AI-generated text detection in Bengali and establishes a foundation for building robust systems to counter AI-generated content.
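The abstract describes a fine-tuning pipeline for binary human-vs-AI text classification but does not specify the training setup. The following is a minimal sketch of how such a classifier could be fine-tuned with the Hugging Face Transformers library; the checkpoint name, the file bengali_texts.csv, its column names, and all hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: fine-tune a multilingual encoder as a binary
# human-vs-AI-generated text classifier (hypothetical data and settings).
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-large"  # any of the five studied checkpoints could be swapped in

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical CSV with "text" and "label" columns (0 = human, 1 = AI);
# split 90/10 into train and evaluation sets.
dataset = load_dataset("csv", data_files="bengali_texts.csv")["train"]
dataset = dataset.train_test_split(test_size=0.1, seed=42)

def tokenize(batch):
    # Fixed-length padding keeps the default data collator simple.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report the same metrics highlighted in the abstract: accuracy and F1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

args = TrainingArguments(
    output_dir="bengali-ai-text-detector",  # assumed output path
    per_device_train_batch_size=16,         # assumed hyperparameters
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())  # yields accuracy and F1 on the held-out split
```

A zero-shot baseline, as contrasted in the abstract, would correspond to calling trainer.evaluate() on the untrained classification head before trainer.train(), which is why near-chance accuracy is expected in that setting.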