Reading comprehension systems for low-resource languages face significant challenges in handling unanswerable questions: they tend to produce unreliable responses when the correct answer is absent from the context. To address this, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a balanced distribution of answerable (57.25%) and unanswerable (42.75%) questions, and includes adversarially designed instances containing plausible distractors. We benchmark three transformer-based models (BERT, RoBERTa, ELECTRA) and demonstrate substantial improvements through fine-tuning: BERT achieves a 313% relative improvement in F1 score (from 0.150 to 0.620), and semantic answer quality measured by BERTScore also increases significantly across all models. Our results establish NCTB-QA as a challenging benchmark for Bangla educational question answering and demonstrate that domain-specific fine-tuning is critical for robust performance in low-resource settings.
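As a quick sanity check, the reported relative gain follows directly from the two F1 scores given above; a minimal sketch of the arithmetic:

```python
# Verify the reported relative F1 improvement for BERT on NCTB-QA.
# Scores are the paper's reported values before and after fine-tuning.
baseline_f1 = 0.150   # zero-shot BERT F1
finetuned_f1 = 0.620  # BERT F1 after domain-specific fine-tuning

relative_improvement = (finetuned_f1 - baseline_f1) / baseline_f1
print(f"Relative F1 improvement: {relative_improvement:.0%}")  # → 313%
```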