This paper presents a detailed system description of our entry for the CHiPSAL 2025 shared task, which focuses on language detection, hate speech identification, and target detection in Devanagari-script languages. We experimented with transformer-based language models and their ensembles, including MuRIL, IndicBERT, and Gemma-2, and leveraged targeted techniques such as focal loss to address challenges in natural language understanding for Devanagari-script languages, notably multilingual processing and class imbalance. Our approach achieved competitive results across all tasks: F1 scores of 0.9980, 0.7652, and 0.6804 on Sub-tasks A, B, and C, respectively. This work provides insights into the effectiveness of transformer models on tasks with domain-specific and linguistic challenges, as well as areas for potential improvement in future iterations.
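As background on the class-imbalance technique mentioned above, the focal loss we refer to follows the standard formulation of Lin et al. (2017), which down-weights the loss contribution of well-classified examples so that training concentrates on hard, minority-class instances; the weights \(\alpha_t\) and \(\gamma\) below are hyperparameters whose per-task values are not fixed by this abstract:
\[
\mathrm{FL}(p_t) = -\,\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t),
\]
where \(p_t\) is the model's estimated probability for the true class, \(\alpha_t\) is a class-balancing weight, and \(\gamma \ge 0\) controls how strongly easy examples are suppressed (\(\gamma = 0\) recovers standard cross-entropy).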