Dialectal Arabic to Modern Standard Arabic (DA-MSA) translation is a challenging task in Machine Translation (MT) due to significant lexical, syntactic, and semantic divergences between Arabic dialects and MSA. Existing automatic evaluation metrics and general-purpose human evaluation frameworks struggle to capture dialect-specific MT errors, hindering progress in translation assessment. This paper introduces Ara-HOPE, a human-centric post-editing evaluation framework designed to address these challenges systematically. The framework comprises a five-category error taxonomy and a decision-tree annotation protocol. Through a comparative evaluation of three MT systems (the Arabic-centric Jais, the general-purpose GPT-3.5, and the NLLB-200 baseline), Ara-HOPE reveals systematic performance differences among these systems. Our results show that dialect-specific terminology and semantic preservation remain the most persistent challenges in DA-MSA translation. Ara-HOPE establishes a new standard for evaluating Dialectal Arabic MT quality and provides actionable guidance for improving dialect-aware MT systems. For reproducibility, we make the annotation files and related materials publicly available at https://github.com/abdullahalabdullah/Ara-HOPE.