Arabic calligraphy represents one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. Although multimodal models have advanced across languages, their ability to process Arabic script, especially in artistic and stylized calligraphic forms, remains largely unexplored. To address this gap, we present DuwatBench, a benchmark of 1,272 curated samples containing about 1,475 unique words across six classical and modern calligraphic styles, each paired with sentence-level detection annotations. The dataset reflects real-world difficulties in Arabic writing, such as complex stroke patterns, dense ligatures, and stylistic variations that often confound standard text recognition systems. Using DuwatBench, we evaluate 13 leading Arabic and multilingual multimodal models and show that while they perform well on clean text, they struggle with calligraphic variation, artistic distortions, and precise visual-text alignment. By publicly releasing DuwatBench and its annotations, we aim to advance culturally grounded multimodal research, foster fair inclusion of the Arabic language and visual heritage in AI systems, and support continued progress in this area. Our dataset (https://huggingface.co/datasets/MBZUAI/DuwatBench) and evaluation suite (https://github.com/mbzuai-oryx/DuwatBench) are publicly available.