Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from languages with little gender marking into morphologically gendered ones, such as from English into Italian. English largely omits grammatical gender, while Italian requires explicit agreement across multiple grammatical categories. This asymmetry often leads MT systems to default to masculine forms, reinforcing bias and reducing translation accuracy. To address this issue, we present the Contextual Gender Annotation (ConGA) framework, a linguistically grounded set of guidelines for word-level gender annotation. The scheme distinguishes semantic gender in English, marked with three tags (Masculine (M), Feminine (F), and Ambiguous (A)), from grammatical gender realisation in Italian, marked with two tags (Masculine (M) and Feminine (F)), and pairs both with entity-level identifiers for cross-sentence tracking. We apply ConGA to the gENder-IT dataset, creating a gold-standard resource for evaluating gender bias in translation. Our results reveal systematic masculine overuse and inconsistent feminine realisation, highlighting persistent limitations of current MT systems. By combining fine-grained linguistic annotation with quantitative evaluation, this work offers both a methodology and a benchmark for building more gender-aware multilingual NLP systems.
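To make the annotation scheme concrete, the following is a minimal Python sketch of how word-level tags and entity identifiers could be represented, and how masculine realisation could be counted over aligned word pairs. It is an illustration under stated assumptions, not the ConGA implementation or the gENder-IT file format: the names `GenderedWord` and `masculine_share`, the `side` field, the string entity identifiers, and the example word pairs are all hypothetical, and the share computed here is not necessarily the metric used in the evaluation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Tag inventories as described in the abstract.
SOURCE_TAGS = frozenset({"M", "F", "A"})  # semantic gender in English
TARGET_TAGS = frozenset({"M", "F"})       # grammatical gender in Italian


@dataclass(frozen=True)
class GenderedWord:
    """One annotated word (hypothetical representation)."""
    token: str
    gender: str     # "M", "F", or, on the source side only, "A"
    entity_id: str  # links mentions of the same entity across sentences
    side: str       # "src" for English, "tgt" for Italian

    def __post_init__(self) -> None:
        if self.side not in ("src", "tgt"):
            raise ValueError(f"unknown side: {self.side!r}")
        allowed = SOURCE_TAGS if self.side == "src" else TARGET_TAGS
        if self.gender not in allowed:
            raise ValueError(f"tag {self.gender!r} not allowed on side {self.side!r}")


Pair = Tuple[GenderedWord, GenderedWord]


def masculine_share(pairs: List[Pair], source_tag: str) -> float:
    """Share of target words realised as masculine among the aligned
    pairs whose source word carries `source_tag`. Returns 0.0 when no
    pair matches. Illustrative only."""
    targets = [tgt.gender for src, tgt in pairs if src.gender == source_tag]
    if not targets:
        return 0.0
    return targets.count("M") / len(targets)


# Invented example pairs, not taken from gENder-IT.
pairs: List[Pair] = [
    (GenderedWord("doctor", "A", "e1", "src"), GenderedWord("dottore", "M", "e1", "tgt")),
    (GenderedWord("nurse", "A", "e2", "src"), GenderedWord("infermiera", "F", "e2", "tgt")),
    (GenderedWord("student", "A", "e3", "src"), GenderedWord("studente", "M", "e3", "tgt")),
    (GenderedWord("friend", "F", "e4", "src"), GenderedWord("amico", "M", "e4", "tgt")),
]

ambiguous_to_masculine = masculine_share(pairs, "A")
feminine_to_masculine = masculine_share(pairs, "F")
```

In this sketch, a high value of `masculine_share(pairs, "A")` would correspond to masculine defaulting on ambiguous source words, while a non-zero value of `masculine_share(pairs, "F")` would flag feminine source words realised with masculine forms.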