This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, targeting natural language inference, word sense disambiguation (through example comparison or sense selection), and metaphor detection. Going beyond language-adapted replicas of existing tasks, we contribute two innovations that should resonate with the broader resource and evaluation community. First, our inference dataset is the first of its kind, marking not just \textit{one}, but \textit{all} possible inference labels, thereby accounting for label shifts caused by, e.g., ambiguity or polysemy. Second, we demonstrate a cost-efficient method for obtaining datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. For each task, we conduct experiments using currently available state-of-the-art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for accelerated progress if the Greek NLP ecosystem is to keep pace with contemporary mainstream research.