This paper takes a first step towards a linguistically motivated and technically relevant evaluation suite for Greek NLP. We introduce four expert-verified evaluation tasks, targeting natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. Beyond offering language-adapted replicas of existing tasks, we contribute two innovations of interest to the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just \textit{one}, but rather \textit{all} possible inference labels, and thereby accounting for label shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state-of-the-art machinery. Our experimental baselines confirm that the tasks are challenging and highlight the need for faster progress if the Greek NLP ecosystem is to keep pace with contemporary mainstream research.