We present COPAL-ID, a novel, publicly available Indonesian-language common sense reasoning dataset. Unlike the previous Indonesian COPA dataset (XCOPA-ID), COPAL-ID incorporates Indonesian local and cultural nuances and therefore provides a more natural portrayal of day-to-day causal reasoning within the Indonesian cultural sphere. Professionally written from scratch by native speakers, COPAL-ID is more fluent and free from the awkward phrasing found in the translated XCOPA-ID. In addition, we present COPAL-ID in both standard Indonesian and Jakartan Indonesian, a dialect commonly used in daily conversation. COPAL-ID poses a greater challenge for existing open-source and closed state-of-the-art multilingual language models, yet is trivially easy for humans. Our findings suggest that even the current best open-source multilingual model struggles to perform well, achieving 65.47% accuracy on COPAL-ID, significantly lower than the 79.40% it achieves on the culturally devoid XCOPA-ID. Although GPT-4 scores impressively, it suffers the same performance degradation relative to its XCOPA-ID score and still falls short of human performance. This shows that these language models remain far behind in comprehending the local nuances of Indonesian.