CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean

Despite the rapid development of large language models (LLMs) for the Korean language, there remains a clear lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from their English counterparts through translation, they often overlook differences in cultural context. The few benchmark datasets that are sourced from Korean data and capture cultural knowledge offer only narrow tasks such as bias and hate speech detection. To address this gap, we introduce a benchmark of Cultural and Linguistic Intelligence in Korean (CLIcK), a dataset comprising 1,995 QA pairs. CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under the two main categories of language and culture. For each instance in CLIcK, we provide fine-grained annotation of which cultural and linguistic knowledge is required to answer the question correctly. Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performance across the categories, as well as the diverse factors affecting their comprehension. CLIcK offers the first large-scale comprehensive Korean-centric analysis of LLMs' proficiency in Korean culture and language.
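
As a rough illustration of the per-category evaluation described above, the following minimal sketch computes accuracy grouped by main category and subcategory. It is not the authors' evaluation code: the record layout (`main`, `category`, `answer`), the letter-label answer format, the category names, and the toy predictions are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical record layout: the field names, category names, and answer
# labels are placeholders for illustration, not the schema of the released data.
examples = [
    {"main": "culture", "category": "History", "answer": "B"},
    {"main": "culture", "category": "History", "answer": "A"},
    {"main": "language", "category": "Grammar", "answer": "C"},
]
predictions = ["B", "C", "C"]  # toy model outputs, one per example


def accuracy_by_category(examples, predictions):
    """Return {(main, category): accuracy} for labeled predictions."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        key = (ex["main"], ex["category"])
        total[key] += 1
        correct[key] += int(pred == ex["answer"])
    return {key: correct[key] / total[key] for key in total}


scores = accuracy_by_category(examples, predictions)
```

Grouping by the `(main, category)` pair keeps the two-level structure (language or culture, then subcategory) visible in the result, so scores can be compared within and across the two main categories.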


