Despite the rapid development of large language models (LLMs) for the Korean language, there remains an obvious lack of benchmark datasets that test the requisite Korean cultural and linguistic knowledge. Because many existing Korean benchmark datasets are derived from English counterparts through translation, they often overlook differing cultural contexts. The few benchmark datasets sourced from Korean data that do capture cultural knowledge offer only narrow tasks, such as bias and hate speech detection. To address this gap, we introduce a benchmark of Cultural and Linguistic Intelligence in Korean (CLIcK), a dataset comprising 1,995 QA pairs. CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under the two main categories of language and culture. For each instance in CLIcK, we provide a fine-grained annotation of which cultural and linguistic knowledge is required to answer the question correctly. Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performance across the categories, as well as the diverse factors affecting their comprehension. CLIcK offers the first large-scale, comprehensive, Korean-centric analysis of LLMs' proficiency in Korean culture and language.