Deep learning technologies have brought us many models that outperform human beings on a few benchmarks. An interesting question is: can these models solve real-world problems well when those problems share similar settings (e.g., identical input/output) with the benchmark datasets? We argue that a model is trained to answer the same information need for which its training dataset was created. Although some datasets share high structural similarity, e.g., question-answer pairs for the question answering (QA) task and image-caption pairs for the image captioning (IC) task, they may represent different research tasks aimed at answering different information needs. To support our argument, we use the QA task and the IC task as two case studies and compare their widely used benchmark datasets. From the perspective of information need in the context of information retrieval, we show the differences in the dataset creation processes and in the morphosyntactic properties of the datasets. These differences can be attributed to the different information needs of the specific research tasks. We encourage all researchers to consider the information need perspective of a research task before using a dataset to train a model. Likewise, when creating a dataset, researchers may also incorporate the information need perspective as a factor in determining how accurately the dataset reflects the research task they intend to tackle.