Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise in representing the features of text and in how students link text represented as character strings to the identification of features that embed connections with underlying phenomena. To observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we conducted task-based interviews, following a structured protocol, with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task: classifying headlines as "clickbait" or "news". Three types of features (function, content, and form) surfaced, the majority of them in the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human-perception level and the computer-extraction level, and in conceptualizing connections between them.