Natural language and search interfaces make data exploration intuitive by answering diverse analytical queries with visualizations drawn from the underlying datasets. However, these interfaces often fail to interpret more complex analytical intents, such as the subtle yet quantifiable differences between terms like "bump" and "spike" in the context of COVID cases. We address this gap by extending a data exploration search interface to interpret semantic concepts in time series trends. We first create a comprehensive dataset of semantic concepts by mapping quantifiable properties of univariate data trends, such as slope and angle, to crowdsourced, semantically meaningful trend labels. The dataset contains quantifiable properties that capture the slope-scalar effect of semantic modifiers like "sharply" and "gradually," as well as multi-line trends (e.g., "peak," "valley"). We demonstrate the utility of this dataset in SlopeSeeker, a tool that supports natural language querying of quantifiable trends, such as "show me stocks that tanked in 2010." The tool incorporates novel scoring and ranking techniques based on semantic relevance and visual prominence to present trend charts that match these semantic trend concepts. In addition, SlopeSeeker provides a faceted search interface that lets users navigate a semantic hierarchy of concepts from general trends (e.g., "increase") to more specific ones (e.g., "sharp increase"). A preliminary user evaluation of the tool indicates that the search interface supports more expressive queries containing concepts that describe data trends. We identify potential future directions for leveraging our publicly available quantitative semantics dataset in other data domains and in novel visual analytics interfaces.
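As a rough illustration of what mapping slope and angle to trend labels could look like, the Python sketch below fits a least-squares slope to a univariate series, converts it to an angle, and looks up a label. The angle bands, the label strings, and the function names are all hypothetical choices made for this example; they are not values or code from the crowdsourced dataset or from SlopeSeeker, and the sketch assumes unit spacing on the x-axis and y-values already expressed in comparable chart units.

```python
import math

# Hypothetical angle bands in degrees: (lower inclusive, upper exclusive, label).
# These thresholds are illustrative assumptions, NOT values from the paper's dataset.
ILLUSTRATIVE_BANDS = [
    (-90.0, -45.0, "sharp decrease"),
    (-45.0, -5.0, "gradual decrease"),
    (-5.0, 5.0, "flat"),
    (5.0, 45.0, "gradual increase"),
    (45.0, 90.0, "sharp increase"),
]


def least_squares_slope(values):
    """Slope of the best-fit line through (0, v0), (1, v1), ... (unit x spacing)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def trend_angle(values):
    """Angle of the fitted line in degrees, in the open interval (-90, 90)."""
    return math.degrees(math.atan(least_squares_slope(values)))


def trend_label(values, bands=ILLUSTRATIVE_BANDS):
    """Look up the (hypothetical) label whose angle band contains the trend angle."""
    angle = trend_angle(values)
    for lower, upper, label in bands:
        if lower <= angle < upper:
            return label
    # Not reachable for finite inputs; kept as a defensive fallback.
    return bands[-1][2]


if __name__ == "__main__":
    for series in ([0, 3, 6, 9], [0, 0.2, 0.4, 0.6], [5, 5, 5], [9, 6, 3, 0]):
        print(series, "->", trend_label(series))
```

A modifier such as "sharply" would, under this simplification, correspond to a steeper angle band than "gradually"; a real system would presumably derive such bands from the crowdsourced labels rather than fix them by hand.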