SlopeSeeker: A Search Tool for Exploring a Dataset of Quantifiable Trends

Natural language and search interfaces intuitively facilitate data exploration and provide visualization responses to diverse analytical queries based on the underlying datasets. However, these interfaces often fail to interpret more complex analytical intents, such as discerning subtleties and quantifiable differences between terms like "bump" and "spike" in the context of COVID cases. We address this gap by extending the capabilities of a data exploration search interface for interpreting semantic concepts in time series trends. We first create a comprehensive dataset of semantic concepts by mapping quantifiable univariate data trends such as slope and angle to crowdsourced, semantically meaningful trend labels. The dataset contains quantifiable properties that capture the slope-scalar effect of semantic modifiers like "sharply" and "gradually," as well as multi-line trends (e.g., "peak," "valley"). We demonstrate the utility of this dataset in SlopeSeeker, a tool that supports natural language querying of quantifiable trends, such as "show me stocks that tanked in 2010." The tool incorporates novel scoring and ranking techniques based on semantic relevance and visual prominence to present relevant trend chart responses containing these semantic trend concepts. In addition, SlopeSeeker provides a faceted search interface for users to navigate a semantic hierarchy of concepts from general trends (e.g., "increase") to more specific ones (e.g., "sharp increase"). A preliminary user evaluation of the tool demonstrates that the search interface supports greater expressivity of queries containing concepts that describe data trends. We identify potential future directions for leveraging our publicly available quantitative semantics dataset in other data domains and for novel visual analytics interfaces.
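The idea of mapping a quantifiable property such as slope or angle to a semantic trend label can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the angle ranges in `HYPOTHETICAL_LABELS`, the normalization of both axes to the unit square, and the function names `segment_angle` and `label_for_angle` are not taken from SlopeSeeker or its dataset, whose ranges come from crowdsourced labels.

```python
import math

# Hypothetical angle ranges in degrees, chosen only for illustration.
# The actual dataset derives its ranges from crowdsourced labels.
HYPOTHETICAL_LABELS = [
    (-90.0, -45.0, "sharp decrease"),
    (-45.0, -5.0, "gradual decrease"),
    (-5.0, 5.0, "flat"),
    (5.0, 45.0, "gradual increase"),
    (45.0, 90.0, "sharp increase"),
]


def segment_angle(series, start, end):
    """Angle (degrees) of the segment series[start] -> series[end].

    Both axes are normalized to [0, 1] over the whole series so that the
    angle does not depend on the units of the data (an assumption of this
    sketch, not a documented step of the tool).
    """
    if len(series) < 2 or not 0 <= start < end < len(series):
        raise ValueError("need 0 <= start < end < len(series)")
    value_range = max(series) - min(series)
    dx = (end - start) / (len(series) - 1)
    # A constant series has no vertical extent, so treat it as flat.
    dy = 0.0 if value_range == 0 else (series[end] - series[start]) / value_range
    return math.degrees(math.atan2(dy, dx))


def label_for_angle(angle):
    """Return the hypothetical trend label whose range contains the angle."""
    for low, high, label in HYPOTHETICAL_LABELS:
        if low <= angle < high:
            return label
    # Only reachable for an angle of exactly 90 degrees or out-of-range input.
    return HYPOTHETICAL_LABELS[-1][2] if angle >= 90.0 else HYPOTHETICAL_LABELS[0][2]


if __name__ == "__main__":
    cases = [10, 10, 10, 20, 40]
    for start, end in [(0, 2), (2, 4)]:
        angle = segment_angle(cases, start, end)
        print(start, end, round(angle, 1), label_for_angle(angle))
```

A query term such as "sharply" would, under this sketch, simply select segments whose angle falls in the steeper range; ranking by semantic relevance and visual prominence, as the abstract describes, is not modeled here.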


