Interpretable Transformation and Analysis of Timelines through Learning via Surprisability

The analysis of high-dimensional timeline data and the identification of outliers and anomalies are critical across diverse domains, including sensor readings, biological and medical data, historical records, and global statistics. However, conventional analysis techniques often struggle with high dimensionality, complex distributions, and sparsity. These limitations make it difficult to extract meaningful insights from complex temporal datasets and to identify trending features, outliers, and anomalies effectively. Inspired by surprisability, a cognitive science concept describing how humans instinctively focus on unexpected deviations, we propose Learning via Surprisability (LvS), a novel approach for transforming high-dimensional timeline data. LvS quantifies and prioritizes anomalies in time-series data by formalizing deviations from expected behavior. LvS bridges cognitive theories of attention with computational methods, enabling the detection of anomalies and shifts in a way that preserves critical context and offering a new lens for interpreting complex datasets. We demonstrate the usefulness of LvS on three high-dimensional timeline use cases: a time series of sensor data, a global dataset of mortality causes over multiple years, and a textual corpus containing over two centuries of State of the Union Addresses by U.S. presidents. Our results show that the LvS transformation enables efficient and interpretable identification of outliers, anomalies, and the most variable features along the timeline.
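
The abstract does not give the LvS formula, so the following is only a generic illustration of what "formalizing deviations from expected behavior" could look like, not the authors' method. It assumes that each time step is a non-negative feature vector with a positive total, that the expected behavior is the average normalized distribution over the whole timeline, and that surprise is measured as the per-feature contribution to the Kullback-Leibler divergence from that baseline. The names `surprise_transform` and `rank_time_steps` are hypothetical.

```python
import numpy as np

def surprise_transform(X, eps=1e-12):
    """Illustrative surprise scores (an assumption, not the LvS definition).

    X: array of shape (T, F), non-negative, one row per time step.
    Every row is assumed to have a positive sum.
    Returns an array of shape (T, F) holding each feature's signed
    contribution to KL(P_t || Q), where Q is the mean distribution.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum(axis=1, keepdims=True)  # each time step as a distribution
    Q = P.mean(axis=0)                    # assumed baseline: timeline average
    # Positive entries: feature is over-represented relative to the baseline.
    # Negative entries: feature is under-represented.
    return P * np.log((P + eps) / (Q + eps))

def rank_time_steps(S):
    """Order time steps from most to least surprising by total divergence."""
    scores = S.sum(axis=1)
    return np.argsort(scores)[::-1], scores

# Toy timeline: four identical time steps followed by one that shifts
# almost all of its mass onto the last feature.
X = np.array([[10, 10, 10]] * 4 + [[1, 1, 28]])
S = surprise_transform(X)
order, scores = rank_time_steps(S)
most_surprising_step = int(order[0])
top_feature = int(np.argmax(S[most_surprising_step]))
```

Because the transformed matrix keeps one column per original feature, a high-scoring time step can be traced back to the specific features that drive it, which is the kind of interpretability the abstract describes.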

