Toward Temporal Attribution Analytics in Dataflows

Chrysanthi Kosyfaki, Ruiyuan Zhang, Nikos Mamoulis, Xiaofang Zhou

Data provenance (the process of determining the origin and derivation of data outputs) has applications across multiple domains, including explaining database query results and auditing scientific workflows. Despite decades of research, provenance tracing remains challenging because of its high computational cost and storage requirements. In streaming systems such as Apache Flink, fine-grained provenance graphs can grow super-linearly with data volume, posing significant scalability challenges. We define temporal attribution, a new lightweight form of provenance suited to certain tasks, such as quantitatively monitoring dependencies between system components over time. Temporal attribution enables time-focused analysis that does not require fine-grained, tuple-level dependency metadata. Inspired by volume-based provenance tracking in Temporal Interaction Networks (TINs), we demonstrate that TINs can succinctly model quantified data exchanges over time between dataflow operators, both in stream data processing systems and in processing workflows in general. We classify data into discrete and liquid types, define five temporal provenance query types, and propose a state-based indexing approach. Our vision outlines research directions toward making this new form of temporal attribution a practical tool for large-scale dataflow analytics.
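To make the idea of volume-based attribution more concrete, the following is a minimal sketch and not the authors' method. It assumes each exchange between operators is recorded as a hypothetical `(time, src, dst, quantity)` tuple, and it assumes a proportional policy: when an operator forwards a quantity, the origins of what it currently holds are drawn on in proportion to their share. The function name `attribute`, the operator names, and the policy itself are illustrative assumptions; the abstract does not specify them.

```python
from collections import defaultdict


def attribute(interactions):
    """Track, per operator, how much held quantity originates from each origin.

    interactions: iterable of (time, src, dst, quantity) tuples (hypothetical format).
    Returns {operator: {origin: quantity}} after replaying all interactions in time order.
    Assumption: proportional selection; any shortfall counts as newly generated at src.
    """
    buffers = defaultdict(lambda: defaultdict(float))
    for _time, src, dst, qty in sorted(interactions):
        held = sum(buffers[src].values())
        moved = min(qty, held)
        if held > 0:
            # Move a proportional share of every origin currently held by src.
            for origin in list(buffers[src]):
                share = moved * buffers[src][origin] / held
                buffers[src][origin] -= share
                buffers[dst][origin] += share
        if qty > moved:
            # src did not hold enough: treat the remainder as data generated by src.
            buffers[dst][src] += qty - moved
    return {node: dict(origins) for node, origins in buffers.items()}


if __name__ == "__main__":
    # Two sources feed a map operator, which forwards half of its content to a sink.
    log = [
        (1, "src_a", "map", 10.0),
        (2, "src_b", "map", 30.0),
        (3, "map", "sink", 20.0),
    ]
    print(attribute(log)["sink"])
```

Note that the sketch stores only one small origin-to-quantity table per operator rather than tuple-level lineage, which is the kind of saving that a lightweight, quantity-based form of provenance aims for.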
