Skyline · Extraction · Interval · Declarative · Information Extraction

April 12, 2023

Skyline Operators for Document Spanners

Antoine Amarilli, Benny Kimelfeld, Sébastien Labbé, Stefan Mengel

From arXiv, 42 pages. Submitted

When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies across systems and tasks. For example, we may state that a tuple is dominated by tuples that extend it by assigning additional attributes or by assigning larger intervals. The result of filtering the relation is then the skyline with respect to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor. To this end, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup in the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.
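To make the filtering step concrete, here is a minimal Python sketch of the baseline the abstract contrasts with: extract the full relation first, then keep only its skyline. It assumes spans are half-open `(start, end)` pairs, that tuples are partial mappings from variable names to spans, and a hypothetical domination rule that combines the two examples above (a tuple is dominated by one that assigns more attributes or larger intervals). All names (`contains`, `dominated_by`, `skyline`) are illustrative; this is a naive quadratic post-hoc filter, not the paper's construction, which compiles the domination rule into the spanner itself.

```python
from typing import Dict, List, Tuple

# A span is a half-open interval [start, end) over document positions.
Span = Tuple[int, int]
# A (possibly partial) tuple maps variable names to spans;
# a variable that is absent is simply unassigned.
SpanTuple = Dict[str, Span]


def contains(outer: Span, inner: Span) -> bool:
    """True if span `outer` covers span `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def dominated_by(t: SpanTuple, u: SpanTuple) -> bool:
    """Hypothetical domination rule (an assumption, not the paper's):
    u strictly dominates t if u differs from t, assigns every variable
    that t assigns, and gives each one a span containing t's span.
    So u may add attributes and/or widen intervals."""
    if t == u:
        return False
    return all(var in u and contains(u[var], span) for var, span in t.items())


def skyline(relation: List[SpanTuple]) -> List[SpanTuple]:
    """Keep the tuples that no other tuple in the relation dominates.
    Naive O(n^2) filter applied after extraction."""
    return [t for t in relation
            if not any(dominated_by(t, u) for u in relation)]


if __name__ == "__main__":
    doc = "Alice Smith lives in Paris"
    relation = [
        {"name": (0, 5)},                     # "Alice"
        {"name": (0, 11)},                    # "Alice Smith"
        {"name": (0, 11), "city": (21, 26)},  # "Alice Smith", "Paris"
        {"city": (21, 26)},                   # "Paris"
    ]
    for t in skyline(relation):
        print({var: doc[s:e] for var, (s, e) in t.items()})
    # prints {'name': 'Alice Smith', 'city': 'Paris'}
```

In this toy run three of the four extracted tuples are discarded, which illustrates the abstract's motivation: when most tuples are filtered out anyway, materializing them first is wasted work, hence the interest in pushing the domination rule into the extractor.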
