Coding Schemes for Document Exchange under Multiple Substring Edits


Hrishi Narayanan, Vinayak Ramkumar, Rawad Bitar, Antonia Wachter-Zeh

We study the document exchange problem under multiple substring edits. A substring edit in a string $\mathbf{x}$ replaces a substring $\mathbf{u}$ of $\mathbf{x}$ with an arbitrary string $\mathbf{v}$, where the lengths of $\mathbf{u}$ and $\mathbf{v}$ are bounded from above by a fixed constant. Let $\mathbf{x}$ and $\mathbf{y}$ be two binary strings that differ by $t$ substring edits. The aim of a document exchange scheme is to construct a short encoding of $\mathbf{x}$ such that $\mathbf{x}$ can be recovered from $\mathbf{y}$ and the encoding. We construct a low-complexity document exchange scheme with an encoding length of $4t\log n+o(\log n)$ bits, where $n$ is the length of $\mathbf{x}$. The best previously known scheme achieves an encoding length of $4t \log n+O(\log\log n)$ bits, but at a much higher computational complexity. We then investigate the average length of valid encodings for document exchange schemes when $\mathbf{x}$ is uniformly distributed, and develop a scheme with an expected encoding length of $(4t-1) \log n+o(\log n)$ bits. In this setting, prior work has constructed schemes only for a single substring edit.
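To make the edit model concrete, here is a minimal Python sketch under stated assumptions. The helpers `apply_substring_edit`, `random_edits`, and `leading_term_bits` are hypothetical illustrations, not the authors' construction, and the sketch does not implement any encoder or decoder. It assumes a constant bound `b` on both $|\mathbf{u}|$ and $|\mathbf{v}|$. An empty $\mathbf{u}$ gives an insertion and an empty $\mathbf{v}$ gives a deletion, so ordinary insertions, deletions, and substitutions are special cases of a substring edit. The last helper evaluates only the leading terms $4t\log n$ and $(4t-1)\log n$ from the abstract, assuming base-2 logarithms and dropping the $o(\log n)$ terms.

```python
import math
import random


def apply_substring_edit(x: str, start: int, u_len: int, v: str, b: int) -> str:
    """Replace u = x[start:start + u_len] by v (hypothetical helper).

    Assumption: |u| <= b and |v| <= b for a fixed constant b.
    """
    if not (0 <= u_len <= b and len(v) <= b):
        raise ValueError("substring edit exceeds the length bound b")
    if not (0 <= start <= len(x) - u_len):
        raise ValueError("edit window falls outside x")
    return x[:start] + v + x[start + u_len:]


def random_edits(x: str, t: int, b: int, rng: random.Random) -> str:
    """Apply t random substring edits to a binary string (hypothetical channel model)."""
    y = x
    for _ in range(t):
        # Pick |u| in [0, b] (capped by the current length) and a valid start.
        u_len = rng.randint(0, min(b, len(y)))
        start = rng.randint(0, len(y) - u_len)
        # Pick an arbitrary binary replacement v with |v| in [0, b].
        v = "".join(rng.choice("01") for _ in range(rng.randint(0, b)))
        y = apply_substring_edit(y, start, u_len, v, b)
    return y


def leading_term_bits(n: int, t: int, expected: bool = False) -> float:
    """Leading term of the encoding length: 4t*log2(n), or (4t-1)*log2(n) on average.

    The o(log n) terms are not modeled.
    """
    return (4 * t - (1 if expected else 0)) * math.log2(n)


if __name__ == "__main__":
    rng = random.Random(1)
    x = "".join(rng.choice("01") for _ in range(32))
    y = random_edits(x, t=3, b=2, rng=rng)
    print(x)
    print(y)
    print(leading_term_bits(2**20, t=3))
```

Each edit changes the length by at most `b`, so $\mathbf{x}$ and $\mathbf{y}$ may differ in length by up to $tb$. For instance, with $n = 2^{20}$ and $t = 3$ the leading term is $4 \cdot 3 \cdot 20 = 240$ bits (220 bits for the average-case scheme), compared with the $2^{20}$ bits needed to send $\mathbf{x}$ in full.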

