In coding theory, handling errors that occur when symbols are inserted into or deleted from a transmitted message is a long-standing challenge. Optimising redundancy for insertion and deletion channels remains a key open problem of considerable importance for applications in DNA data storage and document exchange. Recently, a coding framework known as function-correcting codes has been proposed to minimise redundancy while ensuring that a specific function of the message can still be recovered. This framework has gained attention due to its potential applications in machine learning systems and long-term archival data storage. Motivated by the problem of redundancy optimisation for insertion and deletion channels, we propose a new framework called function-correcting codes for insdel channels. In this paper, we introduce the notions of function-correcting insertion codes, function-correcting deletion codes, and function-correcting insdel codes, and we show that these three formulations are equivalent. We then define insdel distance matrices and irregular insdel-distance codes, and derive lower and upper bounds on the optimal redundancy achievable by function-correcting codes for insdel channels. In addition, we establish Gilbert-Varshamov-like and Plotkin-like bounds on the length of irregular insdel-distance codes. Using the relation between optimal redundancy and the length of such codes, we obtain a simplified lower bound on optimal redundancy. Finally, we derive bounds on the optimal redundancy of function-correcting insdel codes for several classes of functions, including locally bounded functions, VT syndrome functions, the number-of-runs function, and the maximum-run-length function.
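To make the objects named in the abstract concrete, the following is a minimal Python sketch of the insdel distance and of three of the function classes mentioned. It relies on the standard textbook definitions, which are an assumption here and may differ in detail from those used in the paper: the insdel distance is computed as |x| + |y| - 2·LCS(x, y), and the VT syndrome of a binary word of length n is taken as the sum of i·x_i modulo n + 1. All function names are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def insdel_distance(x: str, y: str) -> int:
    """Minimum number of insertions and deletions that turn x into y.

    Uses the standard identity d(x, y) = |x| + |y| - 2 * LCS(x, y),
    with the LCS length computed by row-by-row dynamic programming.
    """
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(x) + len(y) - 2 * prev[-1]


def vt_syndrome(x: str) -> int:
    """Assumed VT syndrome of a binary string: sum of i * x_i mod (n + 1)."""
    n = len(x)
    return sum(i * int(bit) for i, bit in enumerate(x, start=1)) % (n + 1)


def run_lengths(x: str) -> list[int]:
    """Lengths of the maximal blocks of identical consecutive symbols."""
    return [len(list(group)) for _, group in groupby(x)]


def number_of_runs(x: str) -> int:
    return len(run_lengths(x))


def max_run_length(x: str) -> int:
    return max(run_lengths(x), default=0)


if __name__ == "__main__":
    x = "0011010"
    y = "011010"  # x with its first symbol deleted
    print(insdel_distance(x, y))
    print(vt_syndrome(x), number_of_runs(x), max_run_length(x))
    print(vt_syndrome(y), number_of_runs(y), max_run_length(y))
```

In this example a single deletion leaves the number of runs and the maximum run length unchanged but changes the VT syndrome, which illustrates why the redundancy needed to protect a function value can depend strongly on the function being protected.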