This paper is concerned with the lossy compression of general random variables, specifically with rate-distortion theory and quantization of random variables taking values in general measurable spaces such as manifolds and fractal sets. Manifold structures are prevalent in data science, e.g., in compressed sensing, machine learning, image processing, and handwritten digit recognition. Fractal sets find application in image compression and in the modeling of Ethernet traffic. Our main contributions are bounds on the rate-distortion function and the quantization error. These bounds are very general and essentially only require the existence of reference measures satisfying certain regularity conditions in terms of small ball probabilities. To illustrate the wide applicability of our results, we particularize them to random variables taking values in i) manifolds, namely, hyperspheres and Grassmannians, and ii) self-similar sets characterized by iterated function systems satisfying the weak separation property.