A composite source consists of multiple subsources and a memoryless switch, and outputs one symbol at a time from the subsource selected by the switch. When some data from an information source must be encoded more accurately than other data, the composite source model is a natural fit, because it allows a different distortion constraint to be placed on each subsource. In this context, we propose subsource-dependent fidelity criteria for composite sources and use them to formulate a rate-distortion problem. We solve this problem and obtain a single-letter expression for the rate-distortion function. Further rate-distortion analysis characterizes the performance of classify-then-compress (CTC) coding, which is frequently used in practice when subsource-dependent fidelity criteria are considered. Our analysis shows that CTC coding generally incurs a performance loss relative to optimal coding, even if the classification is perfect. We also identify the cause of this loss: class labels have to be reproduced in CTC coding. Finally, we show that the performance loss is negligible for asymptotically small distortion if CTC coding is appropriately designed and some mild conditions are satisfied.
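To make the source model concrete, the following is a minimal Python sketch of a composite source with a memoryless switch, together with a per-subsource average distortion. It is an illustration under stated assumptions, not the formal definition used in the paper: the two subsources (Gaussian and uniform), the switch probabilities, the squared-error distortion, and the function names `sample_composite` and `per_subsource_distortion` are all hypothetical choices made here.

```python
import random


def sample_composite(subsources, switch_probs, n, seed=0):
    """Draw n (label, symbol) pairs from a composite source.

    The switch is memoryless: each label is drawn i.i.d. from switch_probs,
    and the symbol is then drawn from the selected subsource.
    """
    rng = random.Random(seed)
    labels = rng.choices(range(len(subsources)), weights=switch_probs, k=n)
    return [(label, subsources[label](rng)) for label in labels]


def per_subsource_distortion(samples, reproductions, num_subsources):
    """Average squared error computed separately for each subsource.

    Assumption: squared error stands in for the distortion measure.
    Returns None for a subsource that emitted no symbols.
    """
    totals = [0.0] * num_subsources
    counts = [0] * num_subsources
    for (label, x), y in zip(samples, reproductions):
        totals[label] += (x - y) ** 2
        counts[label] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


# Hypothetical subsources: a standard Gaussian and a uniform on [-5, 5].
subsources = [
    lambda rng: rng.gauss(0.0, 1.0),
    lambda rng: rng.uniform(-5.0, 5.0),
]
samples = sample_composite(subsources, [0.8, 0.2], n=1000)

# A crude reproduction: round every symbol to the nearest integer.
reproductions = [round(x) for _, x in samples]
distortions = per_subsource_distortion(samples, reproductions, len(subsources))
```

Under a subsource-dependent criterion of this kind, each entry of `distortions` would be compared against its own constraint rather than against a single overall average, which is what allows one subsource to be encoded more accurately than another.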