Existing linguistic steganography schemes often overlook the conditional probability (CP) of tokens in the candidate pool: they allocate the same coding interval to every token, so all tokens are equally likely to be selected. As a result, low-CP tokens are often selected, which degrades the quality of the stegos and makes them easier to detect. This paper proposes a scheme based on interval allocation, called DAIRstega. DAIRstega first uses a portion of the secret bits it has read to build the roulette area. It then follows the roulette-wheel idea and takes the CPs of the tokens as the main basis for allocating the roulette area (i.e., the interval length), so that tokens with larger CPs are allocated more area and the secret is more likely to select a token with a higher CP. To optimize the allocation, we design several allocation functions and three constraints. In addition, DAIRstega supports prompt-based controllable generation of stegos. Extensive experiments show that the proposed embedding method and DAIRstega outperform existing embedding methods and baselines, with strong perceptual, statistical, and semantic concealment as well as resistance to steganalysis. DAIRstega can also generate high-quality longer stegos, addressing a deficiency of existing work on this task. Finally, DAIRstega is confirmed to have potential as a secure watermarking technique, offering insights for the development of such techniques.
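The roulette-wheel selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes plain proportional allocation with largest-remainder rounding in place of the paper's allocation functions and three constraints, it reads a fixed number of secret bits per step, and it omits extraction entirely. The names `allocate_intervals` and `select_token` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def allocate_intervals(probs, total):
    """Split the integer range [0, total) into interval lengths that are
    roughly proportional to probs (assumed rule: largest-remainder rounding)."""
    mass = sum(probs)
    exact = [p / mass * total for p in probs]
    lengths = [int(x) for x in exact]
    # Hand the slots lost to rounding down to the largest fractional remainders.
    leftover = total - sum(lengths)
    order = sorted(range(len(probs)),
                   key=lambda i: exact[i] - lengths[i],
                   reverse=True)
    for i in order[:leftover]:
        lengths[i] += 1
    return lengths


def select_token(tokens, probs, secret_bits):
    """Return the token whose interval contains the integer value of secret_bits."""
    total = 2 ** len(secret_bits)            # size of the roulette area
    lengths = allocate_intervals(probs, total)
    bounds = list(accumulate(lengths))       # exclusive right edge of each interval
    value = int(secret_bits, 2)
    # Index of the first right edge strictly greater than value;
    # tokens with a zero-length interval are skipped automatically.
    return tokens[bisect_right(bounds, value)]


tokens = ["the", "a", "cat", "dog"]
probs = [0.5, 0.25, 0.125, 0.125]
print(allocate_intervals(probs, 8))      # interval lengths on a wheel of size 8
print(select_token(tokens, probs, "101"))
```

With three secret bits the wheel has 8 slots. Under this assumed rule the token with CP 0.5 owns 4 of them, whereas an equal-length allocation would give each of the four tokens 2 slots, so a uniformly random secret lands on the high-CP token twice as often.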