The cost of training state-of-the-art deep hashing retrieval models has risen with the adoption of more sophisticated models and larger datasets. Dataset Distillation (DD), also called Dataset Condensation (DC), aims to generate a smaller synthetic dataset that retains the information in the original. However, existing DD methods struggle to balance accuracy against efficiency, and state-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods. In this paper, we propose an efficient condensation framework that addresses these limitations by matching the feature embeddings of the synthetic set and the real set. Furthermore, we enhance feature diversity by incorporating two strategies: early-stage augmented models and multi-formation. Extensive experiments show that our approach outperforms state-of-the-art baseline methods in both performance and efficiency.
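To make the idea of matching feature embeddings between a synthetic set and a real set concrete, the following is a minimal sketch, not the method proposed in the paper. The abstract does not specify the loss or the network, so several assumptions are made here: a fixed linear map `W` stands in for the embedding layer of a hashing model, the objective is the squared distance between the mean embeddings of the two sets, and the synthetic samples are updated by plain gradient descent. The early-stage augmented models and multi-formation strategies are not modeled. All names (`embed`, `matching_loss`, `distill`, `REAL`, `SYN_INIT`) are hypothetical.

```python
# Minimal sketch of feature-embedding matching for dataset condensation.
# Assumptions (not from the paper): a fixed linear embedding W, a
# mean-embedding squared-distance loss, and plain gradient descent.

def embed(x, W):
    """Apply the linear embedding W (rows = output dims) to one sample x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_embedding(X, W):
    """Average embedding over a set of samples."""
    embs = [embed(x, W) for x in X]
    n = len(embs)
    return [sum(col) / n for col in zip(*embs)]

def matching_loss(real, syn, W):
    """Squared distance between mean embeddings of the real and synthetic sets."""
    mu_r = mean_embedding(real, W)
    mu_s = mean_embedding(syn, W)
    return sum((r - s) ** 2 for r, s in zip(mu_r, mu_s))

def distill(real, syn, W, lr=0.1, steps=200):
    """Update synthetic samples so their mean embedding approaches the real one."""
    syn = [list(x) for x in syn]  # copy so the caller's data is not mutated
    m = len(syn)
    mu_r = mean_embedding(real, W)
    for _ in range(steps):
        mu_s = mean_embedding(syn, W)
        diff = [r - s for r, s in zip(mu_r, mu_s)]
        # Gradient of the loss w.r.t. each synthetic sample: -(2/m) * W^T diff.
        grad = [
            -(2.0 / m) * sum(W[k][i] * diff[k] for k in range(len(W)))
            for i in range(len(W[0]))
        ]
        for x in syn:
            for i in range(len(x)):
                x[i] -= lr * grad[i]
    return syn

# Toy data: four "real" samples condensed into two synthetic samples.
REAL = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [0.0, 4.0, 2.0]]
SYN_INIT = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

if __name__ == "__main__":
    condensed = distill(REAL, SYN_INIT, W)
    print("loss before:", matching_loss(REAL, SYN_INIT, W))
    print("loss after: ", matching_loss(REAL, condensed, W))
```

Matching only the mean embedding is a deliberate simplification: it keeps the gradient in closed form so the sketch runs without an autodiff library. A practical system would presumably match embeddings from a trained hashing network and compute gradients automatically.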