Document tamper detection has long been an important part of the tamper detection field. Before the advent of deep learning, it was a difficult problem. We have explored text tamper detection based on deep learning. Our PS tamper detection method has three steps: feature assistance, audit point positioning, and tamper recognition, with hierarchical filtering and a graded output (tampered / suspected tampered / untampered). Drawing on the characteristics of manually tampered data, we simulate and augment data samples across a range of scenarios (cropping with noise addition or replacement, single-character or space replacement, smearing or splicing, brightness or contrast adjustment, and so on). The auxiliary features are EXIF metadata, keyword retrieval over the binary stream, and noise; their results determine which detection branch is taken. Audit point positioning uses a detection framework, with thresholds controlling high-density and low-density detection. Tamper recognition uses a dual-path, dual-stream recognition network that extracts features from an RGB stream and an ELA (error level analysis) stream. After dimensionality reduction by self-correlation percentile pooling, the fused output is processed by VLAD, giving an accuracy of 0.804, a recall of 0.659, and a precision of 0.913.
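As a rough illustration of the binary-stream keyword retrieval used as an auxiliary feature, the sketch below scans a file's raw bytes for the names of image editors. The keyword list and the function name `find_editor_keywords` are assumptions made for this example; the abstract does not say which keywords or matching rules the actual method uses.

```python
from typing import Iterable, List

# Hypothetical keyword list: editing tools often leave their name in
# EXIF or XMP metadata. The real keyword set is not given in the abstract.
EDITOR_KEYWORDS = (b"photoshop", b"gimp", b"lightroom")


def find_editor_keywords(
    data: bytes, keywords: Iterable[bytes] = EDITOR_KEYWORDS
) -> List[str]:
    """Return the keywords that occur anywhere in the raw byte stream.

    Matching is case-insensitive for ASCII letters, so b"Adobe Photoshop"
    matches the keyword b"photoshop".
    """
    lowered = data.lower()  # bytes.lower() only changes ASCII letters
    return [kw.decode("ascii") for kw in keywords if kw in lowered]


if __name__ == "__main__":
    # A fabricated byte string standing in for the start of a JPEG file.
    sample = b"\xff\xd8\xff\xe1Exif\x00\x00Adobe Photoshop CS6"
    print(find_editor_keywords(sample))
```

A hit here would only be a hint for choosing a detection branch, not evidence of tampering on its own, since metadata can be stripped or rewritten.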
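The graded output (tampered / suspected tampered / untampered) can be pictured as two thresholds applied to a tamper score. The following is a minimal sketch under that assumption; the function name `grade` and the threshold values 0.3 and 0.7 are placeholders chosen for illustration, not values reported by the authors.

```python
def grade(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a tamper score in [0, 1] to one of three grades.

    `low` and `high` are hypothetical thresholds: scores at or above
    `high` are reported as tampered, scores between the two thresholds
    as suspected tampered, and everything below `low` as untampered.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    if score >= high:
        return "tampered"
    if score >= low:
        return "suspected tampered"
    return "untampered"


if __name__ == "__main__":
    for s in (0.9, 0.5, 0.1):
        print(s, grade(s))
```

Raising `high` would trade recall for precision on the "tampered" grade, with borderline cases falling into the "suspected tampered" band for manual review.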