PENet+: A Lightweight Residual Transformer Framework for Efficient Image Steganalysis

Image steganalysis, the detection of hidden information embedded in digital images, is a core component of modern cybersecurity and digital forensics. Recent residual Transformer architectures, such as the Pixel-Difference-Convolution and Enhanced-Transformer-Network (PENet) [1], achieve strong detection accuracy, but their computational and memory demands hinder deployment in resource-constrained settings. We present PENet+, a lightweight steganalysis framework that preserves PENet's discriminative structure while substantially improving efficiency. Rather than redesigning or compressing the attention blocks, we retain PENet's self-attention topology for reproducibility and add a classifier-streamlining stage that progressively narrows the SPP-to-FC1 input channels (SPP: spatial pyramid pooling; FC1: first fully connected layer), yielding large reductions in parameters and FLOPs with negligible accuracy loss. We further refine the high-pass-filter (HPF) stem with an activation-aware mechanism that aggregates HPF responses early and selects a balanced SRM-Gabor top-K subset, and we replace PENet's backbone with a MobileNetV2-style inverted residual network. A balanced configuration with K=31 filters (16 Gabor + 15 SRM) matches or surpasses heavier settings at lower compute. Finally, we motivate PReLU from a steganalysis standpoint, arguing that preserving negative responses helps capture weak stego cues that ReLU suppresses. On a disjoint ALASKA2 JPEG QF90 protocol at 512x512 resolution (5,000 cover images for training, validation, and internal testing; a separate 19,000-cover evaluation set), PENet+ achieves up to 45.5% fewer parameters and about 97% fewer FLOPs than the re-evaluated PENet baseline, offering a computationally efficient direction for resource-constrained steganalysis. Device-level latency and power measurements remain future work.
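The PReLU argument above can be illustrated with a toy example. The sketch below is not the PENet+ implementation: it assumes a first-order horizontal difference as a stand-in for a high-pass filter, a hypothetical row of pixel values with two ±1 embedding changes, and a fixed negative slope of 0.25 (in a trained network this slope is learned). It only shows that ReLU zeroes the negative half of the residual, while PReLU keeps it in scaled form.

```python
def high_pass(row):
    """First-order difference, a simple stand-in for an HPF kernel (assumption)."""
    return [float(row[i + 1] - row[i]) for i in range(len(row) - 1)]


def relu(values):
    """Standard ReLU: negative responses are clipped to zero."""
    return [max(0.0, v) for v in values]


def prelu(values, slope=0.25):
    """PReLU with a fixed slope; the slope is a learnable parameter in practice."""
    return [v if v >= 0 else slope * v for v in values]


# Hypothetical stego row: a flat region with two +/-1 embedding changes.
stego_row = [100, 99, 100, 101, 100]

residual = high_pass(stego_row)   # [-1.0, 1.0, 1.0, -1.0]
relu_out = relu(residual)         # negative responses are lost
prelu_out = prelu(residual)       # negative responses survive, scaled by the slope

print("residual :", residual)
print("ReLU     :", relu_out)
print("PReLU    :", prelu_out)
```

In this toy case half of the nonzero residual entries are negative, so ReLU discards half of the embedding trace before any later layer can use it. PReLU passes all four entries on with their sign intact.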
