Recent studies have demonstrated that incorporating Chain-of-Thought (CoT) reasoning into the detection process can enhance a model's ability to identify synthetic images. However, excessively long reasoning incurs substantial overhead in token consumption and latency, and is largely redundant when the forgery is obvious. To address this issue, we propose Fake-HR1, a large-scale hybrid-reasoning model that, to the best of our knowledge, is the first to adaptively determine whether reasoning is necessary based on the characteristics of the generative detection task. To this end, we design a two-stage training framework: we first perform Hybrid Fine-Tuning (HFT) for cold-start initialization, and then apply online reinforcement learning with Hybrid-Reasoning Grouped Policy Optimization (HGRPO) so that the model implicitly learns when to select the appropriate reasoning mode. Experimental results show that Fake-HR1 adaptively performs reasoning across different types of queries, surpassing existing LLMs in both reasoning ability and generative detection performance while significantly improving response efficiency.
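The abstract does not spell out the HGRPO objective. As a rough sketch, assuming it follows the standard GRPO group-relative advantage with an added term that rewards the chosen reasoning mode (the symbols $r_{\text{acc}}$, $r_{\text{mode}}$, and $\lambda$ are placeholders of ours, not notation from the paper), the per-sample advantage could take a form such as

\[
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}\!\left(\{r_j\}_{j=1}^{G}\right)}{\operatorname{std}\!\left(\{r_j\}_{j=1}^{G}\right)},
\qquad
r_i \;=\; r_{\text{acc}}(o_i) \;+\; \lambda\, r_{\text{mode}}(o_i),
\]

where, for each query, $G$ responses $\{o_1,\dots,o_G\}$ are sampled from the current policy, $r_{\text{acc}}$ rewards a correct real/fake verdict, and $r_{\text{mode}}$ favors omitting the reasoning trace whenever a non-reasoning response already answers correctly, so that the group-relative comparison implicitly penalizes unnecessary chains of thought on easy forgeries.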