Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks - 专知论文

会员服务 ·

0

控制器 · 代码 · CoT · MoDELS · 图 ·

Analyzing Chain of Thought (CoT) Approaches in Control Flow Code Deobfuscation Tasks

翻译：暂无翻译

Seyedreza Mohseni,Sarvesh Baskar,Edward Raff,Manas Gaur

Code deobfuscation is the task of recovering a readable version of a program while preserving its original behavior. In practice, this often requires days or even months of manual work with complex and expensive analysis tools. In this paper, we explore an alternative approach based on Chain-of-Thought (CoT) prompting, where a large language model is guided through explicit, step-by-step reasoning tailored for code analysis. We focus on control flow obfuscation, including Control Flow Flattening (CFF), Opaque Predicates, and their combination, and we measure both structural recovery of the control flow graph and preservation of program semantics. We evaluate five state-of-the-art large language models and show that CoT prompting significantly improves deobfuscation quality compared with simple prompting. We validate our approach on a diverse set of standard C benchmarks and report results using both structural metrics for control flow graphs and semantic metrics based on output similarity. Among the tested models and by applying CoT, GPT5 achieves the strongest overall performance, with an average gain of about 16% in control-flow graph reconstruction and about 20.5% in semantic preservation across our benchmarks compared to zero-shot prompting. Our results also show that model performance depends not only on the obfuscation level and the chosen obfuscator but also on the intrinsic complexity of the original control flow graph. Collectively, these findings suggest that CoT-guided large language models can serve as effective assistants for code deobfuscation, providing improved code explainability, more faithful control flow graph reconstruction, and better preservation of program behavior while potentially reducing the manual effort needed for reverse engineering.

翻译：暂无翻译

0

相关内容

控制器

COLING2024 | 面向编程的自然语言处理综述

COLING2024 | 面向编程的自然语言处理综述

专知会员服务

28+阅读 · 2024年4月23日

自编码器及其应用综述

专知会员服务

37+阅读 · 2021年10月16日

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

专知会员服务

80+阅读 · 2021年3月26日

【经典书】Linux UNIX系统编程手册，1554页pdf

【经典书】Linux UNIX系统编程手册，1554页pdf

专知会员服务

48+阅读 · 2021年2月20日

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

专知会员服务

51+阅读 · 2020年5月28日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

专知会员服务

49+阅读 · 2020年2月25日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

专知

13+阅读 · 2021年3月26日

【Github】GPT2-Chinese：中文的GPT2训练代码

【Github】GPT2-Chinese：中文的GPT2训练代码

AINLP

52+阅读 · 2019年8月23日

Github项目推荐 | PyTorch 中文手册（pytorch handbook）

Github项目推荐 | PyTorch 中文手册（pytorch handbook）

AI研习社

50+阅读 · 2019年2月18日

【Github 3.5K 星】PyTorch资源列表：450个NLP/CV/SP、论文实现、库、教程&示例

【Github 3.5K 星】PyTorch资源列表：450个NLP/CV/SP、论文实现、库、教程&示例

七月在线实验室

12+阅读 · 2018年10月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

AI前线

12+阅读 · 2018年5月13日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

【干货】深入理解自编码器（附代码实现）

【干货】深入理解自编码器（附代码实现）

专知

136+阅读 · 2018年3月9日

面向动态演化的网构软件失效机理与测评方法

国家自然科学基金

1+阅读 · 2015年12月31日

高速率、高频谱效率码分多址系统地址码设计研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于LDPC码和喷泉码的卫星动中通链路拥塞控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向Bug报告的软件故障重现方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

动态自适应的可伸缩视频流媒体组播编码-传输联合优化

国家自然科学基金

0+阅读 · 2015年12月31日

流密码可约性高效判别算法存在性的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于比特置信度的低复杂度多进制LDPC码译码算法

国家自然科学基金

0+阅读 · 2015年12月31日

面向二进制程序的静态结构化符号执行与动态组合方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

多纹理多深度的3D视频码率控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向类脑计算存储器的调制编码理论及方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

Beyond Code Reasoning: A Specification-Anchored Audit Framework for Expert-Augmented Security Verification

Arxiv

0+阅读 · 4月29日

SweRank: Software Issue Localization with Code Ranking

Arxiv

0+阅读 · 4月22日

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

Arxiv

0+阅读 · 4月20日

A Code Smell Refactoring Approach using GNNs

Arxiv

0+阅读 · 4月19日

Pecker: Bug Localization Framework for Sequential Designs via Causal Chain Reconstruction

Arxiv

0+阅读 · 4月17日

Codes with Large Minimum Distance in Product Codes: Explicit Constructions and Bounds

Arxiv

0+阅读 · 4月16日

CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models

Arxiv

0+阅读 · 4月9日

TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

Arxiv

0+阅读 · 4月7日

NaturalEdit: Code Modification through Direct Interaction with Adaptive Natural Language Representation

Arxiv

0+阅读 · 4月2日

Obfuscating Code Vulnerabilities against Static Analysis in JavaScript Code

Arxiv

0+阅读 · 4月1日

VIP会员

文章信息

相关主题

最新内容

DeepSeek 版Claude Code，免费小白安装教程来了！

DeepSeek 版Claude Code，免费小白安装教程来了！

专知会员服务

7+阅读 · 5月5日

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

专知会员服务

4+阅读 · 5月5日

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

专知会员服务

4+阅读 · 5月5日

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

专知会员服务

5+阅读 · 5月5日

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

《火炮弹药快速效能建模：提升互操作性与技术优势》（报告）

专知会员服务

8+阅读 · 5月5日

《美空军条令出版物 2-0：情报（2026版）》

《美空军条令出版物 2-0：情报（2026版）》

专知会员服务

13+阅读 · 5月5日

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

美陆军“飞蝇陷阱5.0”项目将新兴技术交到作战人员手中

专知会员服务

5+阅读 · 5月5日

帕兰提尔 Gotham：一个游戏规则改变器

帕兰提尔 Gotham：一个游戏规则改变器

专知会员服务

8+阅读 · 5月5日

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

【ICML 2026】用测试时训练线性化视觉Transformer：T⁵ 实现 Softmax 注意力到线性复杂度的快速转换

专知会员服务

3+阅读 · 5月5日

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

【AAAI 2026】大模型做知识蒸馏：CMM将LLM特征拆解给小模型协同学习

专知会员服务

3+阅读 · 5月5日

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

【ICML Spotlight 2026 】NonZero：交互引导探索的多智能体蒙特卡洛树搜索

专知会员服务

8+阅读 · 5月4日

【综述】机器人学习中的世界模型：全面综述

【综述】机器人学习中的世界模型：全面综述

专知会员服务

12+阅读 · 5月4日

伊朗的导弹-无人机行动及其对美国威慑的影响

伊朗的导弹-无人机行动及其对美国威慑的影响

专知会员服务

9+阅读 · 5月4日

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

《未来战术无人机系统案例研究：量身定制采办策略方法》100页报告

专知会员服务

9+阅读 · 5月4日

战争贩子：2026年第一季度美国对中东潜在军售激增

战争贩子：2026年第一季度美国对中东潜在军售激增

专知会员服务

7+阅读 · 5月4日

相关VIP内容

COLING2024 | 面向编程的自然语言处理综述

COLING2024 | 面向编程的自然语言处理综述

专知会员服务

28+阅读 · 2024年4月23日

自编码器及其应用综述

专知会员服务

37+阅读 · 2021年10月16日

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

专知会员服务

80+阅读 · 2021年3月26日

【经典书】Linux UNIX系统编程手册，1554页pdf

【经典书】Linux UNIX系统编程手册，1554页pdf

专知会员服务

48+阅读 · 2021年2月20日

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

新杀器来了！Facebook AI提出DETR：用Transformers来进行端到端的目标检测

专知会员服务

51+阅读 · 2020年5月28日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

【CVPR2020】强化特征点，Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

专知会员服务

49+阅读 · 2020年2月25日

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

【反馈循环自编码器】FEEDBACK RECURRENT AUTOENCODER

专知会员服务

23+阅读 · 2020年1月28日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICML Spotlight 2026】 T²PO: 不确定性引导的探索控制框架，实现稳定多轮Agentic强化学习

《机动炮兵的演进与未来：技术进步、历史沿革与炮兵作战前瞻》

DeepSeek 版Claude Code，免费小白安装教程来了！

基础模型驱动的工业智能体：技术成熟度、能力变迁与未竟之挑战

相关资讯

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

NLP大牛Thomas Wolf等新书《Transformer自然语言处理》，466页pdf及代码

专知

36+阅读 · 2022年2月7日

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

【硬核书】Linux核心编程|Linux Kernel Programming，741页pdf

专知

13+阅读 · 2021年3月26日

【Github】GPT2-Chinese：中文的GPT2训练代码

【Github】GPT2-Chinese：中文的GPT2训练代码

AINLP

52+阅读 · 2019年8月23日

Github项目推荐 | PyTorch 中文手册（pytorch handbook）

Github项目推荐 | PyTorch 中文手册（pytorch handbook）

AI研习社

50+阅读 · 2019年2月18日

【Github 3.5K 星】PyTorch资源列表：450个NLP/CV/SP、论文实现、库、教程&示例

【Github 3.5K 星】PyTorch资源列表：450个NLP/CV/SP、论文实现、库、教程&示例

七月在线实验室

12+阅读 · 2018年10月25日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

CVPR 2018 | 伯克利等提出无监督特征学习新方法，代码已开源

AI前线

12+阅读 · 2018年5月13日

手把手教 | 深度学习库PyTorch（附代码）

手把手教 | 深度学习库PyTorch（附代码）

数据派THU

27+阅读 · 2018年3月15日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

【干货】深入理解自编码器（附代码实现）

【干货】深入理解自编码器（附代码实现）

专知

136+阅读 · 2018年3月9日

相关论文

Beyond Code Reasoning: A Specification-Anchored Audit Framework for Expert-Augmented Security Verification

Arxiv

0+阅读 · 4月29日

SweRank: Software Issue Localization with Code Ranking

Arxiv

0+阅读 · 4月22日

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora

Arxiv

0+阅读 · 4月20日

A Code Smell Refactoring Approach using GNNs

Arxiv

0+阅读 · 4月19日

Pecker: Bug Localization Framework for Sequential Designs via Causal Chain Reconstruction

Arxiv

0+阅读 · 4月17日

Codes with Large Minimum Distance in Product Codes: Explicit Constructions and Bounds

Arxiv

0+阅读 · 4月16日

CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models

Arxiv

0+阅读 · 4月9日

TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

Arxiv

0+阅读 · 4月7日

NaturalEdit: Code Modification through Direct Interaction with Adaptive Natural Language Representation

Arxiv

0+阅读 · 4月2日

Obfuscating Code Vulnerabilities against Static Analysis in JavaScript Code

Arxiv

0+阅读 · 4月1日

相关基金

面向动态演化的网构软件失效机理与测评方法

国家自然科学基金

1+阅读 · 2015年12月31日

高速率、高频谱效率码分多址系统地址码设计研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于LDPC码和喷泉码的卫星动中通链路拥塞控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向Bug报告的软件故障重现方法研究

国家自然科学基金

4+阅读 · 2015年12月31日

动态自适应的可伸缩视频流媒体组播编码-传输联合优化

国家自然科学基金

0+阅读 · 2015年12月31日

流密码可约性高效判别算法存在性的研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于比特置信度的低复杂度多进制LDPC码译码算法

国家自然科学基金

0+阅读 · 2015年12月31日

面向二进制程序的静态结构化符号执行与动态组合方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

多纹理多深度的3D视频码率控制研究

国家自然科学基金

0+阅读 · 2015年12月31日

面向类脑计算存储器的调制编码理论及方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员