ReSIM: Re-ranking Binary Similarity Embeddings to Improve Function Search Performance

Binary Function Similarity (BFS), the problem of determining whether two binary functions originate from the same source code, has been extensively studied in recent research across the security, software engineering, and machine learning communities. This interest stems from its central role in vulnerability detection systems, copyright infringement analysis, and malware phylogeny tools. Nearly all binary function similarity systems embed assembly functions into real-valued vectors, so that similar functions map to points that lie close to each other in the metric space. These embeddings enable function search: a query function is embedded and compared against a database of candidate embeddings to retrieve the most similar matches. Despite their effectiveness, such systems rely on bi-encoder architectures that embed each function independently, which limits their ability to capture cross-function relationships and similarities. To address this limitation, we introduce ReSIM, a novel function search system that complements embedding-based search with a neural re-ranker. Unlike traditional embedding models, our re-ranking module jointly processes query-candidate pairs and computes ranking scores from their joint representation, allowing for a more accurate similarity assessment. By re-ranking the top results of embedding-based retrieval, ReSIM leverages fine-grained relational information that bi-encoders cannot capture. We evaluate ReSIM with seven embedding models on two benchmark datasets and observe consistent improvements in search effectiveness, with average gains of 21.7% in nDCG and 27.8% in Recall.
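The abstract describes a two-stage pipeline: a bi-encoder retrieves a top-k shortlist by embedding similarity, and a re-ranker then scores each query-candidate pair jointly to reorder that shortlist. The Python sketch below illustrates only this control flow and the two reported metrics, under explicit assumptions: the embeddings are toy 2-D vectors, and `toy_pair_score` is a hypothetical stand-in (Jaccard overlap of instruction mnemonics) for ReSIM's neural re-ranker, whose architecture is not described here. The nDCG and Recall functions use the standard binary-relevance definitions and are not necessarily the paper's exact evaluation protocol.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embedding_search(query_vec: Vector, db: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1 (bi-encoder): rank candidates by similarity of independently computed embeddings."""
    ranked = sorted(db, key=lambda cid: cosine(query_vec, db[cid]), reverse=True)
    return ranked[:k]


def rerank(query: Set[str], shortlist: List[str], funcs: Dict[str, Set[str]],
           pair_score: Callable[[Set[str], Set[str]], float]) -> List[str]:
    """Stage 2: score each (query, candidate) pair jointly and reorder the shortlist."""
    return sorted(shortlist, key=lambda cid: pair_score(query, funcs[cid]), reverse=True)


def toy_pair_score(q: Set[str], c: Set[str]) -> float:
    # HYPOTHETICAL stand-in for a neural cross-encoder: it looks at both
    # functions at once (here, Jaccard overlap of their instruction mnemonics).
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant candidates found in the top k."""
    hits = sum(1 for cid in ranking[:k] if cid in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2) for i, cid in enumerate(ranking[:k]) if cid in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


# Toy data (assumed): the true match "f_same" is NOT the closest embedding.
query_vec = [1.0, 0.0]
db = {"f_same": [0.8, 0.6], "f_other1": [0.99, 0.14],
      "f_other2": [0.9, 0.44], "f_far": [0.0, 1.0]}
query_ops = {"push", "mov", "cmp", "jne", "call", "ret"}
funcs = {"f_same": {"push", "mov", "cmp", "jne", "call", "pop", "ret"},
         "f_other1": {"push", "mov", "add", "ret"},
         "f_other2": {"xor", "lea", "ret"},
         "f_far": {"nop"}}
relevant = {"f_same"}

stage1 = embedding_search(query_vec, db, k=3)
stage2 = rerank(query_ops, stage1, funcs, toy_pair_score)

print("embedding search:", stage1, "nDCG@3 =", ndcg_at_k(stage1, relevant, 3))
print("after re-ranking:", stage2, "nDCG@3 =", ndcg_at_k(stage2, relevant, 3))
```

Because re-ranking only reorders the shortlist, it cannot recover a true match that the first stage left outside the top k, so Recall at the shortlist size is unchanged; the benefit appears in position-sensitive metrics such as nDCG and in Recall at smaller cutoffs (here, Recall@1 goes from 0 to 1).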