Software composition analysis (SCA) is the process of identifying the open-source software components used in an input software application. SCA has been extensively developed and adopted in academia and industry. However, we observe that modern SCA techniques, as deployed in industry, still fall short because of privacy concerns. Typically, SCA requires users to upload their applications' source code to a remote SCA server, which inspects the applications and reports component usage back to the users. This process is privacy-sensitive because the applications may contain sensitive information such as proprietary source code, algorithms, trade secrets, and user data. These privacy concerns have held back the use of SCA in real-world scenarios, and both academia and industry therefore demand privacy-preserving SCA solutions. For the first time, we analyze the privacy requirements of SCA and map the landscape of possible technical solutions, which offer varying privacy gains and overheads. In particular, given that de facto SCA frameworks are primarily driven by code-similarity-based techniques, we explore combining several privacy-preserving protocols to encapsulate the similarity-based SCA framework. Among all viable solutions, we find that multi-party computation (MPC) offers the strongest privacy guarantee and plausible accuracy; however, it incurs high overhead (184 times). We optimize the MPC-based SCA framework by using program analysis techniques to reduce the number of cryptographic protocol transactions. The evaluation results show that our proposed optimizations reduce the overhead of MPC-based SCA to only 8.5% without sacrificing SCA's privacy guarantee or accuracy.
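To make the notion of similarity-based SCA concrete, the following is a minimal plaintext sketch, not the framework evaluated in this work. It assumes that components are matched by the fraction of their hashed token n-grams that also occur in the application. The function names (`fingerprints`, `containment`, `detect_components`), the n-gram size, and the 0.5 threshold are all hypothetical choices made for illustration. In a privacy-preserving deployment, the set comparison would have to run inside a cryptographic protocol such as MPC, which this sketch does not implement.

```python
import hashlib
import re


def fingerprints(source: str, n: int = 3) -> set[str]:
    """Hash every run of n consecutive tokens in the source text."""
    # Crude tokenizer (assumption): identifiers, integers, or single symbols.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {hashlib.sha256(g.encode()).hexdigest()[:16] for g in grams}


def containment(component: set[str], app: set[str]) -> float:
    """Fraction of the component's fingerprints that also appear in the app."""
    if not component:
        return 0.0
    return len(component & app) / len(component)


def detect_components(app_source: str,
                      component_db: dict[str, str],
                      threshold: float = 0.5) -> dict[str, float]:
    """Report every known component whose containment score meets the threshold."""
    app_fp = fingerprints(app_source)
    report = {}
    for name, source in component_db.items():
        score = containment(fingerprints(source), app_fp)
        if score >= threshold:
            report[name] = score
    return report


# Toy data: the application embeds lib_a verbatim and shares nothing with lib_b.
lib_a = "def add(a, b):\n    return a + b\n"
lib_b = "class Stack:\n    def push(self, x):\n        self.items.append(x)\n"
app = lib_a + "print(add(1, 2))\n"

report = detect_components(app, {"lib_a": lib_a, "lib_b": lib_b})
print(report)
```

In this plaintext form the server sees the application's source directly, which is exactly the exposure that motivates privacy-preserving SCA. Every fingerprint comparison in `containment` is also an operation that would become a protocol interaction under MPC, which hints at why reducing the number of such interactions matters for overhead.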