Android app developers rely heavily on code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it makes it hard for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider only developer code during static analysis, on the assumption that the sought issues lie in developer code rather than in the libraries. This practice requires analysts to distinguish library code from developer code. Currently, many static analyses rely on white lists of libraries for this purpose, but these white lists are unreliable, inaccurate, and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for producing accurate and "always up to date" sets of libraries. First, we demonstrate the continued need for a white list of libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains 34,813 libraries to date and is meant to evolve.