Android app developers reuse code extensively, integrating many third-party libraries into their apps. While such integration is practical for developers, it makes it hard for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider only developer code during static analysis, under the assumption that the sought issues lie in developer code rather than in the libraries. This practice requires analysts to distinguish library code from developer code. Currently, many static analyses rely on white lists of libraries for this purpose, but these white lists are unreliable, inaccurate, and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for producing accurate and "always up to date" sets of libraries. First, we demonstrate the continued need for a white list of libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains 34 813 libraries to date and is meant to evolve.