Industrial applications rely heavily on open-source software (OSS) libraries, which provide various benefits. However, these libraries can also present a substantial risk if a vulnerability or attack arises and the community, due to inactivity, fails to promptly address the issue and release a fix. To monitor the activity of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. Based on these repositories, the libraries integrated into an application can be monitored to observe whether they are adequately maintained. In this descriptive study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries. For all available libraries, we extract the assigned repository URLs and direct dependencies, and we use the PageRank algorithm to comprehensively analyze the ecosystems from a library perspective and a dependency chain perspective. For invalid repository URLs, we derive potential reasons. In both ecosystems, the accessibility of GitHub repository URLs varies with the PageRank score of the analyzed libraries. For individual libraries, up to 73.8% of PyPI and up to 69.4% of NPM libraries have repository URLs. Within dependency chains, up to 80.1% of PyPI libraries and up to 81.1% of NPM libraries have URLs. This means that most libraries, especially those of increasing importance, can be monitored on GitHub. Among the most common reasons for invalid repository URLs is that no URL is assigned at all, which accounts for up to 17.9% of PyPI and up to 39.6% of NPM libraries. Package maintainers should address this issue and update the repository information to enable monitoring of their libraries.
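The ranking step described above can be illustrated with a minimal sketch. It is not the study's implementation: the toy dependency graph, the library names, the example URLs, and the helper names `pagerank` and `github_url_coverage` are all hypothetical. It also assumes that an edge points from a dependent library to its dependency, so that widely used libraries receive higher scores, and that a URL counts as a GitHub repository URL if it starts with `https://github.com/`.

```python
from typing import Dict, List, Optional


def pagerank(deps: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; an edge A -> B means "A depends on B" (assumption)."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Libraries without dependencies (dangling nodes) spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if not deps.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node, targets in deps.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def github_url_coverage(rank: Dict[str, float],
                        repo_urls: Dict[str, Optional[str]],
                        top_k: int) -> float:
    """Share of the top_k highest-ranked libraries with a GitHub repository URL."""
    top = sorted(rank, key=rank.get, reverse=True)[:top_k]
    if not top:
        return 0.0
    with_url = [name for name in top
                if (repo_urls.get(name) or "").startswith("https://github.com/")]
    return len(with_url) / len(top)


# Hypothetical toy ecosystem: direct dependencies per library.
deps = {
    "app-a": ["requests-like", "utils-like"],
    "app-b": ["requests-like"],
    "requests-like": ["urllib-like"],
    "utils-like": [],
    "urllib-like": [],
}
# Hypothetical repository metadata; None stands for "no URL assigned".
repo_urls = {
    "app-a": "https://github.com/example/app-a",
    "app-b": None,
    "requests-like": "https://github.com/example/requests-like",
    "utils-like": "https://github.com/example/utils-like",
    "urllib-like": None,
}

ranks = pagerank(deps)
print(github_url_coverage(ranks, repo_urls, top_k=2))
```

Varying `top_k` mirrors the idea of reporting URL accessibility as a function of library importance. In this sketch, a library at the end of a dependency chain can rank highly while still lacking an assigned URL, which is the kind of case the abstract asks maintainers to fix.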