Metadata of libraries on the Python Package Index (PyPI), including links to source code repositories and donation platforms, plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. Yet many packages lack such metadata, and little is known about the underlying reasons. This paper presents a large-scale empirical study based on two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms. Maintainers often link repository URLs to foster collaboration, increase transparency, and enable issue tracking, but some omit them due to oversight, laziness, or perceived irrelevance to their project. Respondents report adding donation platform links to support open-source work or to receive financial contributions, but adoption is hindered by skepticism, technical friction, and organizational constraints. Cross-cutting challenges, such as outdated links, lack of awareness, and unclear guidance, affect both types of metadata. We further assess the robustness of our topic modeling pipeline across 30 runs (84% lexical and 89% semantic similarity) and validate topic quality with 23 expert raters (Randolph's kappa = 0.55). The study contributes empirical insights into PyPI's metadata practices and provides recommendations for improving them, while also demonstrating the effectiveness of our topic modeling approach for analyzing short-text survey responses.
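For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Randolph's free-marginal multirater kappa. It is not the authors' evaluation code: the function name `randolph_kappa`, the input layout, and the toy ratings are assumptions made purely for illustration. The statistic compares the observed proportion of agreeing rater pairs with a chance level of 1/k, where k is the number of rating categories.

```python
from collections import Counter


def randolph_kappa(ratings, num_categories):
    """Free-marginal multirater kappa (illustrative helper, hypothetical name).

    ratings: list of items; each item is a list with one category label per rater.
    num_categories: number of categories raters could choose from (k >= 2).
    """
    if num_categories < 2:
        raise ValueError("need at least two categories")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item must be rated by the same number of raters")

    # Count ordered pairs of raters that agree on each item.
    agreeing_pairs = 0
    for item in ratings:
        counts = Counter(item)
        agreeing_pairs += sum(c * (c - 1) for c in counts.values())

    # Observed agreement: agreeing pairs over all possible ordered rater pairs.
    observed = agreeing_pairs / (len(ratings) * n_raters * (n_raters - 1))
    # Free-marginal chance agreement: every category is equally likely.
    expected = 1 / num_categories
    return (observed - expected) / (1 - expected)


# Toy data (invented): two items, three raters, two categories.
toy_ratings = [
    ["good", "good", "good"],
    ["good", "bad", "good"],
]
toy_kappa = randolph_kappa(toy_ratings, num_categories=2)
```

On the toy data, two thirds of the rater pairs agree, so the sketch yields a kappa of one third. A value of 1 indicates perfect agreement and 0 indicates agreement at chance level.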