The widespread development and adoption of open-source software has built an ecosystem for open development and collaboration, in which individuals and organizations work together to create high-quality software that anyone can use. Social collaboration platforms such as GitHub have further facilitated large-scale, distributed, and fine-grained code collaboration and technical interaction. Every day, countless developers contribute code, review code, report bugs, and propose new features on these platforms, generating a massive amount of valuable behavioral data about the open collaboration process. This paper presents the design and implementation of OpenDigger, a comprehensive data mining and information service system for open collaboration in the digital ecosystem. Its goal is to build a data infrastructure for the open-source domain and to promote the continuous development of the open-source ecosystem. The metrics and analysis models in OpenDigger mine knowledge at levels ranging from macro to micro in the open-source digital ecosystem. Through a unified information service interface, OpenDigger provides open-source information services to different user groups, including governments, enterprises, foundations, and individuals. The paper demonstrates the effectiveness of the metrics and models in OpenDigger, a novel information service system in the open-source ecosystem, through several real-world scenarios, including products, tools, applications, and courses. These scenarios showcase significant and diverse practical applications of the metrics and models in both algorithmic and business aspects.