Abstract
The large-scale development and proliferation of open-source software has created an ecosystem for open-source development and collaboration. Within this ecosystem, individuals and organizations collaboratively develop high-quality software that is accessible to all. Social collaboration platforms, represented by GitHub, have further facilitated large-scale, distributed, and fine-grained code collaboration and technical social interaction. Every day, countless developers submit code, review code, report bugs, and propose new features on these platforms. This fully open collaborative development process produces a vast amount of behavioral data of immense value. This paper designs and implements OpenDigger, a one-stop data mining system for the open-source collaboration digital ecosystem. Its goal is to build data infrastructure for the open-source field and promote the continuous development of the open-source ecosystem. The OpenDigger system consists primarily of a data collection module, a storage module, a tag data module, and an information service module, and is built on an OLAP columnar database and a graph database. The system continuously collects data from multiple sources within the open-source ecosystem and provides various types of open-source information services to different user groups through a unified interface. Additionally, OpenDigger mines key information from the open-source digital ecosystem from the perspective of collaborative relationship networks. Compared with traditional statistical indicators, the collaborative network perspective better illustrates the association characteristics between open-source projects and developers.
| Translated title of the contribution | Data Mining and Information Service for Open Collaboration Digital Ecosystem |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 187-195 |
| Number of pages | 9 |
| Journal | Computer Science |
| Volume | 51 |
| Issue number | 10 |
| State | Published - 15 Oct 2024 |