OpenPerf: 面向开源生态可持续发展的数据科学基准测试体系

Translated title of the contribution: OpenPerf: A Data Science Benchmark System for Open Source Ecosystem Sustainable Development

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Benchmarking refers to the quantitative and comparable evaluation of specific performance metrics for a category of test subjects, achieved through scientifically designed test methods, tools, and systems. With the advent of the artificial intelligence era, new AI benchmarking datasets, such as ImageNet and DataPerf, have gradually become consensus standards in both academia and industry. Currently, research on the open-source ecosystem largely focuses on specific research points and lacks a comprehensive framework of open-source ecosystem benchmarks. Data consumers in the open-source domain urgently need foundational metrics and evaluations, such as a project's development stage, an enterprise's open-source program capabilities within the industry, developer activity, and project influence. To address the "data-rich but benchmark-poor" situation in the open-source field, this paper proposes a data science benchmark system for the sustainable development of the open-source ecosystem, termed OpenPerf. This system adopts a bottom-up approach and primarily comprises task-based, index-based, and benchmark-based categories, aiming to provide diverse benchmark references for academia and industry. This paper defines nine task-based data science benchmarks: open-source behavior data completion and prediction, automated open-source bot identification and classification, sentiment classification of open-source comment texts, risk prediction in open-source software supply chains, open-source project influence ranking, prediction of archived projects, open-source network influence index prediction, anomaly detection in open-source communities, and open-source project recommendation based on link prediction.
We present results for three representative task-based benchmarks (open-source behavior data completion and prediction, automated open-source bot identification and classification, and open-source project recommendation based on link prediction), two index-based benchmarks (influence and activity), and one benchmark-based reference. Notably, two of the index-based benchmarks have been adopted by the China Electronics Standardization Institute as evaluation standards for open-source community governance. The task-based benchmarks are primarily aimed at academia, providing researchers in different fields with a reference framework relevant to their areas of expertise. The index-based benchmarks are designed for industry use, enabling enterprises to gain detailed insights through various index-based metrics; unlike task-based benchmarks, index benchmarks serve as standardized units that measure specific attributes of the test subjects. The benchmark-based reference represents an industry best-practice performance level, setting a measurable standard of excellence for specific fields. Through metrics such as influence and activity, enterprises can understand the current industry positioning of their open-source programs and the development stage of their open-source projects. The experimental results demonstrate that, compared to other existing metrics, OpenPerf more effectively evaluates the individual influence of open-source developers, thereby strengthening open-source incentives. OpenPerf's benchmark service can analyze the development status of multiple open-source projects within a group from a macro perspective, quantifying the activity and influence of different projects and providing valuable insights for their sustained, healthy growth. Moreover, in open-source software courses, integrating OpenPerf's benchmark service with traditional grading methods allows for a more comprehensive and fair assessment of students' contributions.
The critical role of OpenPerf in promoting the sustainable development of the open-source ecosystem is further illustrated through practical cases from prominent domestic companies and institutions, including Alibaba, Ant Group, and East China Normal University.
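The link-prediction-based project recommendation task listed among the nine benchmarks can be illustrated with a minimal sketch. Assuming a hypothetical developer–project interaction map (the names and the common-neighbor scoring rule below are illustrative assumptions, not the paper's actual method), unseen projects are scored by how often they are co-contributed alongside a developer's current projects:

```python
from collections import defaultdict

# Hypothetical interaction data: developer -> set of projects contributed to.
interactions = {
    "alice": {"projA", "projB"},
    "bob":   {"projA", "projC"},
    "carol": {"projB", "projC"},
    "dave":  {"projC"},
}

def recommend(dev, interactions, top_k=2):
    """Rank projects `dev` has not touched by a common-neighbor link score:
    each other developer who shares `overlap` projects with `dev` adds
    `overlap` points to every project of theirs that `dev` lacks."""
    own = interactions[dev]
    scores = defaultdict(int)
    for other, projects in interactions.items():
        if other == dev:
            continue
        overlap = len(own & projects)
        if overlap == 0:
            continue  # no shared projects, no evidence of a link
        for p in projects - own:
            scores[p] += overlap
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]

print(recommend("alice", interactions))  # → ['projC']
```

Here projC is recommended to alice because two of her collaborators (bob via projA, carol via projB) also contribute to it; real implementations would replace this toy score with learned link-prediction models over the full collaboration graph.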

Original language: Chinese (Traditional)
Pages (from-to): 632-649
Number of pages: 18
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 48
Issue number: 3
DOIs
State: Published - Mar 2025

