TY - GEN
T1 - StellarTop
T2 - 16th BenchCouncil International Symposium on Benchmarking, Measuring, and Optimizing, Bench 2024
AU - Zhu, Zhiwei
AU - Huang, Wenrui
AU - Wang, Wei
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - GitHub has become one of the most popular platforms for open-source version control and collaboration. In 2017, GitHub introduced the “Topics” feature, which allows repository owners to add descriptive topics that better characterize their repositories. However, incorrect topic assignment can reduce a repository’s visibility to potential contributors. Unlike previous datasets, which delete low-frequency topics or map them to frequent featured topics, our dataset retains all valuable topics and so preserves their full diversity. We crawled the 50,000 most-starred repositories on GitHub to date, along with their textual information such as descriptions and READMEs. The resulting dataset contains information from 28,386 repositories with a total of 162,038 topics, covering 22,710 distinct topics. This extensive dataset supports various research applications, such as topic recommendation and trend analysis in open-source projects. Our dataset is freely available at https://github.com/Zzzzzhuzhiwei/StellarTop.
KW - Dataset
KW - GitHub topics
KW - Open Source Projects
KW - Software Engineering
UR - https://www.scopus.com/pages/publications/105004255631
U2 - 10.1007/978-981-96-5032-3_7
DO - 10.1007/978-981-96-5032-3_7
M3 - Conference contribution
AN - SCOPUS:105004255631
SN - 9789819650316
T3 - Lecture Notes in Computer Science
SP - 113
EP - 126
BT - Benchmarking, Measuring, and Optimizing - 16th BenchCouncil International Symposium, Bench 2024, Revised Selected Papers
A2 - Lin, Weiwei
A2 - Jia, Zhen
A2 - Hunold, Sascha
A2 - Kang, Guoxin
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 4 December 2024 through 6 December 2024
ER -