
StellarTop: An Integrated Multi-topic Dataset on GitHub Repositories

  • Zhiwei Zhu
  • Wenrui Huang
  • Wei Wang*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

GitHub has become one of the most popular platforms for open source version control and collaboration. In 2017, GitHub introduced the “Topics” feature, which allows repository owners to add descriptive topics to better characterize their repositories. However, incorrect topic assignment can adversely impact a repository’s visibility to potential contributors. Unlike previous datasets, which delete low-frequency topics or map them to frequent featured topics, our dataset retains all valuable topics and so preserves topic diversity. We collected the 50,000 most-starred repositories on GitHub to date, along with their textual information such as descriptions and READMEs. The final dataset contains information from 28,386 repositories with a total of 162,038 topics, covering 22,710 distinct topics. This extensive dataset supports various research applications, such as topic recommendation and trend analysis in open-source projects. The dataset is freely available at https://github.com/Zzzzzhuzhiwei/StellarTop.
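The summary figures in the abstract (repositories, total topics, distinct topics) are related in a simple way, which the following minimal sketch illustrates on toy data. The record layout (`full_name`, `topics`), the sample repositories, and the `topic_summary` helper are all hypothetical and are not taken from the StellarTop release; the sketch also assumes that repositories without any topics are skipped, which the abstract does not state.

```python
from collections import Counter

# Hypothetical records for illustration only; the real dataset's schema may differ.
repos = [
    {"full_name": "octo/alpha", "topics": ["python", "machine-learning"]},
    {"full_name": "octo/beta", "topics": ["python", "cli"]},
    {"full_name": "octo/gamma", "topics": []},
]


def topic_summary(records):
    """Count topic-bearing repositories, topic assignments, and distinct topics."""
    # Assumption: repositories with an empty topic list are left out.
    with_topics = [r for r in records if r["topics"]]
    # Tally how many repositories carry each topic.
    counts = Counter(t for r in with_topics for t in r["topics"])
    return {
        "repositories": len(with_topics),
        "total_topics": sum(counts.values()),
        "distinct_topics": len(counts),
    }


summary = topic_summary(repos)
print(summary)
```

On the toy data, "python" is counted twice in the total but once among the distinct topics, which is the same distinction the abstract draws between 162,038 topics in total and 22,710 distinct topics.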

Original language: English
Title of host publication: Benchmarking, Measuring, and Optimizing - 16th BenchCouncil International Symposium, Bench 2024, Revised Selected Papers
Editors: Weiwei Lin, Zhen Jia, Sascha Hunold, Guoxin Kang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 113-126
Number of pages: 14
ISBN (Print): 9789819650316
Publication status: Published - 2025
Event: 16th BenchCouncil International Symposium on Benchmarking, Measuring, and Optimizing, Bench 2024 - Guangzhou, China
Duration: 4 Dec 2024 to 6 Dec 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15519 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th BenchCouncil International Symposium on Benchmarking, Measuring, and Optimizing, Bench 2024
Country/Territory: China
City: Guangzhou
Period: 4 Dec 2024 to 6 Dec 2024
