Exploring Activity and Contributors on GitHub: Who, What, When, and Where

  • Xiaoya Xia*
  • Zhenjie Weng
  • Wei Wang
  • Shengyu Zhao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Apart from being a code hosting platform, GitHub is where large-scale open collaboration and contribution happen. Every minute, thousands of developers submit code and discuss issues or pull requests, and all of these user behaviors are recorded in the GitHub Event Stream (GES). Exploring the activities in the GES can help reveal who is active, how they work, when they are active, and even where they are located. To this end, we first performed a large-scale analysis of the 0.86 billion event records generated in 2020. We extracted 902K active contributors from 14 million GitHub accounts by observing their activity distribution, then explored their behavior distribution and their active time across the day and week, and estimated their time zone distribution on the basis of their circadian activity rhythm. To go deeper, we conducted a case study of 79 CNCF projects and contrast analyses across project maturity levels. Our results show that, from a macro perspective, bots are increasingly active and can serve numerous projects. Contributors work on weekdays, and global activity is concentrated in the daytime working hours of the Americas and Europe. The time zone distribution also reveals that UTC+2 and UTC-4 have the most active contributors. A critical finding is the validation and quantification of the high bus factor risk that exists in the open source software (OSS) ecosystem: whether across the whole contributor population or within specific projects, a rather small group of OSS contributors (less than 20%) undertook the majority of the work. The GES can provide a wealth of information about OSS. Our findings provide insights into global GitHub collaboration behaviors and may help researchers and practitioners further understand the modern OSS ecosystem.
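
The two quantitative ideas in the abstract, estimating a time zone from the daily activity rhythm and measuring how concentrated the work is, can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the assumed 09:00 to 18:00 local working window, and the 50% "majority" threshold are all assumptions made here for illustration.

```python
from collections import Counter


def estimate_utc_offset(utc_hours, work_start=9, work_end=18):
    """Guess a whole-hour UTC offset in -12..11 from event hours given in UTC.

    Assumption (not from the paper): a contributor is mostly active between
    work_start and work_end local time. The offset that places the most
    events inside that window wins; ties go to the smallest offset.
    """
    counts = Counter(h % 24 for h in utc_hours)

    def score(offset):
        # Number of events whose local hour falls inside the working window.
        return sum(
            n for h, n in counts.items()
            if work_start <= (h + offset) % 24 < work_end
        )

    return max(range(-12, 12), key=score)


def min_share_for_majority(event_counts, threshold=0.5):
    """Smallest fraction of contributors whose events reach `threshold` of the total.

    event_counts maps a contributor id to that contributor's number of events.
    """
    totals = sorted(event_counts.values(), reverse=True)
    total = sum(totals)
    if total == 0:
        return 0.0
    running = 0
    for rank, n in enumerate(totals, start=1):
        running += n
        if running >= threshold * total:
            return rank / len(totals)
    return 1.0


# Hypothetical data: one event per hour from 07:00 to 15:00 UTC.
offset = estimate_utc_offset(range(7, 16))
# Hypothetical project in which one contributor produces most events.
share = min_share_for_majority({"a": 90, "b": 5, "c": 3, "d": 1, "e": 1})
```

A fixed working window is the simplest possible reference profile; a real analysis would likely need a smoother profile and some handling of contributors who are active outside office hours.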

Original language: English
Title of host publication: Proceedings - 2022 29th Asia-Pacific Software Engineering Conference, APSEC 2022
Publisher: IEEE Computer Society
Pages: 11-20
Number of pages: 10
ISBN (Electronic): 9781665455374
State: Published - 2022
Event: 29th Asia-Pacific Software Engineering Conference, APSEC 2022 - Virtual, Online, Japan
Duration: 6 Dec 2022 to 9 Dec 2022

Publication series

Name: Proceedings - Asia-Pacific Software Engineering Conference, APSEC
Volume: 2022-December
ISSN (Print): 1530-1362

Conference

Conference: 29th Asia-Pacific Software Engineering Conference, APSEC 2022
Country/Territory: Japan
City: Virtual, Online
Period: 6/12/22 to 9/12/22

Keywords

  • Contributor Activity
  • GitHub behavior data
  • Open Source Collaboration
