TY - GEN
T1 - Exploring Activity and Contributors on GitHub
T2 - 29th Asia-Pacific Software Engineering Conference, APSEC 2022
AU - Xia, Xiaoya
AU - Weng, Zhenjie
AU - Wang, Wei
AU - Zhao, Shengyu
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Apart from being a code hosting platform, GitHub is the place where large-scale open collaborations and contributions happen. Every minute, thousands of developers are submitting code, having discussions of issues or pull requests, with all user behaviors recorded in the GitHub Event Stream (GES). Exploration of the activities in the GES could help understand who is active, the way they work, the time when they are active and even their location. To this end, a large-scale analysis was initially performed based on the 0.86 billion event records generated in 2020. We extracted 902K active contributors out of 14 million GitHub accounts by observing their activity distribution, then explored their behavior distribution, active time in the day and week, and estimated time zone distributions on the basis of their circadian activity rhythm. To go deeper, a case study of 79 projects in CNCF and contrast analyses of different project maturity levels were conducted. Our results showed that from a macro perspective, bots are increasingly more active and can serve numerous projects. Contributors work on weekdays, and are globally more inclined toward the daytime working hours in the Americas and Europe. The time zone distribution also reveals that UTC+2 and UTC-4 have the most active contributors. A critical discovery was the validation and quantification of a high bus factor risk exists in the OSS ecosystem. Whether from a large group point of view or within specific projects, a rather small group of OSS contributors (less than 20%) undertook the majority of the work. The GES can provide a wealth of information about open source software (OSS). Our findings provide insights into global GitHub collaboration behaviors and may be of help for researchers and practitioners to further understand modern OSS ecosystem.
AB - Apart from being a code hosting platform, GitHub is the place where large-scale open collaborations and contributions happen. Every minute, thousands of developers are submitting code, having discussions of issues or pull requests, with all user behaviors recorded in the GitHub Event Stream (GES). Exploration of the activities in the GES could help understand who is active, the way they work, the time when they are active and even their location. To this end, a large-scale analysis was initially performed based on the 0.86 billion event records generated in 2020. We extracted 902K active contributors out of 14 million GitHub accounts by observing their activity distribution, then explored their behavior distribution, active time in the day and week, and estimated time zone distributions on the basis of their circadian activity rhythm. To go deeper, a case study of 79 projects in CNCF and contrast analyses of different project maturity levels were conducted. Our results showed that from a macro perspective, bots are increasingly more active and can serve numerous projects. Contributors work on weekdays, and are globally more inclined toward the daytime working hours in the Americas and Europe. The time zone distribution also reveals that UTC+2 and UTC-4 have the most active contributors. A critical discovery was the validation and quantification of a high bus factor risk exists in the OSS ecosystem. Whether from a large group point of view or within specific projects, a rather small group of OSS contributors (less than 20%) undertook the majority of the work. The GES can provide a wealth of information about open source software (OSS). Our findings provide insights into global GitHub collaboration behaviors and may be of help for researchers and practitioners to further understand modern OSS ecosystem.
KW - Contributor Activity
KW - GitHub behavior data
KW - Open Source Collaboration
UR - https://www.scopus.com/pages/publications/85149167115
U2 - 10.1109/APSEC57359.2022.00013
DO - 10.1109/APSEC57359.2022.00013
M3 - 会议稿件
AN - SCOPUS:85149167115
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 11
EP - 20
BT - Proceedings - 2022 29th Asia-Pacific Software Engineering Conference, APSEC 2022
PB - IEEE Computer Society
Y2 - 6 December 2022 through 9 December 2022
ER -