Benchmarking automated GUI testing for Android against real-world bugs

Ting Su, Jue Wang, Zhendong Su

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

62 Scopus citations

Abstract

To ensure the reliability of Android apps, there has been tremendous, continuous progress on improving automated GUI testing in the past decade. Specifically, dozens of testing techniques and tools have been developed and demonstrated to be effective in detecting crash bugs and to outperform their respective prior work in the number of detected crashes. However, an overarching question — "How effectively and thoroughly can these tools find crash bugs in practice?" — has not been well explored; answering it requires a ground-truth benchmark with real-world bugs. Since prior studies focus on tool comparisons w.r.t. some selected apps, they cannot provide direct, in-depth answers to this question. To complement existing work and tackle the above question, this paper offers the first ground-truth empirical evaluation of automated GUI testing for Android. To this end, we devote substantial manual effort to set up the Themis benchmark, including (1) a carefully constructed dataset with 52 real, reproducible crash bugs (taking two person-months to collect and validate), and (2) a unified, extensible infrastructure with six recent state-of-the-art testing tools. The whole evaluation took over 10,920 CPU hours. We find a considerable gap in these tools' ability to find the collected real bugs: 18 bugs cannot be detected by any tool. Our systematic analysis further identifies five major common challenges that these tools face, and reveals additional findings such as factors affecting these tools in bug finding and opportunities for tool improvement. Overall, this work offers new concrete insights, most of which are previously unknown/unstated and difficult to obtain. Our study presents a new perspective, complementary to prior studies, for understanding and analyzing the effectiveness of existing testing tools, as well as a benchmark for future research on this topic. The Themis benchmark is publicly available at https://github.com/the-themis-benchmarks/home.

Original language: English
Title of host publication: ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Diomidis Spinellis
Publisher: Association for Computing Machinery, Inc
Pages: 119-130
Number of pages: 12
ISBN (Electronic): 9781450385626
DOIs
State: Published - 20 Aug 2021
Event: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: 23 Aug 2021 - 28 Aug 2021

Publication series

Name: ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory: Greece
City: Virtual, Online
Period: 23/08/21 - 28/08/21

Keywords

  • Android apps
  • Benchmarking
  • Crash bugs
  • GUI testing
