TCSA: Efficient Localization of Busy-Wait Synchronization Bugs for Latency-Critical Applications

  • Ning Li
  • Jianmei Guo*
  • Bo Huang
  • Yuyang Li
  • Yilei Zhang
  • Chengdong Li
  • Wenxin Huang
  • *Corresponding author for this work
  • East China Normal University
  • Tencent

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Busy-wait synchronization is often used for latency-critical applications to ensure low latency. Unfortunately, its performance bugs due to thread contention may lead to request failures or even system crashes. Localizing the performance bugs of busy-wait synchronization is not trivial because we have to pinpoint the exact moment of occurrence from a relatively long measurement period and simultaneously identify candidate busy-wait threads from numerous concurrent threads. Existing methods often rely on hotspot-driven analysis of lock-related functions, but they still need extensive manual work to localize busy-wait threads. This paper proposes timing call stack analysis (TCSA), an efficient approach to localizing busy-wait synchronization bugs. The key idea is to time-serialize the function call stacks of applications and identify consecutive identical call stacks to catch busy-wait threads. TCSA can handle any application regardless of its programming language and identify various busy-wait patterns, including spinlocks, chaining spinlocks, futexes, and safepoint checks within the Java Virtual Machine. Compared to the state-of-the-art, TCSA can effectively diminish the quantity of examined records (e.g., threads and functions) by 1 to 3 orders of magnitude. TCSA has been deployed to a large cloud service provider, demonstrating its effectiveness, efficiency, and practicality in four real latency-critical applications.
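
The key idea stated in the abstract (time-serialize call stacks, then look for consecutive identical stacks) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample format `(timestamp, thread_id, stack)`, the `Run` record, the function name `find_repeated_stacks`, and the `min_run` threshold are all assumptions made for illustration, and a real profiler's output would need to be parsed into this shape first.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical sample format: (timestamp, thread id, call stack from outermost to innermost frame).
Sample = Tuple[float, int, Tuple[str, ...]]


@dataclass
class Run:
    """A stretch of consecutive samples in one thread that share the same call stack."""
    thread_id: int
    stack: Tuple[str, ...]
    start: float
    end: float
    count: int


def find_repeated_stacks(samples: Iterable[Sample], min_run: int = 3) -> List[Run]:
    """Return runs of at least `min_run` consecutive identical stacks per thread.

    `min_run` is an illustrative threshold, not a value taken from the paper.
    """
    # Step 1: time-serialize the samples of each thread.
    per_thread = defaultdict(list)
    for ts, tid, stack in samples:
        per_thread[tid].append((ts, tuple(stack)))

    runs: List[Run] = []
    for tid, series in per_thread.items():
        series.sort(key=lambda item: item[0])
        current = None
        # Step 2: scan in time order and extend a run while the stack stays identical.
        for ts, stack in series:
            if current is not None and stack == current.stack:
                current.end = ts
                current.count += 1
            else:
                if current is not None and current.count >= min_run:
                    runs.append(current)
                current = Run(tid, stack, ts, ts, 1)
        if current is not None and current.count >= min_run:
            runs.append(current)

    # Longest runs first: these are the candidates to inspect.
    runs.sort(key=lambda r: r.count, reverse=True)
    return runs


# Made-up samples: thread 1 stays in the same stack, thread 2 keeps changing.
spin = ("main", "worker", "spin_lock")
samples = [
    (0.0, 1, spin), (0.1, 1, spin), (0.2, 1, spin), (0.3, 1, spin),
    (0.0, 2, ("main", "handle", "parse")),
    (0.1, 2, ("main", "handle", "send")),
    (0.2, 2, ("main", "handle", "parse")),
    (0.3, 2, ("main", "idle")),
]
runs = find_repeated_stacks(samples, min_run=3)
for r in runs:
    print(r.thread_id, r.count, r.start, r.end, " > ".join(r.stack))
```

In this toy input only thread 1 is reported, because its stack is unchanged across four consecutive samples. A repeated stack is only a candidate signal in this sketch; deciding whether it is actually a busy-wait bug needs further inspection of the reported frames.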

Original language: English
Pages (from-to): 297-309
Number of pages: 13
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 35
Issue number: 2
DOI
Publication status: Published - 1 Feb 2024
