TY - JOUR
T1 - FlexInstru
T2 - A flexible instrumentation framework for tracing long-running native workloads
AU - Mu, Wenlong
AU - Li, Ning
AU - Ji, Zimo
AU - Guo, Jianmei
AU - Huang, Bo
N1 - Publisher Copyright: © 2025 Elsevier Inc.
PY - 2026/4
Y1 - 2026/4
N2 - Understanding program runtime characteristics is crucial for tasks such as optimization and workload characterization. For long-running server-side workloads that execute as native binaries, effective profiling is essential for tracing their complex runtime behavior, which enables further optimizations that improve the reliability and efficiency of the delivered services. Widely adopted techniques for profiling these workloads include binary instrumentation and hardware-based profiling. Binary instrumentation is typically accurate but incurs high overhead and lacks flexibility for tracing long-running native workloads. Hardware-based profiling incurs low overhead but requires hardware support. To overcome these limitations, we present FlexInstru, a hardware-independent dynamic instrumentation framework based on a process attachment/detachment mechanism. FlexInstru can instrument a running native application at any time and for any duration, and it achieves a good balance between instrumentation accuracy and overhead, which makes it particularly effective for tracing long-running native workloads. FlexInstru provides a process attachment/detachment mechanism on Linux that allows an instrumentation engine to be attached to a long-running native workload and detached at any time. To mitigate overhead, FlexInstru also enables flexible control of instrumentation through multiple attachments and detachments, so the workload alternates between instrumented and native execution. Moreover, during instrumented execution, FlexInstru supports a sampling mechanism that collects data only during the sampling period, further reducing overhead. We evaluate FlexInstru on AArch64 and X86-64 using real-world workloads. For MySQL's branch recording tasks, FlexInstru substantially reduces instrumentation overhead, with reductions of 415.60× on AArch64 and 1223.02× on X86-64 compared to traditional dynamic instrumentation, while maintaining sufficient accuracy.
KW - Binary instrumentation
KW - BOLT optimization
KW - Profiling
KW - Tracing
UR - https://www.scopus.com/pages/publications/105024433446
U2 - 10.1016/j.jss.2025.112739
DO - 10.1016/j.jss.2025.112739
M3 - Article
AN - SCOPUS:105024433446
SN - 0164-1212
VL - 234
JO - Journal of Systems and Software
JF - Journal of Systems and Software
M1 - 112739
ER -