TY - JOUR
T1 - Fault-tolerant real-time tasks scheduling with dynamic fault handling
AU - Chen, Gang
AU - Guan, Nan
AU - Huang, Kai
AU - Yi, Wang
N1 - Publisher Copyright: © 2019
PY - 2020/1
Y1 - 2020/1
N2 - Predictable performance when coping with transient failures is of paramount importance in safety-critical real-time systems. Various software fault-tolerance techniques are employed towards this goal, among which check-pointing is a relatively cost-effective scheme. In this paper, we propose an efficient fault-tolerant scheduling framework with a run-time fault handling protocol, where criticality levels can be adaptively inserted for fault handling according to the run-time fault workload. In contrast to prior works, which apply a task re-execution strategy, the proposed framework adaptively determines on-demand re-executions only on the faulty checkpoint segments, rather than on the whole job. To this end, a unified overrun handling protocol is developed to handle fault recovery adaptively and avoid over-provisioning of resources. In addition, we develop an off-line schedulability analysis technique for the proposed scheduling algorithm. The simulation results show that our fault-tolerant scheduling framework can bring up to 81% improvement in supporting low-criticality service without sacrificing MC-schedulability compared with existing techniques.
KW - Check-pointing
KW - Fault-tolerant scheduling
KW - Run-time fault handling
KW - Safety-critical real-time system
UR - https://www.scopus.com/pages/publications/85076256918
U2 - 10.1016/j.sysarc.2019.101688
DO - 10.1016/j.sysarc.2019.101688
M3 - Article
AN - SCOPUS:85076256918
SN - 1383-7621
VL - 102
JO - Journal of Systems Architecture
JF - Journal of Systems Architecture
M1 - 101688
ER -