Accurate and efficient follower log repair for Raft-replicated database systems

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

State machine replication is widely used in modern cluster-based database systems. Most commonly deployed configurations adopt a Raft-like consensus protocol, in which a single strong leader replicates the log to the followers. Since followers can serve read requests and many real workloads are read-intensive, the recovery speed of a crashed follower may significantly affect throughput. Unlike traditional database recovery, a recovering follower must first repair its local log. The original Raft protocol takes many network round trips to compare the logs of the leader and the crashed follower. To reduce network round trips, one optimization is to truncate the follower’s uncertain log entries after the latest local commit point and then fetch all committed log entries from the leader in one round trip. However, if the commit point is not persisted, the recovering follower has to fetch the whole log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the minimum number of log entries for follower recovery. The approach is implemented in the open source database system OceanBase. We experimentally show that a system adopting AELR performs well in terms of recovery time.
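The baseline optimization described in the abstract (truncate after the local commit point, then fetch committed entries in one round trip) can be illustrated with a minimal sketch. This is not the AELR algorithm itself, whose details the abstract does not give. The names `Entry` and `repair_follower_log` are hypothetical, logs are assumed to be contiguous and 1-indexed, and the leader's committed log is passed in directly to stand in for the single network round trip.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    index: int    # 1-based position in the log (assumed contiguous)
    term: int     # term in which the leader created the entry
    command: str  # opaque state machine command


def repair_follower_log(
    follower_log: List[Entry],
    persisted_commit_index: Optional[int],
    leader_committed: List[Entry],
) -> Tuple[List[Entry], int]:
    """Return (repaired log, number of entries fetched from the leader).

    Hypothetical helper: `leader_committed` stands in for the reply to a
    single fetch request sent to the leader.
    """
    # Without a persisted commit point no local entry can be trusted,
    # so the follower falls back to fetching the whole log.
    keep = persisted_commit_index or 0
    keep = min(keep, len(follower_log))

    # Truncate the uncertain entries after the local commit point.
    trusted = follower_log[:keep]

    # One "round trip": ask for every committed entry after that point.
    fetched = [e for e in leader_committed if e.index > keep]
    return trusted + fetched, len(fetched)


leader = [Entry(1, 1, "a"), Entry(2, 1, "b"), Entry(3, 2, "c"), Entry(4, 2, "d")]
# Entry 3 on the follower was never committed and conflicts with the leader.
follower = [Entry(1, 1, "a"), Entry(2, 1, "b"), Entry(3, 1, "x")]

with_commit_point, fetched_some = repair_follower_log(follower, 2, leader)
without_commit_point, fetched_all = repair_follower_log(follower, None, leader)
```

In this toy run both calls end with a log equal to the leader's, but the follower with a persisted commit point transfers two entries while the one without transfers all four, which is the cost the abstract attributes to an unpersisted commit point.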

Original language: English
Article number: 152605
Journal: Frontiers of Computer Science
Volume: 15
Issue number: 2
State: Published - Apr 2021

Keywords

  • Raft
  • high availability
  • log repair
  • log replication
