Abstract
Caching is a commonly used technique in databases to enhance query performance. Currently, existing database caching primarily falls into two directions: result caching and block caching. Result caching involves utilizing the final or intermediate results (such as subqueries) obtained during the execution of database queries, while block caching stores the underlying data blocks involved in the queries. This paper takes a different perspective, focusing on the 'computational load' contained within the cache, to reexamine the application of caching in query optimization. Building on this, the paper further classifies database caching methods. During query execution, a database query is transformed into a collection of operations (such as selection, sorting, etc.), with each operation corresponding to an operator. The data output by operators during query processing forms intermediate results, containing a portion of the computational load. We cache and leverage this subset of data. This caching approach, which cachcs a portion of the computational load, is termed 'operator caching', specifically caching the results of each operation execution. Due to the potential presence of common operators across different queries, performing similar computations on similar data, leveraging operator caching holds significant potential for accelerating query execution performance. The novelty of this paper lies in its exploration of how operator caching can be applied in query optimization, viewed from the perspective of the computational load contained in the cachc. Using Filter and Sort operators as examples, we propose and investigate how operator caching can be employed in query optimization. For cachc reuse. we introduce a semantic tree-based matching algorithm designed to efficiently match result sets in the cache. Simultaneously, to address the potential degradation of query performance causcd by reusing the cache, we suggest the use of a cost-based optimizer to prevent the degradation of query performance. Finally, based on the open-source analytical database ClickHouse, this paper implements a prototype of the Filter and Sort operator caching and conducts extensive experimental testing of the proposed operator caching scheme. The results indicate that, compared to block caching and materialized view approaches, the operator caching solution proposed in this paper can achieve a maximum improvement of 9 times and 1. 5 times in query response speed when deployed on a local SSD. When deployed in a cloud environment, the operator caching approach can achieve respective improvements of 30 times and 2 times in query response speed.
| Translated title of the contribution | Design and Implementation of Database Operator Cache for Select and Sort |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 2084-2103 |
| Number of pages | 20 |
| Journal | Jisuanji Xuebao/Chinese Journal of Computers |
| Volume | 47 |
| Issue number | 9 |
| DOIs | |
| State | Published - Sep 2024 |