TY - GEN
T1 - AID-SQL
T2 - 41st IEEE International Conference on Data Engineering, ICDE 2025
AU - Li, Xiuwen
AU - Cai, Qifeng
AU - Shu, Yang
AU - Guo, Chenjuan
AU - Yang, Bin
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Recent research in Text-to-SQL translation has primarily adopted in-context learning methods that leverage large language models (LLMs), achieving significant progress. However, these methods struggle to adapt to natural language (NL) questions of varying difficulty and to ensure the relevance of the few-shot examples provided. In this paper, we propose AID-SQL, an adaptive in-context learning approach that uses difficulty-aware instruction and retrieval-augmented generation (RAG) to enhance Text-to-SQL translation. First, we introduce adaptive instructions for LLMs, which use precise difficulty classification to apply difficulty-adaptive generation guidelines and chain-of-thought (CoT) templates for each difficulty level. We automatically insert few-shot examples retrieved from the knowledge base into the CoT template to construct CoT-enhanced examples, which improves the capability of LLMs with RAG. Furthermore, because current RAG methods struggle to measure how much retrieved examples contribute to solving the Text-to-SQL task, we train a ranking model that better bridges the semantic and structural gap between NL questions and SQL queries. This model captures semantic information more effectively and retrieves examples that are more beneficial to the final solution. We evaluate our method on five benchmarks, where it achieves competitive performance compared with existing methods.
KW - Large Language Model
KW - SQL
KW - Text-to-SQL
UR - https://www.scopus.com/pages/publications/105015427099
U2 - 10.1109/ICDE65448.2025.00294
DO - 10.1109/ICDE65448.2025.00294
M3 - Conference contribution
AN - SCOPUS:105015427099
T3 - Proceedings - International Conference on Data Engineering
SP - 3945
EP - 3957
BT - Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025
PB - IEEE Computer Society
Y2 - 19 May 2025 through 23 May 2025
ER -