TY - GEN
T1 - Learning from Failures
T2 - 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024
AU - Xu, Yilongfei
AU - Feng, Jincao
AU - Miao, Weikai
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Formalization of intended requirements is indispensable when using formal methods in software development. However, translating Natural Language (NL) requirements into formal specifications, such as Linear Temporal Logic (LTL), is error-prone. Although Large Language Models (LLMs) offer the potential to automatically translate unstructured NL requirements into LTL formulas, general-purpose LLMs face two major problems: first, low translation accuracy; second, the high cost of model training and tuning. To tackle these challenges, we propose a new approach that combines dynamic prompt generation with human-computer interaction to leverage LLMs for accurate and efficient translation of unstructured NL requirements into LTL formulas. Our approach consists of two techniques: 1) Dynamic Prompt Generation, which automatically generates the most appropriate prompts for translating the queried NL requirements; 2) Interactive Prompt Evolution, which helps LLMs learn from previous translation errors, i.e., erroneous formalizations are amended by users and added as new prompt fragments. Our approach achieves remarkable performance on publicly available datasets from two distinct domains, comprising 36 and 255,000 NL-LTL pairs, respectively. Without human interaction, our method achieves up to 94.4% accuracy. When our approach is extended to another domain, the accuracy improves from an initial 27% to 78% under interactive prompt evolution.
AB - Formalization of intended requirements is indispensable when using formal methods in software development. However, translating Natural Language (NL) requirements into formal specifications, such as Linear Temporal Logic (LTL), is error-prone. Although Large Language Models (LLMs) offer the potential to automatically translate unstructured NL requirements into LTL formulas, general-purpose LLMs face two major problems: first, low translation accuracy; second, the high cost of model training and tuning. To tackle these challenges, we propose a new approach that combines dynamic prompt generation with human-computer interaction to leverage LLMs for accurate and efficient translation of unstructured NL requirements into LTL formulas. Our approach consists of two techniques: 1) Dynamic Prompt Generation, which automatically generates the most appropriate prompts for translating the queried NL requirements; 2) Interactive Prompt Evolution, which helps LLMs learn from previous translation errors, i.e., erroneous formalizations are amended by users and added as new prompt fragments. Our approach achieves remarkable performance on publicly available datasets from two distinct domains, comprising 36 and 255,000 NL-LTL pairs, respectively. Without human interaction, our method achieves up to 94.4% accuracy. When our approach is extended to another domain, the accuracy improves from an initial 27% to 78% under interactive prompt evolution.
KW - Formal Specification
KW - Large Language Models
KW - Natural Language Processing
KW - Prompt Engineering
KW - Requirements Engineering
UR - https://www.scopus.com/pages/publications/85206388437
U2 - 10.1109/QRS62785.2024.00029
DO - 10.1109/QRS62785.2024.00029
M3 - Conference contribution
AN - SCOPUS:85206388437
T3 - IEEE International Conference on Software Quality, Reliability and Security, QRS
SP - 204
EP - 215
BT - Proceedings - 2024 IEEE 24th International Conference on Software Quality, Reliability and Security, QRS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 July 2024 through 5 July 2024
ER -