TY - GEN
T1 - LLMonCAR
T2 - 21st International Conference on Intelligent Computing, ICIC 2025
AU - Hu, Hongzhen
AU - Li, Yifan
AU - Wang, Siyu
AU - Wang, Gaoli
AU - Hu, Jianyong
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - The Cryptographic Algorithm Recognition (CAR) task is a critical problem in cryptography, with significant implications for the security of cryptographic algorithm design. While Large Language Models (LLMs) demonstrate promising potential for this task, evaluating their performance remains challenging due to the absence of aligned input-output specifications and standardized evaluation metrics for CAR in the LLM setting. In this paper, we construct an evaluation dataset and corresponding metrics to analyze performance and the factors that influence effectiveness in CAR. The evaluation covers seven different cryptographic algorithms and reports the performance of five mainstream LLMs on this dataset. Experimental results indicate that LLMs exhibit limitations in algorithm identification, achieving an average accuracy of 63.9%. Performance is significantly influenced by the cryptographic algorithm and the fundamental capabilities of the LLMs. Surprisingly, and contrary to expectations, the mainstream cryptographic algorithm Keccak is recognized relatively well by LLMs, unlike other modern algorithms. Furthermore, we introduce six different prompt engineering methods and find that most do not significantly enhance LLM performance in CAR. However, the prompting approach of snapshot-based exemplar reference effectively improves CAR performance, yielding an average increase of 7.7%, with varying degrees of improvement under different conditions.
AB - The Cryptographic Algorithm Recognition (CAR) task is a critical problem in cryptography, with significant implications for the security of cryptographic algorithm design. While Large Language Models (LLMs) demonstrate promising potential for this task, evaluating their performance remains challenging due to the absence of aligned input-output specifications and standardized evaluation metrics for CAR in the LLM setting. In this paper, we construct an evaluation dataset and corresponding metrics to analyze performance and the factors that influence effectiveness in CAR. The evaluation covers seven different cryptographic algorithms and reports the performance of five mainstream LLMs on this dataset. Experimental results indicate that LLMs exhibit limitations in algorithm identification, achieving an average accuracy of 63.9%. Performance is significantly influenced by the cryptographic algorithm and the fundamental capabilities of the LLMs. Surprisingly, and contrary to expectations, the mainstream cryptographic algorithm Keccak is recognized relatively well by LLMs, unlike other modern algorithms. Furthermore, we introduce six different prompt engineering methods and find that most do not significantly enhance LLM performance in CAR. However, the prompting approach of snapshot-based exemplar reference effectively improves CAR performance, yielding an average increase of 7.7%, with varying degrees of improvement under different conditions.
KW - Cryptographic Algorithm Identification
KW - Cryptographic Security
KW - Large Language Models
KW - Model Performance Evaluation
KW - Prompt Engineering
UR - https://www.scopus.com/pages/publications/105013060109
U2 - 10.1007/978-981-96-9911-7_40
DO - 10.1007/978-981-96-9911-7_40
M3 - Conference contribution
AN - SCOPUS:105013060109
SN - 9789819699100
T3 - Communications in Computer and Information Science
SP - 487
EP - 497
BT - Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
A2 - Huang, De-Shuang
A2 - Zhang, Chuanlei
A2 - Zhang, Qinhu
A2 - Pan, Yijie
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 July 2025 through 29 July 2025
ER -