TY - GEN
T1 - FALCON
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
AU - Li, Zeyuan
AU - He, Yangfan
AU - He, Lewei
AU - Wang, Jianhui
AU - Shi, Tianyu
AU - Lei, Bin
AU - Li, Yuchen
AU - Chen, Qiuwu
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance of automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding OptimizatioN (i.e., FALCON). FALCON leverages long-term memory to retain and apply learned knowledge, short-term memory to incorporate immediate feedback, and meta-reinforcement learning with feedback rewards to address global-local bi-level optimization and enhance adaptability across diverse code generation tasks. Extensive experiments show that FALCON achieves state-of-the-art performance, outperforming other reinforcement learning methods by over 4.5% on MBPP and 6.1% on HumanEval. The code is publicly available at https://anonymous.4open.science/r/FALCON-3B64/README.md.
AB - Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance of automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding OptimizatioN (i.e., FALCON). FALCON leverages long-term memory to retain and apply learned knowledge, short-term memory to incorporate immediate feedback, and meta-reinforcement learning with feedback rewards to address global-local bi-level optimization and enhance adaptability across diverse code generation tasks. Extensive experiments show that FALCON achieves state-of-the-art performance, outperforming other reinforcement learning methods by over 4.5% on MBPP and 6.1% on HumanEval. The code is publicly available at https://anonymous.4open.science/r/FALCON-3B64/README.md.
KW - Code generation
KW - Diverse Feedback
KW - Reinforcement Learning
UR - https://www.scopus.com/pages/publications/105022624419
U2 - 10.1109/ICME59968.2025.11208959
DO - 10.1109/ICME59968.2025.11208959
M3 - Conference contribution
AN - SCOPUS:105022624419
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
Y2 - 30 June 2025 through 4 July 2025
ER -