TY - GEN
T1 - Book2QA
T2 - 2025 International Joint Conference on Neural Networks, IJCNN 2025
AU - Cui, Zhanhao
AU - Wang, Ye
AU - Huang, Xinya
AU - Wu, Wen
AU - Hu, Wenxin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The scarcity of high-quality question answering (QA) data remains a significant bottleneck in the development of intelligent educational systems, as traditional datasets lack the necessary scale and diversity to support personalized model training. To address this challenge, we propose the Book2QA framework, which integrates multiple language models to provide an effective and flexible approach for generating QA datasets derived from textbook content. To further enhance the quality of the generated data, we implement a hierarchical prompting strategy grounded in Bloom's taxonomy, substantially increasing both the depth and breadth of the QA datasets. In addition, we fine-tune our model using these data, with evaluations conducted by both human reviewers and GPT-4 confirming its strong performance in real-world questioning scenarios. Experimental results demonstrate that our framework excels not only in specific textbook domains but also shows promise for broader applications across diverse fields. We open source our data and code at https://github.com/Curtain2020/Book2QA.
AB - The scarcity of high-quality question answering (QA) data remains a significant bottleneck in the development of intelligent educational systems, as traditional datasets lack the necessary scale and diversity to support personalized model training. To address this challenge, we propose the Book2QA framework, which integrates multiple language models to provide an effective and flexible approach for generating QA datasets derived from textbook content. To further enhance the quality of the generated data, we implement a hierarchical prompting strategy grounded in Bloom's taxonomy, substantially increasing both the depth and breadth of the QA datasets. In addition, we fine-tune our model using these data, with evaluations conducted by both human reviewers and GPT-4 confirming its strong performance in real-world questioning scenarios. Experimental results demonstrate that our framework excels not only in specific textbook domains but also shows promise for broader applications across diverse fields. We open source our data and code at https://github.com/Curtain2020/Book2QA.
KW - Bloom's Taxonomy
KW - Data Synthesis
KW - Educational Chatbots
KW - Language Model Integration
UR - https://www.scopus.com/pages/publications/105023990137
U2 - 10.1109/IJCNN64981.2025.11227625
DO - 10.1109/IJCNN64981.2025.11227625
M3 - Conference contribution
AN - SCOPUS:105023990137
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2025 through 5 July 2025
ER -