Book2QA: A Framework for Integrating LLMs to Generate High-quality QA Data from Textbooks

  • Zhanhao Cui
  • Ye Wang
  • Xinya Huang
  • Wen Wu
  • Wenxin Hu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The scarcity of high-quality question answering (QA) data remains a significant bottleneck in the development of intelligent educational systems, as traditional datasets lack the necessary scale and diversity to support personalized model training. To address this challenge, we propose the Book2QA framework, which integrates multiple language models to provide an effective and flexible approach for generating QA datasets derived from textbook content. To further enhance the quality of the generated data, we implement a hierarchical prompting strategy grounded in Bloom's taxonomy, substantially increasing both the depth and breadth of the QA datasets. In addition, we fine-tune our model using these data, with evaluations conducted by both human reviewers and GPT-4 confirming its strong performance in real-world questioning scenarios. Experimental results demonstrate that our framework not only excels in specific textbook domains but also shows promise for broader applications across diverse fields. We open source our data and code at https://github.com/Curtain2020/Book2QA.

Original language: English
Title of host publication: International Joint Conference on Neural Networks, IJCNN 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331510428
State: Published - 2025
Event: 2025 International Joint Conference on Neural Networks, IJCNN 2025 - Rome, Italy
Duration: 30 Jun 2025 to 5 Jul 2025

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407

Conference

Conference: 2025 International Joint Conference on Neural Networks, IJCNN 2025
Country/Territory: Italy
City: Rome
Period: 30/06/25 to 5/07/25

Keywords

  • Bloom's Taxonomy
  • Data Synthesis
  • Educational Chatbots
  • Language Model Integration
