Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

  • Nan Xiang*
  • Tianyi Liang
  • Haiwen Huang
  • Shiqi Jiang
  • Hao Huang
  • Yifei Huang
  • Liangyu Chen
  • Changbo Wang
  • Chenhui Li

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and a user study demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.

Original language: English
Journal: IEEE Transactions on Visualization and Computer Graphics
State: Accepted/In press - 2025

Keywords

  • Prompt engineering
  • shape exploration
  • text-to-3D generation
  • visual perception
  • visualization design
