TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

  • Tianyi Liang
  • , Jiangqi Liu
  • , Yifei Huang
  • , Shiqi Jiang
  • , Jianshen Shi
  • , Changbo Wang
  • , Chenhui Li*
  • *Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

1 Scopus citations

Abstract

Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free dynamic background adaptation in the blank region for text-friendly image generation. Instead of directly reducing attention in text areas, which degrades image quality, we relocate conflicting objects before background optimization. Our method analyzes cross-attention maps to identify conflicting objects overlapping with text regions and uses a force-directed graph approach to guide their relocation, followed by attention excluding constraints to ensure smooth backgrounds. Our method is plug-and-play, requiring no additional training while well balancing both semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets, TextCenGen outperforms existing methods by achieving 23% lower saliency overlap in text regions while maintaining 98% of the semantic fidelity measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).

Original languageEnglish
Pages (from-to)37237-37256
Number of pages20
JournalProceedings of Machine Learning Research
Volume267
StatePublished - 2025
Event42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 202519 Jul 2025

Fingerprint

Dive into the research topics of 'TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation'. Together they form a unique fingerprint.

Cite this