MVESF: Multi-View Enhanced Semantic Fusion for Controllable Text-to-Image Generation

Hai Zhang, Guitao Cao*, Xinke Wang, Jiahao Quan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text-to-image (T2I) synthesis aims to generate images that are semantically consistent with the input text. Existing methods use only the shallow semantics of the text to coarsely guide image generation. Because they cannot fully integrate rich textual semantics with image features, they cannot finely control image generation through text. To address this issue, we propose MVESF (Multi-View Enhanced Semantic Fusion), a GAN-based method that enhances the semantic fusion of text and images from multiple perspectives for fine-grained, controllable text-to-image synthesis. Specifically, we introduce Multi-domain Semantic Guidance, Local Semantic Attention, and a Visual-textual Consistency Loss to strengthen text-image semantic fusion in image generation, image discrimination, and image supervision, respectively. Our method promotes consistent alignment between text and images, so that subtle changes in the input text produce fine-grained variations in the generated images without affecting unrelated regions. Extensive experimental results demonstrate the effectiveness of our approach.
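The abstract does not specify the form of the Visual-textual Consistency Loss. As a purely hypothetical illustration of the general idea of supervising generated images with text, the sketch below implements a generic symmetric contrastive (InfoNCE-style) image-text matching loss, a technique commonly used in text-to-image models. It is not the loss defined in the MVESF paper; the function names, the temperature `tau`, and the assumption that images and sentences are already encoded as fixed-length embedding vectors are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # small epsilon avoids division by zero


def log_softmax_at(logits: List[float], index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at one index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum


def consistency_loss(images: List[Vector], texts: List[Vector], tau: float = 0.1) -> float:
    """Hypothetical symmetric contrastive loss (NOT the MVESF formulation).

    The i-th image embedding should be most similar to the i-th text
    embedding in the batch, and vice versa.
    """
    n = len(images)
    assert n == len(texts) and n > 0
    # sim[i][j] = cosine(image_i, text_j) / tau
    sim = [[cosine(images[i], texts[j]) / tau for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row is a softmax over candidate texts.
    img_to_txt = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image direction: each column is a softmax over candidate images.
    txt_to_img = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (img_to_txt + txt_to_img)


if __name__ == "__main__":
    imgs = [[1.0, 0.0], [0.0, 1.0]]
    txts = [[0.9, 0.1], [0.1, 0.9]]
    print(consistency_loss(imgs, txts))        # aligned pairs: close to zero
    print(consistency_loss(imgs, txts[::-1]))  # mismatched pairs: much larger
```

Minimizing a loss of this kind pulls each generated image toward its own description and away from the other descriptions in the batch, which is one common way to encourage the text-image alignment the abstract describes.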

Original language: English
Title of host publication: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1610-1617
Number of pages: 8
ISBN (Electronic): 9781665410205
State: Published - 2024
Event: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Kuching, Malaysia
Duration: 6 Oct 2024 to 10 Oct 2024

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
ISSN (Print): 1062-922X

Conference

Conference: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024
Country/Territory: Malaysia
City: Kuching
Period: 6/10/24 to 10/10/24
