TY - GEN
T1 - MVESF
T2 - 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024
AU - Zhang, Hai
AU - Cao, Guitao
AU - Wang, Xinke
AU - Quan, Jiahao
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Text-to-image (T2I) synthesis aims to generate images that are semantically consistent with texts. Currently, existing methods merely use the shallow semantics of the text to crudely guide image generation; they cannot fully integrate rich textual semantics with image features, and thus cannot finely control image generation through text. To address this issue, we propose a GAN-based method named MVESF (Multi-View Enhanced Semantic Fusion), which enhances the semantic fusion of text and images from multiple perspectives for fine-grained controllable text-to-image synthesis. Across these views, we introduce Multi-domain Semantic Guidance, Local Semantic Attention, and a Visual-textual Consistency Loss to enhance the semantic fusion of text and images in image generation, image discrimination, and image supervision, respectively. Our method promotes consistent alignment between text and images, allowing fine-grained variations in the generated images when subtle changes are made to the input text, without affecting unrelated regions. Extensive experimental results demonstrate the effectiveness of our approach.
AB - Text-to-image (T2I) synthesis aims to generate images that are semantically consistent with texts. Currently, existing methods merely use the shallow semantics of the text to crudely guide image generation; they cannot fully integrate rich textual semantics with image features, and thus cannot finely control image generation through text. To address this issue, we propose a GAN-based method named MVESF (Multi-View Enhanced Semantic Fusion), which enhances the semantic fusion of text and images from multiple perspectives for fine-grained controllable text-to-image synthesis. Across these views, we introduce Multi-domain Semantic Guidance, Local Semantic Attention, and a Visual-textual Consistency Loss to enhance the semantic fusion of text and images in image generation, image discrimination, and image supervision, respectively. Our method promotes consistent alignment between text and images, allowing fine-grained variations in the generated images when subtle changes are made to the input text, without affecting unrelated regions. Extensive experimental results demonstrate the effectiveness of our approach.
UR - https://www.scopus.com/pages/publications/85217855461
U2 - 10.1109/SMC54092.2024.10831828
DO - 10.1109/SMC54092.2024.10831828
M3 - Conference contribution
AN - SCOPUS:85217855461
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 1610
EP - 1617
BT - 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 October 2024 through 10 October 2024
ER -