
ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text

  • Haifeng Ni*
  • Ming Xu
  • *Corresponding author for this work
  • East China Normal University

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

Virtual try-on, which aims to seamlessly fit garments onto person images, has recently seen significant progress with diffusion-based models. However, existing methods commonly resort to duplicated backbones or additional image encoders to extract garment features, which increases computational overhead and network complexity. In this paper, we propose ITVTON, an efficient framework that leverages the Diffusion Transformer (DiT) as its single generator to improve image fidelity. By concatenating garment and person images along the width dimension and incorporating textual descriptions from both, ITVTON effectively captures garment-person interactions while preserving realism. To further reduce computational cost, we restrict training to the attention parameters within a single Diffusion Transformer (Single-DiT) block. Extensive experiments demonstrate that ITVTON surpasses baseline methods both qualitatively and quantitatively, setting a new standard for virtual try-on. Moreover, experiments on 10,257 image pairs from IGPair confirm its robustness in real-world scenarios.
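Two ideas in the abstract, width-wise concatenation of the garment and person images and training only the attention parameters of Single-DiT blocks, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the parameter-name prefix `single_blocks.`, and the marker `.attn.` are hypothetical and chosen only for illustration, and NumPy arrays stand in for real images.

```python
import numpy as np


def concat_along_width(garment: np.ndarray, person: np.ndarray) -> np.ndarray:
    """Place two (H, W, C) images side by side, giving (H, W_g + W_p, C)."""
    if garment.shape[0] != person.shape[0] or garment.shape[2] != person.shape[2]:
        raise ValueError("images must share height and channel count")
    # axis=1 is the width axis for (H, W, C) arrays.
    return np.concatenate([garment, person], axis=1)


def select_trainable(param_names, block_prefix="single_blocks.", attn_marker=".attn."):
    """Return the parameter names that would remain trainable.

    Assumption: attention weights inside Single-DiT blocks can be recognized
    by a name prefix and a substring; real checkpoints may name them differently.
    """
    return [n for n in param_names if n.startswith(block_prefix) and attn_marker in n]


# Toy stand-ins for a garment image and a person image.
garment = np.zeros((4, 3, 3), dtype=np.uint8)
person = np.ones((4, 3, 3), dtype=np.uint8)
canvas = concat_along_width(garment, person)

# Hypothetical parameter names for a DiT-style model.
names = [
    "double_blocks.0.attn.qkv.weight",
    "single_blocks.0.attn.qkv.weight",
    "single_blocks.0.mlp.fc1.weight",
    "single_blocks.1.attn.proj.weight",
]
trainable = select_trainable(names)
```

In a training setup, every parameter outside the selected list would be frozen, so the optimizer updates only a small fraction of the generator's weights.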

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
Editors: Josef Kittler, Hongkai Xiong, Jian Yang, Xilin Chen, Jiwen Lu, Weiyao Lin, Jingyi Yu, Weishi Zheng
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 460-474
Number of pages: 15
ISBN (print): 9789819556786
DOI
Publication status: Published - 2026
Event: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China
Duration: 15 Oct 2025 to 18 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16277 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
Country/Territory: China
City: Shanghai
Period: 15/10/25 to 18/10/25
