TY - GEN
T1 - Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-image Translation
AU - Huang, Qiusheng
AU - Zheng, Zhilin
AU - Hu, Xueqi
AU - Sun, Li
AU - Li, Qingli
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
AB - The image-to-image translation (I2IT) model takes a target label or a reference image as input and translates a source image into the specified target domain. The two types of synthesis, label- and reference-based, differ substantially: label-based synthesis reflects the common characteristics of the target domain, while reference-based synthesis produces a specific style similar to the reference. This paper aims to bridge the gap between them in the task of multi-attribute I2IT. We design label- and reference-based encoding modules (LEM and REM) to compare the domain differences. They first map the source image and the target label (or reference) into a common embedding space, providing opposite directions through the attribute difference vector. The two embeddings are then fused to form the latent code S_rand (or S_ref), which reflects the domain style differences and is injected into each layer of the generator via SPADE. To link LEM and REM so that the two types of results benefit each other, we encourage the two latent codes to be close and set up cycle consistency between the forward and backward translations on them. Moreover, the interpolation between S_rand and S_ref is also used to synthesize an extra image. Experiments show that label- and reference-based synthesis indeed promote each other, so that we obtain diverse results from LEM and high-quality results in a style similar to the reference. Code will be available at https://github.com/huangqiusheng/BridgeGAN.
UR - https://www.scopus.com/pages/publications/85127267581
U2 - 10.1109/ICCV48922.2021.01436
DO - 10.1109/ICCV48922.2021.01436
M3 - Conference contribution
AN - SCOPUS:85127267581
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 14608
EP - 14617
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Y2 - 11 October 2021 through 17 October 2021
ER -