TY - GEN
T1 - Disentangling the Spatial Structure and Style in Conditional VAE
AU - Zhang, Ziye
AU - Sun, Li
AU - Zheng, Zhilin
AU - Li, Qingli
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
AB - This paper proposes a structure for the conditional variational autoencoder (cVAE) that disentangles the latent vector into a spatial structure code and a style code, complementary to each other, with one (zs) being label relevant and the other (zu) label irrelevant. Unlike a traditional cVAE, our network maps the condition label into its relevant code zs through a separate module. Depending on whether the label directly relates to the image spatial structure or not, the zs output from the condition mapping module is used either as the style code, with spatial dimensions of 1 × 1, or as the spatial structure code, with a single channel. Based on the input image and its corresponding zs, the encoder provides a posterior distribution close to a common prior regardless of the label, so that zu sampled from it becomes label irrelevant. The decoder incorporates zs and zu through two typical adaptive normalization modules to reconstruct the input image. Results on two datasets with different types of labels show the effectiveness of our method.
KW - GAN
KW - cVAE
KW - disentanglement
UR - https://www.scopus.com/pages/publications/85098671555
U2 - 10.1109/ICIP40778.2020.9190908
DO - 10.1109/ICIP40778.2020.9190908
M3 - Conference contribution
AN - SCOPUS:85098671555
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1626
EP - 1630
BT - 2020 IEEE International Conference on Image Processing, ICIP 2020 - Proceedings
PB - IEEE Computer Society
T2 - 2020 IEEE International Conference on Image Processing, ICIP 2020
Y2 - 25 September 2020 through 28 September 2020
ER -