TY - GEN
T1 - Disentangling latent space for vae by label relevant/irrelevant dimensions
AU - Zheng, Zhilin
AU - Sun, Li
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/6
Y1 - 2019/6
N2 - VAE requires the standard Gaussian distribution as a prior in the latent space. Since all codes tend to follow the same prior, it often suffers the so-called 'posterior collapse'. To avoid this, this paper introduces the class specific distribution for the latent code. But different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, zs and zu, for a single input. We apply two separated encoders to map the input into zs and zu respectively, and then give the concatenated code to the decoder to reconstruct the input. The label irrelevant code zu represent the common characteristics of all inputs, hence they are constrained by the standard Gaussian, and their encoder is trained in amortized variational inference way, like VAE. While zs is assumed to follow the Gaussian mixture distribution in which each component corresponds to a particular class. The parameters for the Gaussian components in zs encoder are optimized by the label supervision in a global stochastic way. In theory, we show that our method is actually equivalent to adding a KL divergence term on the joint distribution of zs and the class label c, and it can directly increase the mutual information between zs and the label c. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high quality and diverse images.
AB - VAE requires the standard Gaussian distribution as a prior in the latent space. Since all codes tend to follow the same prior, it often suffers the so-called 'posterior collapse'. To avoid this, this paper introduces the class specific distribution for the latent code. But different from CVAE, we present a method for disentangling the latent space into the label relevant and irrelevant dimensions, zs and zu, for a single input. We apply two separated encoders to map the input into zs and zu respectively, and then give the concatenated code to the decoder to reconstruct the input. The label irrelevant code zu represent the common characteristics of all inputs, hence they are constrained by the standard Gaussian, and their encoder is trained in amortized variational inference way, like VAE. While zs is assumed to follow the Gaussian mixture distribution in which each component corresponds to a particular class. The parameters for the Gaussian components in zs encoder are optimized by the label supervision in a global stochastic way. In theory, we show that our method is actually equivalent to adding a KL divergence term on the joint distribution of zs and the class label c, and it can directly increase the mutual information between zs and the label c. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high quality and diverse images.
KW - And Body Pose
KW - Deep Learning
KW - Face
KW - Gesture
KW - Image and Video Synthesis
UR - https://www.scopus.com/pages/publications/85078758252
U2 - 10.1109/CVPR.2019.01247
DO - 10.1109/CVPR.2019.01247
M3 - 会议稿件
AN - SCOPUS:85078758252
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 12184
EP - 12193
BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
PB - IEEE Computer Society
T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Y2 - 16 June 2019 through 20 June 2019
ER -