TY - JOUR
T1 - Few-Shot Diffusion Models Escape the Curse of Dimensionality
AU - Yang, Ruofeng
AU - Jiang, Bo
AU - Chen, Cheng
AU - Jin, Ruinan
AU - Wang, Baoxiang
AU - Li, Shuai
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - While diffusion models have demonstrated impressive performance, there is a growing need for generating samples tailored to specific user-defined concepts. These customized requirements promote the development of few-shot diffusion models, which use a limited number n_ta of target samples to fine-tune a pre-trained diffusion model trained on n_s source samples. Despite the empirical success, no theoretical work specifically analyzes few-shot diffusion models. Moreover, because of the curse of dimensionality, the existing results for diffusion models without a fine-tuning phase cannot explain why few-shot models generate good samples. In this work, we analyze few-shot diffusion models under a linear structure distribution with latent dimension d. From the approximation perspective, we prove that few-shot models have an Õ(n_s^{-2/d} + n_ta^{-1/2}) bound for approximating the target score function, which is better than the n_ta^{-2/d} result. From the optimization perspective, we consider a latent Gaussian special case and prove that the optimization problem has a closed-form minimizer. This means few-shot models can directly obtain an approximate minimizer without a complex optimization process. Furthermore, we also provide the accuracy bound Õ(1/n_ta + 1/√n_s) for the empirical solution, which still has better dependence on n_ta than on n_s. The results of real-world experiments also show that models obtained by fine-tuning only the encoder and decoder specific to the target distribution can produce novel images with the target feature, which supports our theoretical results.
UR - https://www.scopus.com/pages/publications/105000490865
M3 - Conference article
AN - SCOPUS:105000490865
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
Y2 - 9 December 2024 through 15 December 2024
ER -