TY - GEN
T1 - CFRL
T2 - 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
AU - Song, Yiran
AU - Zhou, Qianyu
AU - Hu, Kun
AU - Ma, Lizhuang
AU - Lu, Xuequan
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/12/28
Y1 - 2024/12/28
N2 - Real-world data often suffers from severe class imbalance: the number of instances per class varies greatly, following a long-tailed distribution. In this setting, directly applying supervised learning yields poor performance. Existing long-tailed recognition (LTR) methods often rely heavily on label information, using image-level end-to-end resampling strategies to address the distribution imbalance and improve tail-class accuracy at the expense of head classes. Nevertheless, they neglect label bias, which can severely affect the LTR model's accuracy. In this paper, we propose a novel approach, namely Coarse-Fine Decoupled Representation Learning (CFRL), for LTR. Our core idea is to decouple data representations from the classifier and decompose representation learning into two stages: image-level and patch-level. Specifically, in the image-level stage, we leverage unsupervised learning on image-level information to reduce the impact of label bias caused by imbalanced datasets. In the patch-level stage, we introduce patch-level rotation augmentation as negative samples, forcing the model to acquire more comprehensive information. Our theoretical and empirical analyses demonstrate that the approach does not sacrifice the accuracy of head classes while significantly reducing the overfitting of tail classes, thereby improving performance on both. We showcase state-of-the-art results on the CIFAR, ImageNet, and iNaturalist datasets. Furthermore, we illustrate that this training methodology can be combined with various existing LTR methods, further enhancing their performance.
AB - Real-world data often suffers from severe class imbalance: the number of instances per class varies greatly, following a long-tailed distribution. In this setting, directly applying supervised learning yields poor performance. Existing long-tailed recognition (LTR) methods often rely heavily on label information, using image-level end-to-end resampling strategies to address the distribution imbalance and improve tail-class accuracy at the expense of head classes. Nevertheless, they neglect label bias, which can severely affect the LTR model's accuracy. In this paper, we propose a novel approach, namely Coarse-Fine Decoupled Representation Learning (CFRL), for LTR. Our core idea is to decouple data representations from the classifier and decompose representation learning into two stages: image-level and patch-level. Specifically, in the image-level stage, we leverage unsupervised learning on image-level information to reduce the impact of label bias caused by imbalanced datasets. In the patch-level stage, we introduce patch-level rotation augmentation as negative samples, forcing the model to acquire more comprehensive information. Our theoretical and empirical analyses demonstrate that the approach does not sacrifice the accuracy of head classes while significantly reducing the overfitting of tail classes, thereby improving performance on both. We showcase state-of-the-art results on the CIFAR, ImageNet, and iNaturalist datasets. Furthermore, we illustrate that this training methodology can be combined with various existing LTR methods, further enhancing their performance.
KW - Long-Tailed Recognition
KW - Representation learning
UR - https://www.scopus.com/pages/publications/85216181576
U2 - 10.1145/3696409.3700195
DO - 10.1145/3696409.3700195
M3 - Conference contribution
AN - SCOPUS:85216181576
T3 - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
BT - Proceedings of the 6th ACM International Conference on Multimedia in Asia, MMAsia 2024
PB - Association for Computing Machinery, Inc
Y2 - 3 December 2024 through 6 December 2024
ER -