TY - GEN
T1 - EMCLR
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
AU - Liu, Meng
AU - Yi, Ran
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - One of the bottlenecks of self-supervised contrastive learning is the degenerate constant solution, where all samples are mapped to a single point in representation space. To prevent such collapses, the mainstream paradigm uses negative samples, forcing negative pairs apart. However, this approach incurs O(N²) time and space complexity, limiting expansibility, scalability, and efficiency. We observe that current negative-requiring objectives can be decomposed into alignment and uniformity terms, where the uniformity term dominates the O(N²) complexity. To reduce the complexity, inspired by the traditional EM algorithm, we derive for each batch an embedding matrix with an optimally uniform distribution and discard the uniformity term from the objective. Specifically, for the stacked embedding matrices of two views, we first compute the optimal solution of one view with the proposed algorithm, then align the embedding matrix of the other view with this optimal solution. The learning paradigm avoids model collapse without ad hoc negative pairs and reduces the quadratic complexity to linear. Extensive experiments on CIFAR-10/100 and STL-10 show that the proposed method achieves comparable results with O(N) complexity.
AB - One of the bottlenecks of self-supervised contrastive learning is the degenerate constant solution, where all samples are mapped to a single point in representation space. To prevent such collapses, the mainstream paradigm uses negative samples, forcing negative pairs apart. However, this approach incurs O(N²) time and space complexity, limiting expansibility, scalability, and efficiency. We observe that current negative-requiring objectives can be decomposed into alignment and uniformity terms, where the uniformity term dominates the O(N²) complexity. To reduce the complexity, inspired by the traditional EM algorithm, we derive for each batch an embedding matrix with an optimally uniform distribution and discard the uniformity term from the objective. Specifically, for the stacked embedding matrices of two views, we first compute the optimal solution of one view with the proposed algorithm, then align the embedding matrix of the other view with this optimal solution. The learning paradigm avoids model collapse without ad hoc negative pairs and reduces the quadratic complexity to linear. Extensive experiments on CIFAR-10/100 and STL-10 show that the proposed method achieves comparable results with O(N) complexity.
KW - Contrastive learning
KW - Representation learning
KW - Self-supervised learning
UR - https://www.scopus.com/pages/publications/85177570912
U2 - 10.1109/ICASSP49357.2023.10094790
DO - 10.1109/ICASSP49357.2023.10094790
M3 - Conference contribution
AN - SCOPUS:85177570912
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 June 2023 through 10 June 2023
ER -