TY - GEN
T1 - Single-cell Curriculum Learning-based Deep Graph Embedding Clustering
AU - Li, Huifa
AU - Fu, Jie
AU - Ling, Xinpeng
AU - Sun, Zhiyu
AU - Wang, Kuncan
AU - Chen, Zhili
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of tissue heterogeneity at the cellular level. Cell annotation contributes significantly to the extensive downstream analysis of scRNA-seq data. However, analyzing scRNA-seq data for biological inference is challenging owing to its intricate and indeterminate data distribution, which is characterized by large data volumes and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of graph neural networks (GNNs), a popular solution for scRNA-seq data clustering, can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; and 2) nodes that contribute little additional information to the graph. To address these problems, we propose single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-criteria (ChebAE) that combines three optimization objectives, namely a topology reconstruction loss on cell graphs, a zero-inflated negative binomial (ZINB) loss, and a clustering loss, to learn cell-cell topology representations. Meanwhile, we employ a selective training strategy that trains the GNN based on node features and entropy, and prunes difficult nodes according to their difficulty scores to retain a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods. The code for scCLG will be made publicly available at https://github.com/LFD-byte/scCLG.
KW - Curriculum Learning
KW - Graph Clustering
KW - scRNA-seq Data
UR - https://www.scopus.com/pages/publications/85217280807
U2 - 10.1109/BIBM62325.2024.10822530
DO - 10.1109/BIBM62325.2024.10822530
M3 - Conference contribution
AN - SCOPUS:85217280807
T3 - Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
SP - 1557
EP - 1562
BT - Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
A2 - Cannataro, Mario
A2 - Zheng, Huiru
A2 - Gao, Lin
A2 - Cheng, Jianlin
A2 - de Miranda, Joao Luis
A2 - Zumpano, Ester
A2 - Hu, Xiaohua
A2 - Cho, Young-Rae
A2 - Park, Taesung
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Y2 - 3 December 2024 through 6 December 2024
ER -