TY - JOUR
T1 - DeepMap: Deep learning-based single-cell data integration using iterative cell matching and structure preservation constraints
AU - Xu, Shuntuo
AU - Yu, Zhou
AU - Ming, Jingsi
N1 - Publisher Copyright: © 2024, Institute of Mathematical Statistics. All rights reserved.
PY - 2024/12
Y1 - 2024/12
N2 - Effective integration of single-cell data can facilitate the discovery of cell-type specific gene expression patterns and cellular interactions, ultimately leading to a better understanding of various biological processes and diseases. However, datasets from different platforms, species, and modalities exhibit various levels of heterogeneities, posing significant challenges in data alignment using a unified approach. Here we propose DeepMap, a flexible and efficient method for single-cell data integration, by taking advantage of the deep learning framework. Our method utilizes iterative cell matching based on mutual nearest neighbors, leverages an autoencoder framework to learn harmonized representations of cells from various datasets, and incorporates a covariance penalty term into the framework for structure preservation. In addition to harmonization of data from different datasets, we specifically take account of the preservation of important biological variations within dataset, which is crucial to reliable downstream analysis. Comprehensive real data analysis demonstrates the flexibility of DeepMap for diverse datasets from different platforms, species, and modalities, and highlights its marked ability in preserving structures over existing integration methods with enhanced computational efficiency and optimized memory usage. The robust DeepMap-integrated data offers promising prospects for advancing our understanding of cell biology, hence making it a highly attractive option for integrative single-cell data analysis.
AB - Effective integration of single-cell data can facilitate the discovery of cell-type specific gene expression patterns and cellular interactions, ultimately leading to a better understanding of various biological processes and diseases. However, datasets from different platforms, species, and modalities exhibit various levels of heterogeneities, posing significant challenges in data alignment using a unified approach. Here we propose DeepMap, a flexible and efficient method for single-cell data integration, by taking advantage of the deep learning framework. Our method utilizes iterative cell matching based on mutual nearest neighbors, leverages an autoencoder framework to learn harmonized representations of cells from various datasets, and incorporates a covariance penalty term into the framework for structure preservation. In addition to harmonization of data from different datasets, we specifically take account of the preservation of important biological variations within dataset, which is crucial to reliable downstream analysis. Comprehensive real data analysis demonstrates the flexibility of DeepMap for diverse datasets from different platforms, species, and modalities, and highlights its marked ability in preserving structures over existing integration methods with enhanced computational efficiency and optimized memory usage. The robust DeepMap-integrated data offers promising prospects for advancing our understanding of cell biology, hence making it a highly attractive option for integrative single-cell data analysis.
KW - Single-cell data integration
KW - deep learning
KW - iterative cell matching
KW - structure preservation constraints
UR - https://www.scopus.com/pages/publications/85211173688
U2 - 10.1214/24-AOAS1954
DO - 10.1214/24-AOAS1954
M3 - Article
AN - SCOPUS:85211173688
SN - 1932-6157
VL - 18
SP - 3596
EP - 3613
JO - Annals of Applied Statistics
JF - Annals of Applied Statistics
IS - 4
ER -