TY - JOUR
T1 - Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing
AU - The Somatic Mutation Working Group of Sequencing Quality Control Phase II Consortium
AU - Fang, Li Tai
AU - Zhu, Bin
AU - Zhao, Yongmei
AU - Chen, Wanqiu
AU - Yang, Zhaowei
AU - Kerrigan, Liz
AU - Langenbach, Kurt
AU - de Mars, Maryellen
AU - Lu, Charles
AU - Idler, Kenneth
AU - Jacob, Howard
AU - Zheng, Yuanting
AU - Ren, Luyao
AU - Yu, Ying
AU - Jaeger, Erich
AU - Schroth, Gary P.
AU - Abaan, Ogan D.
AU - Talsania, Keyur
AU - Lack, Justin
AU - Shen, Tsai Wei
AU - Chen, Zhong
AU - Stanbouly, Seta
AU - Tran, Bao
AU - Shetty, Jyoti
AU - Kriga, Yuliya
AU - Meerzaman, Daoud
AU - Nguyen, Cu
AU - Petitjean, Virginie
AU - Sultan, Marc
AU - Cam, Margaret
AU - Mehta, Monika
AU - Hung, Tiffany
AU - Peters, Eric
AU - Kalamegham, Rasika
AU - Sahraeian, Sayed Mohammad Ebrahim
AU - Mohiyuddin, Marghoob
AU - Guo, Yunfei
AU - Yao, Lijing
AU - Song, Lei
AU - Lam, Hugo Y.K.
AU - Drabek, Jiri
AU - Vojta, Petr
AU - Maestro, Roberta
AU - Gasparotto, Daniela
AU - Köks, Sulev
AU - Reimann, Ene
AU - Scherer, Andreas
AU - Nordlund, Jessica
AU - Liljedahl, Ulrika
AU - Shi, Tieliu
N1 - Publisher Copyright: © 2021, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
PY - 2021/9
Y1 - 2021/9
AB - The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor–normal genomic DNA (gDNA) samples derived from a breast cancer cell line—which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations—and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking ‘tumor-only’ or ‘matched tumor–normal’ analyses.
UR - https://www.scopus.com/pages/publications/85115970978
U2 - 10.1038/s41587-021-00993-6
DO - 10.1038/s41587-021-00993-6
M3 - Article
C2 - 34504347
AN - SCOPUS:85115970978
SN - 1087-0156
VL - 39
SP - 1151
EP - 1160
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 9
ER -