TY - JOUR
T1 - A novel staging system derived from natural language processing of pathology reports to predict prognostic outcomes of pancreatic cancer
T2 - a retrospective cohort study
AU - Li, Bo
AU - Wang, Beilei
AU - Zhuang, Pengjie
AU - Cao, Hongwei
AU - Wu, Shengyong
AU - Tan, Zhendong
AU - Gao, Suizhi
AU - Li, Penghao
AU - Jing, Wei
AU - Shao, Zhuo
AU - Zheng, Kailian
AU - Wu, Lele
AU - Gao, Bai
AU - Wang, Yang
AU - Jiang, Hui
AU - Guo, Shiwei
AU - He, Liang
AU - Yang, Yan
AU - Jin, Gang
N1 - Publisher Copyright: Copyright © 2023 The Author(s). Published by Wolters Kluwer Health, Inc.
PY - 2023/11/1
Y1 - 2023/11/1
N2 - Objective: To construct a novel tumor-node-morphology (TNMor) staging system derived from natural language processing (NLP) of pathology reports to predict outcomes of pancreatic ductal adenocarcinoma. Method: This retrospective study with 1657 participants was based on a large referral center and The Cancer Genome Atlas Program (TCGA) dataset. In the training cohort, NLP was used to extract and screen prognostic predictors from pathology reports to develop the TNMor system, which was then compared with the tumor-node-metastasis (TNM) system in the internal and external validation cohorts. Main outcomes were evaluated by the log-rank test of Kaplan–Meier curves, the concordance index (C-index), and the area under the receiver operating characteristic curve (AUC). Results: The precision, recall, and F1 scores of the NLP model were 88.83%, 89.89%, and 89.21%, respectively. In Kaplan–Meier analysis, survival differences between stages in the TNMor system were more significant than those in the TNM system. In addition, the TNMor system provided an improved C-index (internal validation, 0.58 vs. 0.54, P < 0.001; external validation, 0.64 vs. 0.63, P < 0.001) and higher AUCs for 1-year, 2-year, and 3-year survival (internal validation: 0.62 vs. 0.54, P < 0.001; 0.64 vs. 0.60, P = 0.017; 0.69 vs. 0.62, P = 0.001; external validation: 0.69 vs. 0.65, P = 0.098; 0.68 vs. 0.64, P = 0.154; 0.64 vs. 0.55, P = 0.032, respectively). Finally, the TNMor system was particularly beneficial for precise stratification of patients receiving adjuvant therapy, with an improved C-index (0.61 vs. 0.57, P < 0.001) and higher AUCs for 1-year, 2-year, and 3-year survival (0.64 vs. 0.57, P < 0.001; 0.64 vs. 0.58, P < 0.001; 0.67 vs. 0.61, P < 0.001, respectively) compared with the TNM system. Conclusion: These findings suggest that the TNMor system performed better than the TNM system in predicting pancreatic ductal adenocarcinoma prognosis. It is a promising system for screening risk-adjusted strategies in precision medicine.
KW - natural language processing
KW - pancreatic ductal adenocarcinoma
KW - pathological report
KW - prognosis
KW - stratification
UR - https://www.scopus.com/pages/publications/85178542715
U2 - 10.1097/JS9.0000000000000648
DO - 10.1097/JS9.0000000000000648
M3 - Article
C2 - 37578452
AN - SCOPUS:85178542715
SN - 1743-9191
VL - 109
SP - 3476
EP - 3489
JO - International Journal of Surgery
JF - International Journal of Surgery
IS - 11
ER -