A Two-Stage Pretraining-Finetuning Framework for Treatment Effect Estimation with Unmeasured Confounding

  • Chuan Zhou
  • , Yaxuan Li
  • , Chunyuan Zheng
  • , Haiteng Zhang
  • , Min Zhang
  • , Haoxuan Li*
  • , Mingming Gong*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Estimating the conditional average treatment effect (CATE) from observational data plays a crucial role in areas such as e-commerce, healthcare, and economics. Existing studies mainly rely on the strong ignorability assumption that there are no unmeasured confounders, whose presence cannot be tested from observational data and can invalidate any causal conclusion. In contrast, data collected from randomized controlled trials (RCT) do not suffer from confounding, but are usually limited by a small sample size. In this paper, we propose a two-stage pretraining-finetuning (TSPF) framework using both large-scale observational data and small-scale RCT data to estimate the CATE in the presence of unmeasured confounding. In the first stage, a foundational representation of covariates is trained to estimate counterfactual outcomes through large-scale observational data. In the second stage, we propose to train an augmented representation of the covariates, which is concatenated to the foundational representation obtained in the first stage to adjust for the unmeasured confounding. To avoid overfitting caused by the small-scale RCT data in the second stage, we further propose a partial parameter initialization approach, rather than training a separate network. The superiority of our approach is validated on two public datasets with extensive experiments. The code is available at https://github.com/zhouchuanCN/KDD25-TSPF.

Original languageEnglish
Title of host publicationKDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages2113-2123
Number of pages11
ISBN (Electronic)9798400712456
DOIs
StatePublished - 20 Jul 2025
Event31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: 3 Aug 20257 Aug 2025

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume1
ISSN (Print)2154-817X

Conference

Conference31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/TerritoryCanada
CityToronto
Period3/08/257/08/25

Keywords

  • causal effect estimation
  • data fusion
  • unmeasured confounding

Fingerprint

Dive into the research topics of 'A Two-Stage Pretraining-Finetuning Framework for Treatment Effect Estimation with Unmeasured Confounding'. Together they form a unique fingerprint.

Cite this