Optimal subsampling proportional subdistribution hazards regression with rare events in big data

Erqian Li, Man Lai Tang, Maozai Tian, Keming Yu

Research output: Contribution to journal › Article › peer-review

Abstract

The proportional subdistribution hazards (PSH) model has been widely employed for analyzing competing risks data, which involve mutually exclusive events arising from multiple causes and commonly occur in clinical research. With the rapid development of the healthcare industry, massive survival data sets are becoming increasingly prevalent, and fitting classical PSH models to such large data sets is computationally intensive. In this article, we propose optimal subsampling estimators and a two-step algorithm for the Fine-Gray model. Asymptotic properties of the proposed estimators are established, and an extensive simulation study is conducted to demonstrate the efficiency of the estimators. The proposed methodology is then illustrated with a large data set from the SEER (Surveillance, Epidemiology, and End Results) database.
Original language: English
Pages (from-to): 361-377
Number of pages: 17
Journal: Statistics and Its Interface
Volume: 18
Issue number: 3
Early online date: 17 Jan 2025
Publication status: E-pub ahead of print - 17 Jan 2025

Keywords

  • Big data
  • Competing risks data
  • Optimal subsampling
