Abstract
Estimating a sparse covariance matrix can reveal important features and dependence patterns in multivariate data, but traditional estimation methods require complete data vectors for all subjects. When data are left-censored because of detection limits, common strategies such as excluding censored individuals or replacing censored values with suitable constants can introduce substantial bias. In this paper, we propose two penalized log-likelihood estimators, one with an L1 penalty and one with a SCAD penalty, for estimating the sparse covariance matrix of a multivariate normal distribution from left-censored data. Fitting these estimators is challenging because the observed-data log-likelihood involves high-dimensional integration over the censored variables. To address this, we treat censored data as a special case of incomplete data and combine the expectation-maximization (EM) algorithm with the coordinate descent algorithm to fit both penalized estimators efficiently. Simulation studies show that both penalized estimators achieve greater estimation accuracy than methods that replace censored values with constants, and that the SCAD-penalized estimator generally outperforms the L1-penalized estimator. Finally, we apply our method to proteomic datasets.
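The abstract does not spell out the EM updates, so the following is only a minimal sketch, under stated assumptions, of two ingredients such a fit relies on. `censored_moments` returns the first two conditional moments of a normal value known only to lie below a detection limit; an E-step would use such moments in place of the missing sufficient statistics. The formula is exact when a subject has a single censored coordinate (plugging in that coordinate's conditional mean and standard deviation given the observed ones); several censored coordinates per subject require multivariate truncated-normal moments, which is where the high-dimensional integration arises. `soft_threshold` and `scad_threshold` are the thresholding rules induced by the L1 and SCAD penalties (SCAD with Fan and Li's default a = 3.7). As a deliberate simplification, the hypothetical helper `threshold_cov` applies them elementwise to off-diagonal covariance entries (generalized thresholding) rather than performing the paper's coordinate-descent maximization of a penalized log-likelihood. All function names here are illustrative, not the authors' code.

```python
import math
from typing import Callable

Matrix = list[list[float]]


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_moments(mu: float, sigma: float, lod: float) -> tuple[float, float]:
    """Return E[X | X < lod] and E[X^2 | X < lod] for X ~ N(mu, sigma^2).

    `lod` is the detection limit: the value is only known to lie below it.
    """
    beta = (lod - mu) / sigma
    ratio = norm_pdf(beta) / norm_cdf(beta)  # lower-tail inverse Mills ratio
    mean = mu - sigma * ratio
    var = sigma**2 * (1.0 - beta * ratio - ratio**2)
    return mean, var + mean**2


def soft_threshold(s: float, lam: float) -> float:
    """Thresholding rule induced by the L1 (lasso) penalty."""
    return math.copysign(max(abs(s) - lam, 0.0), s)


def scad_threshold(s: float, lam: float, a: float = 3.7) -> float:
    """SCAD thresholding rule (Fan and Li, 2001); a = 3.7 is their default."""
    t = abs(s)
    if t <= 2.0 * lam:
        return soft_threshold(s, lam)  # behaves like L1 near zero
    if t <= a * lam:
        return ((a - 1.0) * s - math.copysign(a * lam, s)) / (a - 2.0)
    return s  # large entries are left unshrunk


def threshold_cov(S: Matrix, lam: float,
                  rule: Callable[[float, float], float]) -> Matrix:
    """Hypothetical helper: threshold off-diagonals, keep variances as-is."""
    p = len(S)
    return [[S[i][j] if i == j else rule(S[i][j], lam) for j in range(p)]
            for i in range(p)]


if __name__ == "__main__":
    m, m2 = censored_moments(mu=0.0, sigma=1.0, lod=0.0)
    print(round(m, 4), round(m2, 4))  # -0.7979 1.0

    S = [[1.0, 0.15, 0.6], [0.15, 1.0, 0.05], [0.6, 0.05, 1.0]]
    for name, rule in (("L1", soft_threshold), ("SCAD", scad_threshold)):
        T = threshold_cov(S, 0.1, rule)
        print(name, [[round(v, 3) for v in row] for row in T])
    # L1 [[1.0, 0.05, 0.5], [0.05, 1.0, 0.0], [0.5, 0.0, 1.0]]
    # SCAD [[1.0, 0.05, 0.6], [0.05, 1.0, 0.0], [0.6, 0.0, 1.0]]
```

In this toy call both rules set the small entry 0.05 to zero, but only L1 shrinks the large entry 0.6 (to 0.5); SCAD leaves it unchanged, reflecting the near-unbiasedness of SCAD for large signals.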
Field | Value |
---|---|
Original language | English |
Article number | 423 |
Pages (from-to) | 1-17 |
Number of pages | 17 |
Journal | Mathematics |
Volume | 13 |
Issue number | 3 |
Early online date | 27 Jan 2025 |
Publication status | Published - 28 Feb 2025 |
Keywords
- 62-08
- penalized estimator
- left-censored data
- 62H12
- Expectation Maximization algorithm
- sparse covariance matrix