Abstract
Outliers are common in longitudinal data analysis, and the multivariate contaminated normal (MCN) distribution is often used in model-based clustering to detect outliers and provide robust parameter estimates in each subgroup. In this paper, we propose a mixture of MCN distributions (MCNM), based on the joint mean-covariance model, specifically designed to analyze longitudinal data characterized by mild outliers. We estimate the model parameters with an iterative expectation-conditional maximization (ECM) algorithm combined with Aitken acceleration, achieving both faster and stable convergence. Our proposed method simultaneously clusters the population, identifies progression patterns of the mean and covariance structures for different subgroups over time, and detects outliers. To demonstrate the effectiveness of our method, we conduct simulation studies under various cases involving different proportions and degrees of contamination. Additionally, we apply our method to real data on the number of people infected with AIDS in 49 countries or regions from 2001 to 2021. Results show that our proposed method effectively clusters the data based on various mean progression trajectories. In summary, our proposed MCNM, based on the joint mean-covariance model and the modified Cholesky decomposition (MCD) of covariance matrices, provides a robust method for clustering longitudinal data with mild outliers, making it valuable for various applications in longitudinal data analysis.
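To illustrate how a contaminated normal component can flag outliers, the following is a minimal sketch, not the paper's implementation. It assumes the commonly used MCN parameterization, which the abstract does not spell out: a "good" normal component with weight `alpha`, and a "bad" component sharing the same mean whose covariance is inflated by a factor `eta > 1`. The function names, the parameter names, and the 0.5 threshold for flagging a point are illustrative assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_logpdf(y, mu, sigma):
    """Log-density of a multivariate normal, computed via a Cholesky factor."""
    n = len(y)
    L = cholesky(sigma)
    # Forward substitution: solve L z = y - mu, so that z.z is the
    # squared Mahalanobis distance of y from mu.
    z = []
    for i in range(n):
        z.append((y[i] - mu[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    maha = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + maha)


def mcn_good_probability(y, mu, sigma, alpha, eta):
    """Posterior probability that y comes from the 'good' (non-inflated) component.

    Assumed density: alpha * N(mu, sigma) + (1 - alpha) * N(mu, eta * sigma).
    """
    inflated = [[eta * v for v in row] for row in sigma]
    good = alpha * math.exp(mvn_logpdf(y, mu, sigma))
    bad = (1.0 - alpha) * math.exp(mvn_logpdf(y, mu, inflated))
    return good / (good + bad)


def is_outlier(y, mu, sigma, alpha, eta, threshold=0.5):
    """Flag y as an outlier when its 'good' probability falls below the threshold."""
    return mcn_good_probability(y, mu, sigma, alpha, eta) < threshold
```

For example, with `mu = [0.0, 0.0]`, an identity covariance, `alpha = 0.9`, and `eta = 10.0`, a point at the mean is treated as a regular observation, while a point such as `[6.0, 6.0]` is flagged. In a full mixture model this probability would be computed within each cluster during the E-step; that step is omitted here.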
Original language | English |
---|---|
Article number | e11653 |
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | Statistical Analysis and Data Mining |
Volume | 17 |
Issue number | 1 |
Early online date | 22 Dec 2023 |
Publication status | Published - 26 Feb 2024 |
Keywords
- ECM algorithm
- joint mean-covariance modeling
- mixture contaminated normal model
- modified Cholesky decomposition
- outlier detection