TY - JOUR
T1 - A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-means
AU - Lai, Daphne Teck Ching
AU - Garibaldi, Jonathan M.
AU - Soria, Daniele
AU - Roadknight, Christopher M.
PY - 2014/9
Y1 - 2014/9
N2 - Previously, a semi-manual method was used to identify six novel and clinically useful classes in the Nottingham Tenovus Breast Cancer dataset. 663 out of 1,076 patients were classified. The objectives of our work are threefold. Firstly, our primary objective is to use a single automatic method (post-initialisation) to reproduce the six classes for the 663 patients and to classify the remaining 413 patients. Secondly, we explore using semi-supervised fuzzy c-means (ssFCM) with various distance metrics and initialisation techniques to achieve this. Thirdly, the clinical characteristics of the 413 patients are examined by comparing them with those of the 663 patients. Our experiments use various amounts of labelled data and 10-fold cross-validation to reproduce and evaluate the classification. ssFCM with Euclidean distance and the initialisation technique of Katsavounidis et al. produced the best results. It is then used to classify the 413 patients. Visual evaluation of the 413 patients' classifications revealed characteristics in common with those previously reported. Examination of clinical characteristics indicates significant associations between classification and clinical parameters. More importantly, an association between classification and survival is shown, based on the survival curves.
AB - Previously, a semi-manual method was used to identify six novel and clinically useful classes in the Nottingham Tenovus Breast Cancer dataset. 663 out of 1,076 patients were classified. The objectives of our work are threefold. Firstly, our primary objective is to use a single automatic method (post-initialisation) to reproduce the six classes for the 663 patients and to classify the remaining 413 patients. Secondly, we explore using semi-supervised fuzzy c-means (ssFCM) with various distance metrics and initialisation techniques to achieve this. Thirdly, the clinical characteristics of the 413 patients are examined by comparing them with those of the 663 patients. Our experiments use various amounts of labelled data and 10-fold cross-validation to reproduce and evaluate the classification. ssFCM with Euclidean distance and the initialisation technique of Katsavounidis et al. produced the best results. It is then used to classify the 413 patients. Visual evaluation of the 413 patients' classifications revealed characteristics in common with those previously reported. Examination of clinical characteristics indicates significant associations between classification and clinical parameters. More importantly, an association between classification and survival is shown, based on the survival curves.
KW - Breast cancer
KW - Fuzzy clustering
KW - Molecular classification
UR - http://www.scopus.com/inward/record.url?scp=84904413657&partnerID=8YFLogxK
U2 - 10.1007/s10100-013-0318-3
DO - 10.1007/s10100-013-0318-3
M3 - Article
AN - SCOPUS:84904413657
SN - 1435-246X
VL - 22
SP - 475
EP - 499
JO - Central European Journal of Operations Research
JF - Central European Journal of Operations Research
IS - 3
ER -