University of Hertfordshire

Recovering the number of clusters in data sets with noise features using feature rescaling factors

Research output: Contribution to journal › Article › peer-review

Standard

Recovering the number of clusters in data sets with noise features using feature rescaling factors. / Cordeiro De Amorim, Renato; Hennig, Christian.

In: Information Sciences, Vol. 324, 10.12.2015, p. 126-145.

Author

Cordeiro De Amorim, Renato ; Hennig, Christian. / Recovering the number of clusters in data sets with noise features using feature rescaling factors. In: Information Sciences. 2015 ; Vol. 324. pp. 126-145.

Bibtex

@article{2de74f4f98a643b689a746a5a0aea61e,
title = "Recovering the number of clusters in data sets with noise features using feature rescaling factors",
abstract = "In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn{\textquoteright}s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.",
author = "{Cordeiro De Amorim}, Renato and Hennig, Christian",
year = "2015",
month = dec,
day = "10",
doi = "10.1016/j.ins.2015.06.039",
language = "English",
volume = "324",
pages = "126--145",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

RIS

TY - JOUR

T1 - Recovering the number of clusters in data sets with noise features using feature rescaling factors

AU - Cordeiro De Amorim, Renato

AU - Hennig, Christian

PY - 2015/12/10

Y1 - 2015/12/10

N2 - In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.

AB - In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.

U2 - 10.1016/j.ins.2015.06.039

DO - 10.1016/j.ins.2015.06.039

M3 - Article

VL - 324

SP - 126

EP - 145

JO - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -