Recovering the number of clusters in data sets with noise features using feature rescaling factors

Renato Cordeiro De Amorim, Christian Hennig

    Research output: Contribution to journal › Article › peer-review

    176 Citations (Scopus)
    234 Downloads (Pure)

    Abstract

    In this paper we introduce three methods for re-scaling data sets, aimed at increasing the likelihood that clustering validity indexes return the true number of spherical Gaussian clusters in the presence of additional noise features. Our methods obtain feature re-scaling factors by taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters.
    We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
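
As a rough illustration of why re-scaling can matter for a validity index, the sketch below computes the Calinski–Harabasz index on a toy data set with one informative feature and one high-variance noise feature. It is not the paper's method: the data, the two candidate partitions, and the re-scaling factors `(1.0, 0.01)` are hand-picked assumptions for the example, whereas the paper derives its factors from the data. The helper names (`rescale`, `calinski_harabasz`) are hypothetical.

```python
from collections import defaultdict


def rescale(points, factors):
    """Multiply each feature (column) by its re-scaling factor."""
    return [tuple(x * f for x, f in zip(p, factors)) for p in points]


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled
    by its degrees of freedom: (B / (k - 1)) / (W / (n - k))."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (assumed): feature 0 separates two clusters,
# feature 1 is noise with a much larger spread.
points = [(0.0, 50.0), (1.0, -50.0), (0.5, 0.0),
          (10.0, -50.0), (11.0, 50.0), (10.5, 0.0)]
true_labels = [0, 0, 0, 1, 1, 1]    # the two real clusters
noise_labels = [0, 1, 2, 1, 0, 2]   # three groups driven by the noise feature

# Hand-picked factors (assumed) that down-weight the noise feature.
factors = (1.0, 0.01)
scaled = rescale(points, factors)

for name, data in (("raw", points), ("re-scaled", scaled)):
    print(name,
          "true partition:", calinski_harabasz(data, true_labels),
          "noise partition:", calinski_harabasz(data, noise_labels))
```

On the raw data the index scores the three-group noise partition above the true two-cluster partition; once the noise feature is down-weighted, the ordering reverses.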
    Original language: English
    Pages (from-to): 126-145
    Journal: Information Sciences
    Volume: 324
    Early online date: 30 Jun 2015
    Publication status: Published - 10 Dec 2015
