Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm

Renato Cordeiro De Amorim

    Research output: Contribution to journal › Article › peer-review


    Abstract

    In this paper we introduce a new hierarchical clustering algorithm called Ward_p. Unlike the original Ward method, Ward_p generates feature weights, which can be seen as feature rescaling factors thanks to the use of the Lp norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance in different clusters.
    We validate our method by performing experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly random noise. Our experiments show that: (i) the use of our feature weighting method produces results that are superior to those produced by the original Ward method on datasets containing noise features; (ii) it is indeed possible to estimate a good exponent p under a totally unsupervised framework. The clusterings produced by Ward_p are dependent on p. This makes the estimation of a good value for this exponent a requirement for this algorithm, and indeed for any other algorithm also based on the Lp norm.
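The idea of cluster-dependent feature weights acting as rescaling factors under the Lp norm can be illustrated with a minimal sketch. The abstract does not state the weight formula, so the one below is an assumption: a dispersion-based weight in which features that vary less within a cluster receive a larger weight. The function names `feature_weights` and `weighted_lp_distance`, the use of the mean as the cluster centre, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def feature_weights(cluster, p, eps=1e-12):
    """Assumed dispersion-based weights for one cluster (requires p > 1).

    Features with a smaller within-cluster Lp dispersion get a larger
    weight; the weights of a cluster sum to 1.
    """
    # Assumption: the mean is used as the cluster centre for simplicity.
    center = cluster.mean(axis=0)
    # Per-feature dispersion D_v = sum_i |y_iv - c_v|^p (eps avoids division by zero).
    disp = (np.abs(cluster - center) ** p).sum(axis=0) + eps
    # w_v = 1 / sum_u (D_v / D_u)^(1 / (p - 1))
    ratios = (disp[:, None] / disp[None, :]) ** (1.0 / (p - 1.0))
    return 1.0 / ratios.sum(axis=1)


def weighted_lp_distance(x, y, w, p):
    """p-th power of the Lp distance after rescaling each feature by its weight."""
    return float(np.sum((w ** p) * np.abs(x - y) ** p))


# Toy cluster: two compact (informative) features plus one uniform noise feature.
rng = np.random.default_rng(0)
informative = rng.normal(loc=0.0, scale=0.05, size=(50, 2))
noise = rng.uniform(0.0, 1.0, size=(50, 1))
cluster = np.hstack([informative, noise])

w = feature_weights(cluster, p=2.0)
print(w)  # the third (noise) feature should receive the smallest weight
```

Because each cluster would compute its own weights from its own dispersions, the same feature can be weighted heavily in one cluster and lightly in another. The weights also change with p, which is consistent with the abstract's point that the resulting clusterings depend on the exponent.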
    Original language: English
    Pages (from-to): 46-62
    Journal: Journal of Classification
    Volume: 32
    Issue number: 1
    Early online date: 11 Mar 2015
    Publication status: Published - Apr 2015
