TY - JOUR
T1 - A-Wardpβ
T2 - Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
AU - Cordeiro De Amorim, Renato
AU - Makarenkov, Vladimir
AU - Mirkin, Boris
N1 - This document is the Accepted Manuscript version of the following article: Renato Cordeiro de Amorim, Vladimir Makarenkov, and Boris Mirkin, 'A-Wardpβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation', Information Sciences, Vol. 370-371, November 2016, pp. 343-354.
The version of record is available online at doi: https://doi.org/10.1016/j.ins.2016.07.076.
PY - 2016/11/20
Y1 - 2016/11/20
N2 - In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters, which allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Wardp algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Wardpβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise: the insertion of noise features and the blurring of within-cluster values of some features. These experiments allow us to conclude that: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; and (ii) A-Wardpβ provides better cluster recovery than both Ward and Wardp.
KW - Feature weighting
KW - Hierarchical clustering
KW - Initialisation algorithm
KW - Minkowski metric
UR - http://www.scopus.com/inward/record.url?scp=84982851370&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.07.076
DO - 10.1016/j.ins.2016.07.076
M3 - Article
AN - SCOPUS:84982851370
SN - 0020-0255
VL - 370-371
SP - 343
EP - 354
JO - Information Sciences
JF - Information Sciences
ER -