University of Hertfordshire

Towards effective malware clustering: reducing false negatives through feature weighting and the Lp metric

Research output: Chapter in Book/Report/Conference proceeding (Chapter, peer-reviewed)

  • Renato Cordeiro De Amorim
  • Peter Komisarczuk
Original language: English
Title of host publication: Case Studies in Secure Computing
Subtitle of host publication: Achievements and Trends
Editors: Biju Issac, Nauman Israr
Publisher: CRC Press
ISBN (Print): 9781482207064
Publication status: Published - Sep 2014


In this paper we present a novel method to reduce the incidence of false negatives in the clustering of malware detected during drive-by-download attacks. Our method combines a high-interaction client honeypot, Capture-HPC, which acquires behavioural system and network data, with clustering analysis. It addresses three issues in clustering: (i) finding the number of clusters in the dataset, (ii) finding good initial centroids, and (iii) determining the relevance of each feature in each cluster. Our method applies partitional clustering based on Minkowski Weighted K-Means (Lp) with anomalous pattern initialization. We performed experiments on a dataset containing the behaviour of 17,000 possibly infected websites gathered from sources of malicious URLs. We find that our method produces a smaller within-cluster variance and fewer false negatives than other popular clustering algorithms, such as K-Means and Ward's method.
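The abstract combines two ingredients: anomalous pattern initialization, which picks both the number of clusters and the initial centroids, and Minkowski Weighted K-Means, which learns a weight for every feature in every cluster. The Python/NumPy sketch below illustrates how these can fit together. It is a simplified illustration, not the authors' implementation. It assumes the commonly published form of the method: the distance is the sum over features v of w_kv^p |y_v - c_kv|^p, each centroid component is the Minkowski centre (the value minimising the sum of |y_v - c|^p), and the weights are w_kv = 1 / sum_u (D_kv / D_ku)^(1/(p-1)), where D_kv is the within-cluster dispersion of feature v. The function names, the ternary search used for Minkowski centres, the small constant `eps` added to dispersions, the `min_size` threshold for discarding tiny anomalous clusters, and the use of equal weights during initialization are all choices made for this example.

```python
import numpy as np


def minkowski_center(x, p, iters=60):
    """Return c minimising sum(|x - c|**p) for 1-D x; convex for p >= 1, so ternary search works."""
    lo, hi = float(np.min(x)), float(np.max(x))
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if np.sum(np.abs(x - m1) ** p) < np.sum(np.abs(x - m2) ** p):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def center_of(X, p):
    """Feature-wise Minkowski centre of the rows of X."""
    return np.array([minkowski_center(X[:, v], p) for v in range(X.shape[1])])


def weighted_dist(X, c, w, p):
    """p-th power of the weighted Minkowski distance: sum_v w_v**p * |x_v - c_v|**p."""
    return np.sum((w ** p) * np.abs(X - c) ** p, axis=1)


def anomalous_pattern_init(X, p, min_size=2, max_iter=100):
    """Simplified anomalous-pattern initialization (equal weights, Minkowski metric).

    Repeatedly grows a cluster around the entity farthest from the fixed grand
    centre, removes it, and keeps clusters with at least `min_size` members.
    The number of kept clusters becomes K.
    """
    grand = center_of(X, p)
    w = np.ones(X.shape[1])
    remaining, centroids = np.arange(len(X)), []
    while len(remaining) > 0:
        Y = X[remaining]
        far = int(np.argmax(weighted_dist(Y, grand, w, p)))
        c = Y[far].astype(float)
        for _ in range(max_iter):
            # Entities closer to the anomalous centre than to the grand centre.
            members = weighted_dist(Y, c, w, p) < weighted_dist(Y, grand, w, p)
            members[far] = True  # always removes at least one entity per round
            new_c = center_of(Y[members], p)
            if np.allclose(new_c, c):
                break
            c = new_c
        if members.sum() >= min_size:
            centroids.append(c)
        remaining = remaining[~members]
    return np.array(centroids) if centroids else grand[None, :]


def update_weights(X, labels, C, p, eps=0.01):
    """w_kv = 1 / sum_u (D_kv / D_ku)**(1/(p-1)); each row of weights sums to 1."""
    W = np.empty_like(C, dtype=float)
    for k in range(len(C)):
        D = np.sum(np.abs(X[labels == k] - C[k]) ** p, axis=0) + eps
        W[k] = 1.0 / np.sum((D[:, None] / D[None, :]) ** (1.0 / (p - 1)), axis=1)
    return W


def imwk_means(X, p=1.5, max_iter=100):
    """Minkowski weighted K-means seeded by anomalous patterns; requires p > 1."""
    if p <= 1:
        raise ValueError("the weight update needs p > 1")
    C = anomalous_pattern_init(X, p).astype(float)
    W = np.full(C.shape, 1.0 / C.shape[1])
    labels = None
    for _ in range(max_iter):
        dist = np.stack([weighted_dist(X, C[k], W[k], p) for k in range(len(C))], axis=1)
        new_labels = np.argmin(dist, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(len(C)):
            if np.any(labels == k):
                C[k] = center_of(X[labels == k], p)
        W = update_weights(X, labels, C, p)
    return labels, C, W


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
    labels, C, W = imwk_means(X, p=1.5)
    print(labels, C, W, sep="\n")
```

Because each cluster has its own weight vector, a feature that is widely spread inside a cluster gets a large dispersion D_kv and therefore a small weight in that cluster, while the exponent p sets the metric used both for distances and for centres. How p is chosen is not shown here.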

ID: 9822830