University of Hertfordshire

High-dimensional cluster analysis with the masked EM algorithm

Research output: Contribution to journal (Article)

Original language: English
Number of pages: 16
Pages (from-to): 2379-2394
Journal: Neural Computation
Journal publication date: 20 Nov 2014
Volume: 26
Issue: 11
Early online date: 10 Oct 2014
Publication status: Published - 20 Nov 2014

Abstract

Cluster analysis faces two problems in high dimensions: the "curse of dimensionality," which can lead to overfitting and poor generalization performance, and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a "masked EM" algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data.

Notes

This is an Open Access article published under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) license https://creativecommons.org/licenses/by/3.0/

ID: 13613086