TY - JOUR
T1 - Data classification using the Dempster-Shafer method
AU - Chen, Qi
AU - Whitbrook, Amanda
AU - Aickelin, Uwe
AU - Roadknight, Chris
N1 - Publisher Copyright: © 2014 Taylor & Francis.
PY - 2014/10/2
Y1 - 2014/10/2
AB - In this paper, the Dempster-Shafer (D-S) method is used as the theoretical basis for creating data classification systems. Testing is carried out using three popular multiple attribute benchmark data-sets that have two, three and four classes. In each case, a subset of the available data is used for training to establish thresholds, limits or likelihoods of class membership for each attribute, and hence create mass functions that establish probability of class membership for each attribute of the test data. Classification of each data item is achieved by combination of these probabilities via Dempster's rule of combination. Results for the first two data-sets show extremely high classification accuracy that is competitive with other popular methods. The third data-set is non-numerical and difficult to classify, but good results can be achieved provided the system and mass functions are designed carefully and the right attributes are chosen for combination. In all cases, the D-S method provides comparable performance to other more popular algorithms, but the overhead of generating accurate mass functions increases the complexity with the addition of new attributes. Overall, the results suggest that the D-S approach provides a suitable framework for the design of classification systems and that automating the mass function design and calculation would increase the viability of the algorithm for complex classification problems.
KW - data classification
KW - Dempster's rule of combination
KW - Dempster-Shafer theory
UR - http://www.scopus.com/inward/record.url?scp=84909943781&partnerID=8YFLogxK
U2 - 10.1080/0952813X.2014.886301
DO - 10.1080/0952813X.2014.886301
M3 - Article
AN - SCOPUS:84909943781
SN - 0952-813X
VL - 26
SP - 493
EP - 517
JO - Journal of Experimental and Theoretical Artificial Intelligence
JF - Journal of Experimental and Theoretical Artificial Intelligence
IS - 4
ER -