University of Hertfordshire

Selecting Features in Origin Analysis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Research and Development in Intelligent Systems XXVII, Incorporating Applications and Innovations in Intelligent Systems XVIII
Subtitle of host publication: Proceedings of AI-2010, The Thirtieth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence
ISBN (Electronic): 978-0-85729-130-1
ISBN (Print): 978-0-85729-129-5
Publication status: Published - 2010


When applying a machine-learning approach to develop classifiers in a new domain, an important question is what measurements to take and how they will be used to construct informative features. This paper develops a novel set of machine-learning classifiers for the domain of classifying files taken from software projects; the target classifications are based on origin analysis. Our approach adapts the output of four copy-analysis tools, generating a number of different measurements. By combining the measures and the files on which they operate, a large set of features is generated in a semi-automatic manner. Standard attribute selection and classifier training techniques then yield a pool of high-quality classifiers (accuracy in the range of 90%) and information on the most relevant features.
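
The pipeline outlined in the abstract (combine measures with files to generate features, select attributes, train classifiers) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the abstract does not name the four tools, their measures, the file roles, or the selection and training algorithms, so `TOOLS`, `MEASURES`, `FILE_ROLES`, the class-mean-gap score, the nearest-centroid classifier, and the toy data are hypothetical stand-ins rather than the authors' method.

```python
from itertools import product
from statistics import mean

# Hypothetical names: the abstract does not list tools, measures, or file roles.
TOOLS = ["tool_a", "tool_b"]
MEASURES = ["similarity", "matched_lines"]
FILE_ROLES = ["candidate", "best_match"]


def feature_names(tools, measures, roles):
    """Name one feature per (tool, measure, file role) combination."""
    return [f"{t}.{m}.{r}" for t, m, r in product(tools, measures, roles)]


def score_feature(rows, labels, idx):
    """Gap between class means: a crude stand-in for attribute selection."""
    pos = [row[idx] for row, y in zip(rows, labels) if y == 1]
    neg = [row[idx] for row, y in zip(rows, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def select_features(rows, labels, k):
    """Return the indices of the k highest-scoring features."""
    indices = range(len(rows[0]))
    ranked = sorted(indices, key=lambda i: score_feature(rows, labels, i),
                    reverse=True)
    return ranked[:k]


def train_centroids(rows, labels, selected):
    """Nearest-centroid model: one mean vector per class, selected features only."""
    centroids = {}
    for cls in set(labels):
        members = [row for row, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(row[i] for row in members) for i in selected]
    return centroids


def predict(centroids, selected, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    point = [row[i] for i in selected]

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))

    return min(centroids, key=lambda cls: dist(centroids[cls]))


# Invented toy data: one row per file, one column per generated feature.
# Label 1 and label 0 are placeholder classes, not the paper's categories.
NAMES = feature_names(TOOLS, MEASURES, FILE_ROLES)
ROWS = [
    [0.9, 0.5, 0.4, 0.5, 0.6, 0.8, 0.5, 0.5],
    [0.8, 0.4, 0.6, 0.5, 0.4, 0.9, 0.5, 0.4],
    [0.1, 0.5, 0.5, 0.5, 0.5, 0.2, 0.5, 0.5],
    [0.2, 0.4, 0.5, 0.5, 0.5, 0.1, 0.5, 0.4],
]
LABELS = [1, 1, 0, 0]

SELECTED = select_features(ROWS, LABELS, k=2)
MODEL = train_centroids(ROWS, LABELS, SELECTED)

if __name__ == "__main__":
    print("selected features:", [NAMES[i] for i in SELECTED])
    print("prediction:", predict(MODEL, SELECTED, ROWS[0]))
```

In this toy setting the selection step keeps only the features that separate the two classes, which mirrors the abstract's point that the process yields information on the most relevant features as well as trained classifiers. A real study would more plausibly use an established toolkit's attribute selection and classifier training than these hand-rolled stand-ins.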


Original paper can be found at: Copyright Springer

ID: 98385