Selecting Features in Origin Analysis

P. D. Green, P.C.R. Lane, A. Rainer, S. Scholz

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


    When applying a machine-learning approach to develop classifiers in a new domain, an important question is what measurements to take and how they will be used to construct informative features. This paper develops a novel set of machine-learning classifiers for files taken from software projects; the target classifications are based on origin analysis. Our approach adapts the output of four copy-analysis tools, generating a number of different measurements. By combining the measures and the files on which they operate, a large set of features is generated in a semi-automatic manner. Standard attribute-selection and classifier-training techniques then yield a pool of high-quality classifiers (accuracy in the range of 90%) and information on the most relevant features.
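As a rough illustration of the pipeline described in the abstract, the sketch below enumerates candidate features by crossing tools, measures, and file roles, then ranks them by information gain. It is a minimal sketch under stated assumptions, not the authors' implementation: the tool, measure, and file-role names are hypothetical placeholders (the abstract does not name them), information gain stands in for whichever attribute-selection technique the paper used, and classifier training is omitted.

```python
from itertools import product
from math import log2

# Hypothetical placeholders: the abstract names neither the four tools,
# their measures, nor the files they are applied to.
TOOLS = ["tool_a", "tool_b"]
MEASURES = ["max_similarity", "match_count"]
FILE_ROLES = ["older_release", "newer_release"]


def feature_names():
    """Cross every tool, measure, and file role to enumerate candidate features."""
    return [f"{t}.{m}.{r}" for t, m, r in product(TOOLS, MEASURES, FILE_ROLES)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels)
    )


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one numeric feature at a threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    return (
        entropy(labels)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )


def best_gain(values, labels):
    """Best information gain over midpoints between sorted distinct values."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in thresholds), default=0.0)


def rank_features(rows, labels):
    """Return feature names ordered from most to least informative."""
    names = list(rows[0].keys())
    scored = [(best_gain([r[n] for r in rows], labels), n) for n in names]
    return [n for _, n in sorted(scored, key=lambda pair: -pair[0])]
```

With these placeholders, `feature_names()` yields 2 × 2 × 2 = 8 candidate features. `rank_features` takes one dictionary of feature values per file plus a parallel list of class labels, and the top of the ranking would then be passed to a classifier of choice.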
    Original language: English
    Title of host publication: Research and Development in Intelligent Systems XXVII, Incorporating Applications and Innovations in Intelligent Systems XVIII
    Subtitle of host publication: Proceedings of AI-2010, The Thirtieth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence
    Publisher: Springer Nature
    ISBN (Electronic): 978-0-85729-130-1
    ISBN (Print): 978-0-85729-129-5
    Publication status: Published - 2010


    • data mining
    • feature construction
    • origin analysis
    • machine learning

