Tackling the PAN’09 External Plagiarism Detection Corpus with a Desktop Plagiarism Detector

J. Malcolm, P.C.R. Lane

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Ferret is a fast and effective tool for detecting similarities in a group of files. Applying it to the PAN’09 corpus required modifications to meet the requirements of the competition, mainly to deal with the very large number of files and the large size of some of them, and to automate some of the decisions that would normally be made by a human operator. Ferret was able to detect numerous files in the development corpus that contain substantial similarities not marked as plagiarism, but it also identified many pairs where random similarities masked actual plagiarism. An improved metric is therefore indicated if the “plagiarised” or “not plagiarised” decision is to be automated.
    Original language: English
    Title of host publication: Procs of the SEPLN'09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse
    Editors: Benno Stein, Paolo Rosso
    Pages: 29-33
    Publication status: Published - 2009

    Keywords

    • plagiarism
    • ferret
    • text analysis
    • trigrams
