Abstract
Many academic staff will recognise that unusual elements shared between student submissions trigger suspicion of inappropriate collusion. These elements may be odd phrases, strange constructs, peculiar layout, or spelling mistakes. In this paper we review twenty-nine approaches to source-code plagiarism detection, showing that the majority focus on overall file similarity rather than on unusual shared elements, and that none directly measure these elements. We describe an approach to detecting similarity between files that focuses on these unusual similarities. The approach is token-based and therefore largely language-independent, and it is tested on a set of student assignments, each consisting of a mix of programming languages. We also introduce a technique for visualising one document in relation to another in the context of the group. This visualisation distinguishes code that is unique to the document, code shared only by the two files, code shared by small groups, and uninteresting areas of the file.
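The idea of weighting shared material by how unusual it is within the group can be illustrated with a minimal sketch. It assumes a generic regex tokenizer, fixed-length token n-grams, and a "small group" defined by a document-frequency threshold; the names `classify`, `rarity_score`, and `small_group`, and the 1/(df − 1) weighting, are hypothetical illustrations chosen for this example, not the method described in the paper.

```python
import re
from collections import defaultdict

# Assumed, language-agnostic tokenizer: identifiers, integer literals,
# or any single non-whitespace character (so "i++" -> "i", "+", "+").
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokens(text):
    return TOKEN_RE.findall(text)


def ngrams(toks, n):
    """Overlapping token n-grams (hypothetical unit of comparison)."""
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def document_frequency(docs, n):
    """Map each n-gram to the set of document names that contain it."""
    holders = defaultdict(set)
    for name, text in docs.items():
        for g in ngrams(tokens(text), n):
            holders[g].add(name)
    return holders


def classify(docs, a, b, n=3, small_group=4):
    """Label each n-gram position of document a relative to document b.

    Hypothetical categories mirroring the visualisation: 'unique' (only in a),
    'pair' (only in a and b), 'small-group' (in at most small_group documents),
    'uninteresting' (common across the group).
    """
    holders = document_frequency(docs, n)
    labels = []
    for g in ngrams(tokens(docs[a]), n):
        owners = holders[g]
        if owners == {a}:
            labels.append("unique")
        elif owners == {a, b}:
            labels.append("pair")
        elif len(owners) <= small_group:
            labels.append("small-group")
        else:
            labels.append("uninteresting")
    return labels


def rarity_score(docs, a, b, n=3):
    """Assumed pairwise score: each distinct n-gram shared by a and b adds
    1 / (df - 1), so material shared by only these two files counts fully
    and material common to the whole group counts very little."""
    holders = document_frequency(docs, n)
    shared = set(ngrams(tokens(docs[a]), n)) & set(ngrams(tokens(docs[b]), n))
    # Every shared n-gram occurs in at least two documents, so df - 1 >= 1.
    return sum(1.0 / (len(holders[g]) - 1) for g in shared)
```

For example, with `docs = {"a": "x = foo ( y )", "b": "x = foo ( z )", "c": "x = bar"}` and `n=1`, the tokens `x` and `=` appear in all three files and each contribute only 0.5 to `rarity_score(docs, "a", "b", n=1)`, whereas `foo`, `(` and `)` are shared by `a` and `b` alone and contribute 1 each. Scoring by rarity rather than by overall overlap is what lets a few odd shared fragments stand out against boilerplate that every submission contains.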
Original language | English |
---|---|
Number of pages | 25 |
Publication status | Published - 2012 |
Event | 5th International Plagiarism Conference - Sage Gateshead, Newcastle, United Kingdom. Duration: 17 Jul 2012 → 18 Jul 2012. https://www.plagiarismtoday.com/2012/07/18/the-5th-international-plagiarism-conference-day-one/ |
Conference
Conference | 5th International Plagiarism Conference |
---|---|
Country/Territory | United Kingdom |
City | Newcastle |
Period | 17/07/12 → 18/07/12 |
Internet address | https://www.plagiarismtoday.com/2012/07/18/the-5th-international-plagiarism-conference-day-one/ |