University of Hertfordshire

Analysing Ferret XML reports to estimate the density of copied code

Research output: Book/Report › Other report

Standard

Analysing Ferret XML reports to estimate the density of copied code. / Green, Pamela; Lane, Peter; Rainer, Austen; Scholz, Sven-Bodo.

University of Hertfordshire, 2010. (UH Computer Science Technical Report; Vol. 501).

Harvard

Green, P, Lane, P, Rainer, A & Scholz, S-B 2010, Analysing Ferret XML reports to estimate the density of copied code. UH Computer Science Technical Report, vol. 501, University of Hertfordshire.

APA

Green, P., Lane, P., Rainer, A., & Scholz, S-B. (2010). Analysing Ferret XML reports to estimate the density of copied code. (UH Computer Science Technical Report; Vol. 501). University of Hertfordshire.

Vancouver

Green P, Lane P, Rainer A, Scholz S-B. Analysing Ferret XML reports to estimate the density of copied code. University of Hertfordshire, 2010. (UH Computer Science Technical Report).

Author

Green, Pamela ; Lane, Peter ; Rainer, Austen ; Scholz, Sven-Bodo. / Analysing Ferret XML reports to estimate the density of copied code. University of Hertfordshire, 2010. (UH Computer Science Technical Report).

Bibtex

@book{dbce83a897ac416d9b3a3fd1f762fe9d,
title = "Analysing Ferret XML reports to estimate the density of copied code",
abstract = "This document explains a method for identifying dense blocks of copied text in pairs of files. The files are compared using Ferret, a copy-detection tool which computes a similarity score based on trigrams. This similarity score cannot determine the arrangement of copied text in a file; two files with the same similarity to another file may have different distributions of matched trigrams in the file. For example, in one file the matched trigrams may be in a large block, while they are scattered throughout the other file. However, Ferret produces an XML report which relates matched and unmatched trigrams back to the original text. This report can be analysed to find identical or densely copied blocks in the files. We address the problems of defining and locating the blocks, and of representing the blocks found as a meaningful feature vector, regardless of copy pattern. We provide a step-by-step example to explain our method for finding dense blocks. A set of artificial files, built to mimic different copy patterns, is used to explore a set of features which profile the dense blocks in a file. A range of density parameters is used to construct features which show that the copy patterns in the artificial files can be separated.",
keywords = "density analysis, code duplication, Ferret",
author = "Pamela Green and Peter Lane and Austen Rainer and Sven-Bodo Scholz",
year = "2010",
language = "English",
series = "UH Computer Science Technical Report",
publisher = "University of Hertfordshire",

}

RIS

TY - BOOK

T1 - Analysing Ferret XML reports to estimate the density of copied code

AU - Green, Pamela

AU - Lane, Peter

AU - Rainer, Austen

AU - Scholz, Sven-Bodo

PY - 2010

Y1 - 2010

N2 - This document explains a method for identifying dense blocks of copied text in pairs of files. The files are compared using Ferret, a copy-detection tool which computes a similarity score based on trigrams. This similarity score cannot determine the arrangement of copied text in a file; two files with the same similarity to another file may have different distributions of matched trigrams in the file. For example, in one file the matched trigrams may be in a large block, while they are scattered throughout the other file. However, Ferret produces an XML report which relates matched and unmatched trigrams back to the original text. This report can be analysed to find identical or densely copied blocks in the files. We address the problems of defining and locating the blocks, and of representing the blocks found as a meaningful feature vector, regardless of copy pattern. We provide a step-by-step example to explain our method for finding dense blocks. A set of artificial files, built to mimic different copy patterns, is used to explore a set of features which profile the dense blocks in a file. A range of density parameters is used to construct features which show that the copy patterns in the artificial files can be separated.

AB - This document explains a method for identifying dense blocks of copied text in pairs of files. The files are compared using Ferret, a copy-detection tool which computes a similarity score based on trigrams. This similarity score cannot determine the arrangement of copied text in a file; two files with the same similarity to another file may have different distributions of matched trigrams in the file. For example, in one file the matched trigrams may be in a large block, while they are scattered throughout the other file. However, Ferret produces an XML report which relates matched and unmatched trigrams back to the original text. This report can be analysed to find identical or densely copied blocks in the files. We address the problems of defining and locating the blocks, and of representing the blocks found as a meaningful feature vector, regardless of copy pattern. We provide a step-by-step example to explain our method for finding dense blocks. A set of artificial files, built to mimic different copy patterns, is used to explore a set of features which profile the dense blocks in a file. A range of density parameters is used to construct features which show that the copy patterns in the artificial files can be separated.

KW - density analysis

KW - code duplication

KW - Ferret

M3 - Other report

T3 - UH Computer Science Technical Report

BT - Analysing Ferret XML reports to estimate the density of copied code

PB - University of Hertfordshire

ER -