TY - GEN
T1 - Using n-grams to rapidly characterise the evolution of software code
AU - Rainer, A.
AU - Lane, P.C.R.
AU - Malcolm, J.
AU - Scholz, S.
N1 - “This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.” “Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.” DOI: 10.1109/ASEW.2008.4686320
PY - 2008
Y1 - 2008
N2 - Text-based approaches to the analysis of software evolution are attractive because of the fine-grained, token-level comparisons they can generate. The use of such approaches has, however, been constrained by the lack of an efficient implementation. In this paper we demonstrate the ability of Ferret, which uses n-grams of 3 tokens, to characterise the evolution of software code. Ferret’s implementation operates in almost linear time and is at least an order of magnitude faster than the diff tool. Ferret’s output can be analysed to reveal several characteristics of software evolution, such as: the lifecycle of a single file, the degree of change between two files, and possible regression. In addition, the similarity scores produced by Ferret can be aggregated to measure larger parts of the system being analysed.
UR - http://www.scopus.com/inward/record.url?scp=58049141129&partnerID=8YFLogxK
U2 - 10.1109/ASEW.2008.4686320
DO - 10.1109/ASEW.2008.4686320
M3 - Conference contribution
SN - 978-1-4244-2776-5
SP - 43
EP - 52
BT - Procs 23rd IEEE/ACM Int Conf on Automated Software Engineering
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -