Comparing Different Text Similarity Methods

J. Bao, C. Lyon, P.C.R. Lane, W. Ji, J. Malcolm

    Research output: Book/Report (Other report)

    Abstract

    This paper reports experiments on a corpus of news articles from the Financial Times, comparing different text similarity models. First, the Ferret system, which relies solely on lexical similarities, is applied; then methods based on semantic similarities are investigated. Different feature string selection criteria are used, for instance with and without synonyms obtained from WordNet, or with noun phrases extracted for comparison. The results indicate that synonyms rather than lexical strings are important for finding similar texts. Hypernyms and noun phrases also contribute to the identification of text similarity, though they are not better than synonyms. However, precision is a problem for the semantic similarity methods because too many irrelevant texts are retrieved.
    Original language: English
    Publisher: University of Hertfordshire
    Publication status: Published - 2007

    Publication series

    Name: UH Computer Science Technical Report
    Publisher: University of Hertfordshire
    Volume: 461
