Abstract
This paper presents a text annotation method based on semantic sequences to label individual documents and clusters of documents. The basic idea underlying the semantic sequence approach is to use an ontology such as WordNet to find locally frequent meanings, which then act as the labels of a document. The ontology is also used to measure the semantic similarity between labels, which indicates the similarity between the corresponding documents. Further, a text clustering method based on four natural rules is introduced to cluster documents and label each cluster. This method does not require a predefined number of clusters, unlike partitioning clustering methods, and it avoids the need to choose an appropriate level of the hierarchy, as hierarchical clustering methods must.
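The abstract does not define "locally frequent" or the label similarity measure precisely, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a meaning is locally frequent if it occurs at least `min_count` times within some window of `window` consecutive senses. It also assumes that label similarity is a WordNet-style path similarity, 1 / (1 + shortest hypernym path), averaged over best matches between two label sets. The dictionaries `SENSE_OF` and `HYPERNYM` are hypothetical toy stand-ins for WordNet sense lookup and its hypernym hierarchy. The four clustering rules are not described in the abstract, so the sketch does not attempt them.

```python
from collections import Counter

# Hypothetical word -> sense mapping (a toy stand-in for WordNet sense lookup).
SENSE_OF = {
    "car": "car.n.01", "automobile": "car.n.01", "truck": "truck.n.01",
    "engine": "engine.n.01", "motor": "engine.n.01",
    "road": "road.n.01", "highway": "road.n.01",
}

# Hypothetical hypernym links, child -> parent (a toy stand-in for WordNet).
HYPERNYM = {
    "car.n.01": "motor_vehicle.n.01",
    "truck.n.01": "motor_vehicle.n.01",
    "motor_vehicle.n.01": "vehicle.n.01",
    "engine.n.01": "machine.n.01",
    "road.n.01": "way.n.06",
}

def semantic_sequence(tokens):
    """Map tokens to senses in order, dropping tokens the toy ontology lacks."""
    return [SENSE_OF[t] for t in tokens if t in SENSE_OF]

def locally_frequent_senses(senses, window=4, min_count=2):
    """Senses that occur >= min_count times in some window of consecutive senses."""
    labels = set()
    # If the sequence is shorter than the window, examine it as a single window.
    for start in range(max(1, len(senses) - window + 1)):
        counts = Counter(senses[start:start + window])
        labels.update(s for s, c in counts.items() if c >= min_count)
    return labels

def ancestors(sense):
    """Map each ancestor of a sense (including itself) to its hypernym distance."""
    dist, d = {}, 0
    while sense is not None:
        dist[sense] = d
        sense = HYPERNYM.get(sense)
        d += 1
    return dist

def path_similarity(a, b):
    """1 / (1 + shortest path through a common ancestor); 0.0 if there is none."""
    da, db = ancestors(a), ancestors(b)
    common = set(da) & set(db)
    if not common:
        return 0.0
    return 1.0 / (1 + min(da[c] + db[c] for c in common))

def label_similarity(labels_a, labels_b):
    """Symmetric average best-match similarity between two label sets."""
    if not labels_a or not labels_b:
        return 0.0
    def best(xs, ys):
        return sum(max(path_similarity(x, y) for y in ys) for x in xs) / len(xs)
    return (best(labels_a, labels_b) + best(labels_b, labels_a)) / 2

doc1 = semantic_sequence("car engine automobile road motor car".split())
doc2 = semantic_sequence("truck highway truck road".split())
labels1 = locally_frequent_senses(doc1)   # {"car.n.01", "engine.n.01"}
labels2 = locally_frequent_senses(doc2)   # {"truck.n.01", "road.n.01"}
print(label_similarity(labels1, labels2))  # about 0.1667 (1/6)
```

Here "car" and "truck" share the parent `motor_vehicle.n.01` (path length 2, similarity 1/3). The other label pairs have no common ancestor in the toy hierarchy, which gives an overall similarity of 1/6. With real WordNet data, a library such as NLTK's WordNet interface could supply the senses and hypernyms instead of the toy dictionaries.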
Original language | English |
---|---|
Journal | Proceedings of the Seventh International Workshop on Computational Semantics |
Publication status | Published - 2007 |
Keywords
- semantic sequences
- text annotation
- WordNet
- clustering