University of Hertfordshire

Standard

Using a neural net to determine the language in which a text is written. / Lyon, C.; Matthews, C.

University of Hertfordshire, 1995. (UH Computer Science Technical Report; Vol. 212).

Research output: Book/Report (Other report)

Harvard

Lyon, C & Matthews, C 1995, Using a neural net to determine the language in which a text is written. UH Computer Science Technical Report, vol. 212, University of Hertfordshire.

APA

Lyon, C., & Matthews, C. (1995). Using a neural net to determine the language in which a text is written. (UH Computer Science Technical Report; Vol. 212). University of Hertfordshire.

Vancouver

Lyon C, Matthews C. Using a neural net to determine the language in which a text is written. University of Hertfordshire, 1995. (UH Computer Science Technical Report).

Author

Lyon, C. ; Matthews, C. / Using a neural net to determine the language in which a text is written. University of Hertfordshire, 1995. (UH Computer Science Technical Report).

Bibtex

@book{65f1abd17e714153bb57986591919800,
title = "Using a neural net to determine the language in which a text is written",
abstract = "There are statistical patterns of letter sequences in natural language, and different languages have different characteristic patterns. This effect can be used to determine in which language a text is written. The patterns are captured with a single layer, feed forward neural net trained in supervised mode. The sequential dependencies of letters are modelled by taking adjacent letter pairs and letter triples. Training and test data are converted to sets of these tuples, which are the basic elements classified by the network. This approach is supported by information theoretic results on the entropy of letter sequences for English. The architecture of the network used is shown to be appropriate for data with the characteristics of natural language letter sequences. For 3 languages over 99% of test strings are correct. For 4 languages, including Dutch and German which are similar, over 92% are correct.",
author = "C. Lyon and C. Matthews",
year = "1995",
language = "English",
series = "UH Computer Science Technical Report",
publisher = "University of Hertfordshire",

}

RIS

TY - BOOK

T1 - Using a neural net to determine the language in which a text is written

AU - Lyon, C.

AU - Matthews, C.

PY - 1995

Y1 - 1995

AB - There are statistical patterns of letter sequences in natural language, and different languages have different characteristic patterns. This effect can be used to determine in which language a text is written. The patterns are captured with a single layer, feed forward neural net trained in supervised mode. The sequential dependencies of letters are modelled by taking adjacent letter pairs and letter triples. Training and test data are converted to sets of these tuples, which are the basic elements classified by the network. This approach is supported by information theoretic results on the entropy of letter sequences for English. The architecture of the network used is shown to be appropriate for data with the characteristics of natural language letter sequences. For 3 languages over 99% of test strings are correct. For 4 languages, including Dutch and German which are similar, over 92% are correct.

M3 - Other report

T3 - UH Computer Science Technical Report

BT - Using a neural net to determine the language in which a text is written

PB - University of Hertfordshire

ER -
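
The abstract describes converting text into adjacent letter pairs and triples and classifying those tuples with a single-layer, feed-forward net trained in supervised mode. The following is a minimal sketch of that idea, not the report's implementation: the text normalisation, the perceptron-style update rule, the names `letter_tuples`, `SingleLayerNet` and `SAMPLES`, and the toy training sentences are all assumptions made for illustration.

```python
def letter_tuples(text):
    """Return the set of adjacent letter pairs and triples found in `text`."""
    # Assumed normalisation: lower-case, keep letters and single spaces only.
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    cleaned = " ".join(cleaned.split())
    pairs = {cleaned[i:i + 2] for i in range(len(cleaned) - 1)}
    triples = {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}
    return pairs | triples


class SingleLayerNet:
    """One output node per language; one weight per (language, tuple)."""

    def __init__(self, languages):
        self.languages = list(languages)
        self.weights = {lang: {} for lang in self.languages}

    def scores(self, tuples):
        # Each output node sums the weights of the tuples present in the input.
        return {
            lang: sum(self.weights[lang].get(t, 0.0) for t in tuples)
            for lang in self.languages
        }

    def classify(self, text):
        s = self.scores(letter_tuples(text))
        return max(self.languages, key=lambda lang: s[lang])

    def train(self, samples, epochs=50):
        # Supervised training with a multiclass perceptron update
        # (an assumed rule; the report may use a different one).
        for _ in range(epochs):
            mistakes = 0
            for text, target in samples:
                tuples = letter_tuples(text)
                s = self.scores(tuples)
                guess = max(self.languages, key=lambda lang: s[lang])
                if guess != target:
                    mistakes += 1
                    for t in tuples:
                        w_t = self.weights[target]
                        w_g = self.weights[guess]
                        w_t[t] = w_t.get(t, 0.0) + 1.0  # reinforce correct node
                        w_g[t] = w_g.get(t, 0.0) - 1.0  # penalise wrong node
            if mistakes == 0:
                break  # every training string is already classified correctly


# Hypothetical toy training data, far smaller than any realistic corpus.
SAMPLES = [
    ("the quick brown fox jumps over the lazy dog", "english"),
    ("der schnelle braune fuchs springt über den faulen hund", "german"),
    ("le renard brun rapide saute par dessus le chien paresseux", "french"),
]

net = SingleLayerNet(["english", "german", "french"])
net.train(SAMPLES)
print(net.classify("the dog jumps over the fox"))
```

With only three training strings the sketch says nothing about the accuracy figures quoted in the abstract; it only shows how text becomes a set of tuples and how a single layer of weights maps those tuples to a language.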