Entropy Indicators for Investigating Early Language Processes

C. Lyon, C.L. Nehaniv, B. Dickerson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We examine evidence for the hypothesis that language could have passed through a stage when words were combined in structured linear segments, and that these linear segments could later have become the building blocks for a full hierarchical grammar. Experiments were carried out on the British National Corpus, consisting of about 100 million words of text from different domains and transcribed speech. This work extends and supports the results of our previous work, which was based on a smaller corpus. Measuring the entropy of the texts, we find that entropy declines as words are taken in groups of 2, 3 and 4, indicating that it is easier to decode words taken in short sequences than individually. Entropy declines further when punctuation is represented, showing that appropriate segmentation captures some of the language structure. Further support for the hypothesis that local sequential processing underlies the production and perception of speech comes from neurobiological evidence. The observation that homophones are apparently ubiquitous and used without confusion also suggests that language processing may be largely based on local context.
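
As an illustration of the kind of measurement the abstract describes, the following is a minimal sketch of estimating the entropy of a token given the preceding n-1 tokens from raw n-gram counts. It is not the authors' procedure: the function name `conditional_entropy`, the plain count-based estimator, and the toy token sequence are all assumptions made for illustration, and the abstract does not say how the corpus was tokenised or how probabilities were estimated.

```python
import math
from collections import Counter


def conditional_entropy(tokens, n):
    """Estimate, in bits, the entropy of a token given the n-1 tokens before it.

    Hypothetical helper for illustration only: probabilities are plain
    relative frequencies of the n-grams found in `tokens`, with no smoothing.
    For n = 1 this reduces to the ordinary unigram entropy.
    """
    # Count every n-gram that occurs in the sequence.
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    # Count each (n-1)-token context by summing over the n-grams it starts.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count

    total = sum(grams.values())

    # H = -sum over n-grams of p(gram) * log2 p(last token | context)
    return -sum(
        (count / total) * math.log2(count / contexts[gram[:-1]])
        for gram, count in grams.items()
    )


if __name__ == "__main__":
    # Toy sequence (an assumption, not corpus data): strictly alternating tokens.
    toy = "a b a b a b a b".split()
    for n in (1, 2):
        print(n, conditional_entropy(toy, n))
```

In the toy sequence each token carries one bit of uncertainty when taken in isolation and none once the previous token is known, which is the direction of the effect the abstract reports for real text. Punctuation could be studied in the same sketch by keeping punctuation marks as tokens in the input list rather than stripping them.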
Original language: English
Title of host publication: AISB'05 : Social Intelligence and Interaction in Animals, Robots and Agents. Proceedings of the Second International Symposium on the Emergence and Evolution of Linguistic Communication (EELC '05)
Publisher: The Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB)
Pages: 64-71
ISBN (Print): 1 902956 40 9
Publication status: Published - 2005
