Unsupervised Learning-based Anomalous Arabic Text Detection

Nasser Abouzakhar, Ben Allison, Louise Guthrie

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



    Modern society increasingly depends on the Web as a vital source of information and communication. However, the Web has also become an ideal channel for various terrorist organisations to publish misleading information and to send unintelligible messages to communicate with their clients. The growing volume of anomalous, misleading information published on the Web has led to an increase in security threats. Existing Web security mechanisms and protocols are not designed to deal with such recently developed problems. Developing technology to detect anomalous textual information has become one of the major challenges within the NLP community. This paper addresses the problem of anomalous text detection by automatically extracting linguistic features from Arabic documents and evaluating those features for patterns of suspicious and/or inconsistent information. To this end, we define specific linguistic features that characterise various Arabic writing styles. The paper also introduces the main challenges in Arabic processing and describes the proposed unsupervised learning model for detecting anomalous Arabic textual information.
    Original language: English
    Title of host publication: Procs 6th Language Resources and Evaluation Conference
    Subtitle of host publication: LREC 2008
    Publication status: Published - 2008
    Event: 6th edition of the Language Resources and Evaluation Conference - Marrakech, Morocco
    Duration: 28 May 2008 → 30 May 2008


    Conference: 6th edition of the Language Resources and Evaluation Conference


    • Natural language processing
    • Arabic text processing
    • Anomalous text detection
    • Unsupervised learning


