Age identification of twitter users: Classification methods and sociolinguistic analysis

Vasiliki Simaki, Iosif Mporas, Vasileios Megalooikonomou

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    In this article, we address the problem of identifying the age of Twitter users from their online text. We used a set of text mining, sociolinguistically motivated, and content-related text features, and we evaluated a number of well-known and widely used machine learning classification algorithms to examine their suitability for this task. The experimental results showed that the Random Forest algorithm offered the best performance, achieving an accuracy of 61%. We ranked the classification features by their informativeness using the ReliefF algorithm, and we analyzed the results in terms of the sociolinguistic principles of age-related linguistic variation.
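    ReliefF, the feature-ranking algorithm named in the abstract, scores each feature by how much it differs between an instance and its nearest neighbours from other classes, relative to how much it differs from its nearest neighbours in the same class. The abstract does not describe the authors' feature set or ReliefF settings, so the following is only a minimal pure-Python sketch under stated assumptions: numeric features, range-normalized Manhattan distance, and hypothetical names (`relieff`, `k` neighbours per class, `m` sampled instances) plus made-up toy data. It is not the authors' implementation.

```python
import random


def relieff(X, y, k=3, m=None, seed=0):
    """Simplified ReliefF for numeric features (illustrative sketch, hypothetical API).

    X: list of feature vectors, y: list of class labels.
    Returns one weight per feature; higher means more informative.
    """
    n, d = len(X), len(X[0])
    # Per-feature ranges so that every difference lies in [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    m = min(m or n, n)
    rng = random.Random(seed)
    sample = rng.sample(range(n), m) if m < n else list(range(n))

    w = [0.0] * d
    for i in sample:
        # k nearest neighbours of instance i within every class, excluding i itself.
        near = {}
        for c in classes:
            cand = [r for r in range(n) if r != i and y[r] == c]
            cand.sort(key=lambda r: dist(X[i], X[r]))
            near[c] = cand[:k]
        hits = near[y[i]]
        for j in range(d):
            # Penalize features that differ between i and its nearest same-class neighbours.
            if hits:
                w[j] -= sum(diff(j, X[i], X[h]) for h in hits) / (len(hits) * m)
            # Reward features that differ between i and its nearest other-class
            # neighbours, weighting each other class by its prior probability.
            for c in classes:
                if c == y[i] or not near[c]:
                    continue
                scale = prior[c] / (1.0 - prior[y[i]])
                w[j] += scale * sum(diff(j, X[i], X[r]) for r in near[c]) / (len(near[c]) * m)
    return w


# Hypothetical toy data: feature 0 separates the two age groups, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [0.8, 0.2]]
y = ["young", "young", "young", "old", "old", "old"]
weights = relieff(X, y, k=2)
ranking = sorted(range(len(weights)), key=lambda j: -weights[j])
print(weights, ranking)
```

    On this toy data the separating feature should receive a positive weight and rank first, while the noise feature ends up negative; ranking real tweet-derived features would follow the same pattern of sorting features by their weights.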
    Original language: English
    Title of host publication: Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
    Editors: Alexander Gelbukh
    Publisher: Springer Nature
    Pages: 385-395
    Number of pages: 11
    ISBN (Print): 9783319754864
    Publication status: Published, 1 Jan 2018
    Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016, Konya, Turkey
    Duration: 3 Apr 2016 to 9 Apr 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9624 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
    Country/Territory: Turkey
    City: Konya
    Period: 3/04/16 to 9/04/16

    Keywords

    • Age identification
    • Computational Sociolinguistics
    • Sociolinguistics
    • Text classification
    • Text mining
