Developing robust models for favourability analysis

Daoud Clarke, Peter Lane, Paul Hender

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Locating documents that carry positive or negative favourability is an important application within media analysis. This paper presents empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. These challenges include the often considerable imbalance in the distribution of positive and negative samples, changes in the documents over time, and effective training and quantification procedures for reporting results. We begin with three datasets generated by a media-analysis company, which classify documents in two ways: detecting the presence of favourability, and assessing negative vs. positive favourability. We then evaluate a machine-learning approach to automating the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find that the optimum classifier, feature set and training strategy vary with the task and dataset.
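
The abstract does not say how the undersampling was carried out. As an illustration only, the following is a minimal sketch of random undersampling, assuming the goal is to reduce every class to the size of the smallest one. The function name `undersample` and its parameters are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes so that every class
    ends up with as many samples as the smallest class.

    Illustrative sketch only; not the procedure used in the paper.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    if not by_label:
        return []

    # The smallest class sets the target size for all classes.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        # Sample without replacement from each class.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so that classes are interleaved in the output.
    rng.shuffle(balanced)
    return balanced


docs = ["a", "b", "c", "d", "e", "f"]
labels = ["pos", "neg", "neg", "neg", "neg", "neg"]
balanced = undersample(docs, labels)
```

Here the single positive document is kept and one negative document is chosen at random, so the balanced set has one sample per class. Undersampling discards data, which is one reason its benefit may depend on the task and dataset, as the abstract reports.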
    Original language: English
    Title of host publication: Second Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA)
    Editors: A Balahur, E Boldrini, A Montoyo, P Martinez-Barco
    Publisher: Association for Computational Linguistics
    Pages: 44-52
    Publication status: Published - 2011
