Evaluation Methods for Statistically Dependent Text

Open Access


Author(s) - Sarvnaz Karimi, Jie Yin, Jiri Baum

Publication year - 2015

Publication title - Computational Linguistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.314

H-Index - 98

eISSN - 1530-9312

pISSN - 0891-2017

DOI - 10.1162/coli_a_00230

Subject(s) - computer science, social media, microblogging, data science, task (project management), cross validation, artificial intelligence, world wide web, management, economics

In recent years, many studies have been published on data collected from social media, especially microblogs such as Twitter. However, rather few of these studies have considered evaluation methodologies that take into account the statistically dependent nature of such data, which breaks the theoretical conditions for using cross-validation. Despite concerns raised in the past about using cross-validation for data of similar characteristics, such as time series, some of these studies evaluate their work using standard k-fold cross-validation. Through experiments on Twitter data collected during a two-year period that includes disastrous events, we show that by ignoring the statistical dependence of the text messages published in social media, standard cross-validation can result in misleading conclusions in a machine learning task. We explore alternative evaluation methods that explicitly deal with statistical dependence in text. Our work also raises concerns for any other data for which similar conditions might hold.
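To make the concern concrete, here is a minimal sketch contrasting standard shuffled k-fold splitting with a time-ordered (forward-chaining) split. It is not the authors' method: the abstract does not specify which alternative evaluation schemes they use, so the function names `shuffled_kfold`, `forward_chaining`, and `leaks_future` are hypothetical, and the sketch assumes that message indices are already sorted by posting time.

```python
import random


def shuffled_kfold(n, k, seed=0):
    """Standard k-fold: shuffle indices 0..n-1, then deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def forward_chaining(n, k):
    """Time-ordered split: train on the earliest blocks, test on the next one.

    Assumes index order equals chronological order (an assumption of this
    sketch, not something stated in the abstract).
    """
    bounds = [round(i * n / (k + 1)) for i in range(k + 2)]
    for i in range(1, k + 1):
        train = list(range(bounds[i]))
        test = list(range(bounds[i], bounds[i + 1]))
        yield train, test


def leaks_future(train, test):
    """True if any training message was posted after the earliest test message."""
    return max(train) > min(test)


if __name__ == "__main__":
    n, k = 20, 4
    print("shuffled k-fold leaks:",
          [leaks_future(tr, te) for tr, te in shuffled_kfold(n, k)])
    print("forward chaining leaks:",
          [leaks_future(tr, te) for tr, te in forward_chaining(n, k)])
```

With shuffled folds, the training set for most folds contains messages posted after some of the test messages, so temporally dependent tweets (for example, from the same event) end up on both sides of the split. The forward-chaining variant avoids that by construction, at the cost of smaller training sets in the early folds.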
