Automated content analysis across six languages

Open Access

Author(s) - Leah C. Windsor, J. Cupit, Alistair Windsor

Publication year - 2019

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0224425

Subject(s) - natural language processing, machine translation, computer science, selection (genetic algorithm), corpus linguistics, artificial intelligence, computational linguistics, field (mathematics), linguistics, text corpus, selection bias, data science, statistics, philosophy, mathematics, pure mathematics

Corpus selection bias in international relations research presents an epistemological problem: How do we know what we know? Most social science research in the field of text analytics relies on English-language corpora, biasing our ability to understand international phenomena. To address the issue of corpus selection bias, we present results suggesting that machine translation may be used to analyze non-English sources. We compare human translation and machine translation (Google Translate) of a collection of aligned sentences from United Nations documents extracted from the Multi-UN corpus, analyzing both with a “bag of words” analysis tool, Linguistic Inquiry and Word Count (LIWC). Overall, the LIWC indices proved relatively stable across machine- and human-translated sentences. We find that while there are statistically significant differences between the original and translated documents, the effect sizes are relatively small, especially when looking at psychological processes.
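The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: LIWC is a proprietary dictionary, so the code below substitutes a tiny hypothetical lexicon (TOY_LEXICON), and the abstract does not say which statistical tests were used, so a paired Cohen's d is assumed here as one common way to express an effect size. The function names and example sentences are invented for illustration only.

```python
import math
import re
from statistics import mean, stdev

# Hypothetical mini-lexicon standing in for LIWC-style categories.
# The real LIWC dictionary is proprietary and far larger.
TOY_LEXICON = {
    "posemo": {"welcome", "support", "peace", "progress"},
    "negemo": {"conflict", "threat", "crisis", "concern"},
}


def category_percentages(sentence, lexicon=TOY_LEXICON):
    """Bag-of-words scoring: percentage of tokens falling in each category."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        category: 100.0 * sum(token in words for token in tokens) / total
        for category, words in lexicon.items()
    }


def paired_effect_size(human_scores, machine_scores):
    """Return (Cohen's d for paired samples, paired t statistic).

    d = mean(differences) / sd(differences); t = d * sqrt(n).
    """
    if len(human_scores) != len(machine_scores):
        raise ValueError("scores must come from aligned sentence pairs")
    diffs = [h - m for h, m in zip(human_scores, machine_scores)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    d = mean(diffs) / sd
    return d, d * math.sqrt(len(diffs))


if __name__ == "__main__":
    # Invented sentence pairs, not taken from the Multi-UN corpus.
    human = [
        "The council voiced concern about the crisis",
        "Members welcome progress toward peace",
        "The conflict remains a threat to the region",
    ]
    machine = [
        "The council expressed worry about the crisis",
        "Members welcome the progress toward peace",
        "The conflict remains a threat for the region",
    ]
    h = [category_percentages(s)["negemo"] for s in human]
    m = [category_percentages(s)["negemo"] for s in machine]
    print(paired_effect_size(h, m))
```

With many aligned sentences, even a small mean difference can give a large t statistic while d stays small, which is the pattern the abstract reports: statistically significant differences with relatively small effect sizes.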
