Open Access
Preparing Non-English Texts for Computational Analysis
Author(s) - Quinn Dombrowski
Publication year - 2020
Publication title - Modern Languages Open
Language(s) - English
Resource type - Journals
ISSN - 2052-5397
DOI - 10.3828/mlo.v0i0.294
Subject(s) - political science, linguistics, computer science, regional science, sociology, philosophy
Most methods for computational text analysis involve doing things with “words”: counting them, looking at their distribution within a text, or seeing how they are juxtaposed with other words. While there’s nothing about these methods that limits their use to English, they tend to be developed with certain assumptions about how “words” work – among them, that words are separated by a space, and that words are minimally inflected (i.e. that there aren’t a lot of different forms of a word). English fits both of these assumptions, but many languages do not. This tutorial covers major challenges for doing computational text analysis caused by the grammar or writing systems of various languages, and ways to overcome these issues.
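
To make the two assumptions concrete, here is a minimal Python sketch of what happens when a naive, whitespace-based word counter meets text that breaks them. It is not taken from the tutorial itself: the function name `count_space_separated_words` and the three sample sentences are illustrative assumptions, and the counter stands in for "counting words" in general rather than for any particular tool.

```python
from collections import Counter


def count_space_separated_words(text: str) -> Counter:
    """Count "words" by lowercasing and splitting on whitespace.

    Hypothetical helper: it bakes in the assumption that words are
    separated by spaces, and it treats every distinct form as a
    different word.
    """
    return Counter(text.lower().split())


# Illustrative sentences (assumptions, not examples from the source).
english = "the cat sees the other cat"
chinese = "我喜欢猫"            # "I like cats": written without spaces
russian = "кошка видит кошку"  # "the cat sees the cat": two case forms of "cat"

english_counts = count_space_separated_words(english)
chinese_counts = count_space_separated_words(chinese)
russian_counts = count_space_separated_words(russian)

# English fits both assumptions, so "cat" is counted twice.
print(english_counts["cat"])

# The whole Chinese sentence comes back as a single "word".
print(len(chinese_counts))

# The two inflected Russian forms are counted as unrelated words.
print(russian_counts["кошка"], russian_counts["кошку"])
```

For the Chinese sentence, a word segmentation step would be needed before counting; for the Russian one, the inflected forms would need to be reduced to a shared base form (for example by lemmatization) if the goal is to count occurrences of the same word.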
