Coding for Demographic Categories in the Creation of Legacy Corpora: Asian American Ethnic Identities
Author(s) -
Hall‐Lew Lauren,
Wong Amy Wing‐mei
Publication year - 2014
Publication title -
Language and Linguistics Compass
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 44
ISSN - 1749-818X
DOI - 10.1111/lnc3.12117
Subject(s) - ethnic group, linguistics, coding (social sciences), identity (music), computer science, macro, sociology, anthropology, social science, philosophy, physics, acoustics, programming language
A set of shared coding conventions for speaker ethnicity is necessary for open‐source data sharing and cross‐study compatibility between linguistic corpora. However, ethnicity, like many other aspects of speaker identity, is continually negotiated and reproduced in discourse, and is therefore a challenge to code representatively. This paper discusses some of the challenges facing researchers who want to use, create, or contribute to existing corpora that are annotated for the ethnic identity of a speaker. We specifically problematize the macro‐social label ‘Asian American’ and propose that researchers should consider different levels and types of specificity of ‘Asianness’ in order to ensure that the corpora best represent the reality of ethnic identity in the community sampled. This is particularly important given the limited incorporation of different Asian groups in most existing linguistic research. We argue that more rigorous coding for Asian American ethnicities in corpora will improve the utility of archived corpora and enhance sociolinguistic research on language variation and ethnic identity.