Embracing semantic ambiguity to enhance interpretability of complex unstructured machine learning problems.
Author(s) - Lee James, Wang Zuemao, Johnson Arlene
Publication year - 2018
Publication title - Proceedings of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.193
H-Index - 14
ISSN - 2373-9231
DOI - 10.1002/pra2.2018.14505501144
Subject(s) - interpretability, ambiguity, computer science, artificial intelligence, meaning (existential), natural language processing, interpretation (philosophy), process (computing), natural language, space (punctuation), feature (linguistics), scholarship, linguistics, psychology, philosophy, political science, law, programming language, operating system, psychotherapist
Ambiguity is frequently seen as an impediment to recovering a unique interpretation and meaning from texts. Machine learning algorithms have recently been deployed as an effective way to automate semantic disambiguation. However, a growing body of literature has raised concerns about the reduced accuracy and interpretability that result when disambiguation is defined in terms of optimization at large data scales. We propose a hybrid methodology that takes advantage of the expediency of optimization (i.e., how a machine learning algorithm efficiently identifies semantic patterns within a corpus) while rethinking ambiguity as a semantically meaningful and interpretively useful feature of linguistic corpora. Our method examines why ambiguity occurs within a natural language corpus and what consequences ambiguous meanings have. Combining the “how” and the “why” dimensions of ambiguity requires a team science approach that brings together researchers trained to study language in different disciplines, and we demonstrate how transdisciplinary digital scholarship centers located in academic libraries can create a space to foster precisely such collaborations, purposefully formed to analyze ambiguous datasets in more nuanced ways.