
Levenshtein Distances Fail to Identify Language Relationships Accurately

Author(s) - Simon J. Greenhill

Publication year - 2011

Publication title - Computational Linguistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.314

H-Index - 98

eISSN - 1530-9312

pISSN - 0891-2017

DOI - 10.1162/coli_a_00073


The Levenshtein distance is a simple distance metric derived from the number of edit operations (insertions, deletions, and substitutions) needed to transform one string into another. This metric has recently received attention as a means of automatically classifying languages into genealogical subgroups. In this article I test how well the Levenshtein distance classifies languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification produced by the Levenshtein distance with that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography improves performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, and the method fails to discriminate between homology and chance similarity across distantly related languages. This poor performance suggests the need for more linguistically nuanced methods for automated language classification.
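To make the metric concrete, here is a minimal Python sketch of the standard dynamic-programming Levenshtein distance, a length-normalized variant, and a mean over aligned word lists. The normalization by the longer string, the `mean_word_list_distance` helper, and the word forms used below are illustrative assumptions, not the procedure or data used in the article. The invented form "ngala" versus "ŋala" shows how an orthographic difference alone (a digraph versus a single symbol for the same sound) inflates the distance, which is the effect orthographic standardization is meant to remove.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match if cost == 0)
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: divide by the longer string's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_word_list_distance(words_x, words_y) -> float:
    """Hypothetical helper: average normalized distance over word pairs that
    express the same meaning in two languages (lists aligned by meaning)."""
    pairs = list(zip(words_x, words_y))
    return sum(normalized_levenshtein(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))           # classic textbook pair
    print(normalized_levenshtein("ngala", "ŋala"))    # invented forms: spelling difference only
    print(mean_word_list_distance(["mata", "lima"], ["mata", "rima"]))
```

Because every substitution costs 1 regardless of which sounds are involved, this distance cannot distinguish a regular sound correspondence from an accidental resemblance; weighting edits by phonetic similarity or by known correspondences would be one way to make it more linguistically informed.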
