Generating similarity cluster of Indonesian languages with semi-supervised clustering

Arbi Haza Nasution; Yohei Murakami; Toru Ishida


Open Access


Author(s) - Arbi Haza Nasution, Yohei Murakami, Toru Ishida

Publication year - 2019

Publication title - International Journal of Electrical and Computer Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.277

H-Index - 22

ISSN - 2088-8708

DOI - 10.11591/ijece.v9i1.pp531-538

Subject(s) - similarity (geometry), cluster analysis, cluster (spacecraft), hierarchical clustering, complete linkage clustering, computer science, linkage (software), stability (learning theory), single linkage clustering, artificial intelligence, indonesian, complete linkage, natural language processing, fuzzy clustering, linguistics, machine learning, biology, biochemistry, genotype, single nucleotide polymorphism, image (mathematics), gene, programming language, philosophy, canopy clustering algorithm

Lexicostatistic and language similarity clusters are useful for computational linguistics research that depends on language similarity or cognate recognition. Nevertheless, no published lexicostatistic or language similarity clusters of Indonesian ethnic languages are available. We formulate an approach to creating language similarity clusters: we use the ASJP database to generate a language similarity matrix, generate hierarchical clusters with complete linkage and mean linkage clustering, and then extract two stable clusters with high language similarities. We introduce an extended k-means semi-supervised clustering method to evaluate the stability level of the hierarchical stable clusters, that is, how consistently they remain grouped together as the number of clusters changes. The higher the number of trials, the more likely we are to find the two hierarchical stable clusters distinctly in the generated k clusters. However, in all five experiments, the stability level of the two hierarchical stable clusters is highest at 5 clusters. Therefore, we take the 5 clusters as the best clusters of Indonesian ethnic languages. Finally, we plot the generated 5 clusters on a geographical map.
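The hierarchical step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The four language labels and their similarity percentages are hypothetical placeholders, not ASJP values. Converting similarity to distance as 100 minus the percentage, and reading "mean linkage" as the average of pairwise distances, are both assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical similarity percentages between four placeholder languages.
# These values are illustrative only; they are not from ASJP or the paper.
LANGS = ["A", "B", "C", "D"]
SIM = {
    ("A", "B"): 80.0,
    ("A", "C"): 30.0,
    ("A", "D"): 25.0,
    ("B", "C"): 35.0,
    ("B", "D"): 20.0,
    ("C", "D"): 70.0,
}


def distance(x, y):
    """Assumed conversion: distance = 100 - similarity percentage."""
    if x == y:
        return 0.0
    key = (x, y) if (x, y) in SIM else (y, x)
    return 100.0 - SIM[key]


def linkage_distance(c1, c2, method):
    """Distance between two clusters under the chosen linkage rule."""
    dists = [distance(x, y) for x in c1 for y in c2]
    if method == "complete":   # farthest pair of members
        return max(dists)
    if method == "average":    # mean over all member pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(items, k, method="complete"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [frozenset([item]) for item in items]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage_distance(pair[0], pair[1], method),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    for method in ("complete", "average"):
        groups = agglomerate(LANGS, 2, method)
        print(method, [sorted(g) for g in groups])
```

On this toy matrix both linkage rules group A with B and C with D. Groupings that persist across linkage methods are the kind of agreement one might look for when selecting stable clusters, although the paper's exact selection criterion is not stated in the abstract.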
