Open Access
Clustering of Translation via Topic Modeling
Author(s) - Федор Краснов, Vladimir V. Lebedev
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1405/1/012008
Subject(s) - computer science , linguistics , natural language processing , cluster analysis , matrix (chemical analysis) , modal , translation (biology) , english language , artificial intelligence , de facto , information retrieval , philosophy , chemistry , materials science , biochemistry , messenger rna , polymer chemistry , composite material , gene , political science , law
In the present study, the authors investigated the structural differences between scientific articles that arise when they are translated from Russian into English. The research used a modal topic modelling technique. Each document in the assembled collection was represented in two versions, English and Russian. Constructing the topic model yielded the bimodal matrices Φ and Θ. Analysis of the Φ matrix showed that the topics can be distinguished by the degree of correspondence between their Russian and English terms when the words are considered in decreasing order of probability. For 90% of the topics, the English words fully corresponded to the Russian words. Analysis of the Θ matrix showed that for 99% of the documents there is a topic whose value exceeds 0.95. The large majority of documents are therefore monotopical, and this holds regardless of the language of the document. Although English is nowadays the de facto language of scientific articles, a considerable number of scientific works are first published in the scientists' native languages and only subsequently translated into English in a more complete and in-depth form. It is therefore possible to speak of a bilingual corpus of such documents.
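The two checks described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that Θ is stored with one row per document and one column per topic, and that one topic's column of Φ is available as a term-to-probability mapping for each language. The function names `monotopic_fraction` and `top_terms` are hypothetical, and the toy numbers and words are invented for illustration.

```python
from typing import Dict, List, Sequence


def monotopic_fraction(theta: Sequence[Sequence[float]],
                       threshold: float = 0.95) -> float:
    """Share of documents whose largest topic value exceeds `threshold`.

    Assumption: `theta` has one row per document and one column per topic,
    and each row is a probability distribution over topics.
    """
    if not theta:
        raise ValueError("theta must contain at least one document")
    dominant = sum(1 for row in theta if max(row) > threshold)
    return dominant / len(theta)


def top_terms(phi_topic: Dict[str, float], k: int = 5) -> List[str]:
    """The k most probable terms of one topic, in decreasing order."""
    return sorted(phi_topic, key=phi_topic.get, reverse=True)[:k]


# Toy Theta: four documents, three topics (invented values).
theta = [
    [0.97, 0.02, 0.01],
    [0.01, 0.98, 0.01],
    [0.50, 0.30, 0.20],
    [0.00, 0.04, 0.96],
]
share = monotopic_fraction(theta)

# Toy Phi columns for a single topic, one per language (invented values).
phi_en = {"reservoir": 0.40, "pressure": 0.35, "well": 0.25}
phi_ru = {"пласт": 0.45, "давление": 0.30, "скважина": 0.25}
top_en = top_terms(phi_en, k=3)
top_ru = top_terms(phi_ru, k=3)

print(share)
print(list(zip(top_en, top_ru)))
```

Pairing the ranked English and Russian terms position by position, as in the last line, is one simple way to inspect how well the two languages correspond within a topic. Deciding whether a pair is an actual translation would need a bilingual dictionary or manual review, which this sketch does not attempt.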
