TOPIC SEGMENTATION METHODS COMPARISON ON COMPUTER SCIENCE TEXTS

Volodymyr Sokol, Vitalii Krykun, Mariia Bilova, Ivan Perepelytsya, Volodymyr Pustovarov

Open Access

Publication year - 2021

Publication title - vestnik nacionalʹnogo tehničeskogo universiteta "hpi". sistemnyj analiz, upravlenie i informacionnye tehnologii/vestnik nacionalʹnogo tehničeskogo universiteta "hpi". seriâ sistemnyj analiz, upravlenie i informacionnye tehnologii

Language(s) - English

Resource type - Journals

eISSN - 2410-2857

pISSN - 2079-0023

DOI - 10.20998/2079-0023.2021.02.10

Subject(s) - computer science, segmentation, context (archaeology), set (abstract data type), implementation, software, information retrieval, data science, information extraction, artificial intelligence, software engineering, paleontology, biology, programming language

The demand for information systems that simplify and accelerate work has grown sharply amid the rapid informatization of society and all its branches. This has led to the emergence of more and more companies that develop software products and information systems in general. These companies use knowledge management systems to systematize, process, and use the knowledge they accumulate. One of the main tasks of IT companies is the continuous training of personnel, which requires exporting content from the company's knowledge management system to the learning management system. The main goal of this research is to choose an algorithm that solves the problem of marking up the text of articles similar to those used in the knowledge management systems of IT companies. To achieve this goal, various topic segmentation methods must be compared on a dataset of computer science texts. Inspec is one such dataset; it is normally used for keyword extraction, and in this research it was adapted to the structure of datasets used for the topic segmentation problem. The TextTiling and TextSeg methods were compared on several well-known data science metrics and on metrics specific to the topic segmentation problem. A new generalized metric was also introduced to compare results for the topic segmentation problem. All software implementations of the algorithms were written in the Python programming language as sets of interrelated functions. The results show the advantages of the TextSeg method over TextTiling on both the classical data science metrics and the metrics developed specifically for the topic segmentation task. From all the metrics, including the newly introduced one, it can be concluded that the TextSeg algorithm performs better than the TextTiling algorithm on the adapted Inspec test dataset.
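The abstract does not name the segmentation-specific metrics it uses. As an illustration only, the sketch below implements two metrics that are widely used for this task, Pk and WindowDiff, and it is an assumption that the paper relies on these or similar ones. The function names, the 0/1 boundary-vector encoding (a 1 at position i means a topic boundary right after sentence i), and the default window-size heuristic are all hypothetical choices made for this example, not the authors' implementation. The paper's new generalized metric is not described in the abstract and is not reproduced here.

```python
from typing import Optional, Sequence


def _check(reference: Sequence[int], hypothesis: Sequence[int]) -> None:
    """Both segmentations must describe the same, non-empty sentence sequence."""
    if len(reference) != len(hypothesis) or len(reference) == 0:
        raise ValueError("segmentations must be non-empty and of equal length")


def window_size(reference: Sequence[int]) -> int:
    """Assumed heuristic: about half the mean reference segment length."""
    n_segments = sum(reference) + 1
    k = round(len(reference) / (2 * n_segments))
    return min(max(1, k), len(reference))


def pk(reference: Sequence[int], hypothesis: Sequence[int],
       k: Optional[int] = None) -> float:
    """Share of windows where only one segmentation contains a boundary."""
    _check(reference, hypothesis)
    if k is None:
        k = window_size(reference)
    windows = len(reference) - k + 1
    errors = sum(
        any(reference[i:i + k]) != any(hypothesis[i:i + k])
        for i in range(windows)
    )
    return errors / windows


def window_diff(reference: Sequence[int], hypothesis: Sequence[int],
                k: Optional[int] = None) -> float:
    """Share of windows where the two boundary counts differ."""
    _check(reference, hypothesis)
    if k is None:
        k = window_size(reference)
    windows = len(reference) - k + 1
    errors = sum(
        sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
        for i in range(windows)
    )
    return errors / windows


if __name__ == "__main__":
    # Toy data (made up): 8 sentences, true boundaries after sentences 2 and 5.
    reference = [0, 0, 1, 0, 0, 1, 0, 0]
    # A hypothetical system output with one extra boundary after sentence 1.
    extra_boundary = [0, 1, 1, 0, 0, 1, 0, 0]
    print("Pk:", pk(reference, extra_boundary, k=3))
    print("WindowDiff:", window_diff(reference, extra_boundary, k=3))
```

Both scores are error rates, so lower is better and 0.0 means the hypothesis agrees with the reference in every window. In the toy case, Pk does not penalize the extra boundary because each affected window already contains a reference boundary, while WindowDiff does; this difference is one reason the two are often reported together.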
