Open Access
Automatic classification model of semi-structured HTML text data based on State Grid cloud architecture
Author(s) - Enjie Zhang, Zhidong Zhang, Liang-Jun Yan, Da Li
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1920/1/012072
Subject(s) - computer science, cloud computing, python (programming language), artificial neural network, data mining, classifier (uml), grid, artificial intelligence, architecture, naive bayes classifier, machine learning, database, support vector machine, operating system, art, geometry, mathematics, visual arts
In the construction of the “State Grid Cloud” platform, each power grid business runs its own Web system. Data is scattered across these websites and the coupling between resources is low, which easily leads to information islands. This paper targets the semi-structured HTML text data in web pages under the State Grid cloud architecture platform, and uses the Python-based Scrapy framework to collect semi-structured power data from the various power business websites. We propose a semi-structured text data classification model based on a BiGRU neural network and a Bayesian classifier. The BiGRU neural network extracts text features, the Bayesian classifier performs the classification, and the TF-IDF algorithm assigns term weights, with the aim of addressing the large parameter count and long training time of traditional recurrent neural network models. We apply this method in simulations on semi-structured HTML text data from the State Grid and run a comparative experiment against a traditional neural network model. The experimental results show that the classification algorithm can effectively improve the efficiency and accuracy of classifying semi-structured power text data.
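As a rough illustration of two of the ingredients named in the abstract, the sketch below extracts visible text from HTML fragments, weights terms with TF-IDF, and feeds those weights to a multinomial naive Bayes classifier. It is not the authors' model: the BiGRU feature extractor is omitted entirely, and the abstract does not say how the TF-IDF weights and the Bayesian classifier are combined, so that coupling is an assumption here. The class names, the toy pages, and the labels `maintenance` and `billing` are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_tokens(html):
    """Strip tags and return lowercase whitespace-separated tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower().split()


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """TF-IDF weights of one document; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


class TfidfNaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with add-one smoothing.

    Assumption: TF-IDF weights stand in for raw term counts. The paper's
    actual coupling of TF-IDF, BiGRU features, and Bayes is not specified
    in the abstract.
    """

    def fit(self, pages, labels):
        token_lists = [html_to_tokens(p) for p in pages]
        self.idf = idf_table(token_lists)
        self.class_counts = Counter(labels)
        self.term_weight = defaultdict(lambda: defaultdict(float))
        for tokens, label in zip(token_lists, labels):
            for term, w in tfidf(tokens, self.idf).items():
                self.term_weight[label][term] += w
        return self

    def predict(self, page):
        weights = tfidf(html_to_tokens(page), self.idf)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            class_total = sum(self.term_weight[label].values())
            score = math.log(n / n_docs)  # log prior of the class
            for term, w in weights.items():
                p = (self.term_weight[label][term] + 1.0) / (class_total + vocab_size)
                score += w * math.log(p)  # weighted log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy pages standing in for crawled power-business HTML.
train_pages = [
    "<html><body><h1>Outage notice</h1><p>transformer fault outage repair</p></body></html>",
    "<div><p>line fault outage restoration schedule</p></div>",
    "<html><body><h1>Billing</h1><p>electricity tariff invoice payment</p></body></html>",
    "<div><p>monthly invoice tariff payment reminder</p></div>",
]
train_labels = ["maintenance", "maintenance", "billing", "billing"]

model = TfidfNaiveBayes().fit(train_pages, train_labels)
prediction = model.predict("<p>fault repair schedule for the substation outage</p>")
print(prediction)
```

In a pipeline closer to the one the abstract describes, the pages would come from a Scrapy crawl rather than a hard-coded list, and the token-level TF-IDF features would presumably be replaced or complemented by features produced by the BiGRU network.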
