Open Access
Research on Methods of Parsing and Classification of Internet Super Large-scale Texts
Author(s) -
Miaojing Song,
Hang Zheng,
Tao Zhang,
Jia Jiang,
Bin Pan
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1757/1/012121
Subject(s) - computer science , xml , parsing , naive bayes classifier , the internet , information retrieval , download , simple api for xml , world wide web , support vector machine , xml signature , artificial intelligence , efficient xml interchange
Web crawlers are an important part of modern search engines. With the rapid growth of data in recent years, mankind has entered the “big data” era. For example, Wikipedia, which gathers knowledge from all over the world, records the news of each day and provides users with a rich text search corpus [1]; its daily data updates can exceed 50 GB. This project focuses on solving the problems of data acquisition and data analysis: it downloads the latest Wikipedia dumps, parses the resulting XML files, and then applies the SVM and Naive Bayes algorithms to classify the articles, training models so that Wikipedia files can be downloaded and their XML parsed efficiently.
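The XML-parsing step described above can be sketched with an incremental (SAX-style) parser, which is the usual way to handle multi-gigabyte Wikipedia dumps without loading them into memory. This is a minimal sketch, not the authors' implementation: the `iter_pages` helper and the simplified dump structure (a `page` element containing `title` and `text`) are assumptions based on the MediaWiki export format.

```python
import xml.etree.ElementTree as ET

def iter_pages(source):
    """Stream (title, text) pairs from a MediaWiki-style XML export
    incrementally, so memory use stays bounded even for 50+ GB dumps."""
    title, text = None, None
    for event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text
        elif tag == "page":
            yield title, text or ""
            elem.clear()  # release the subtree for already-yielded pages
```

For instance, feeding it a small in-memory sample shows the streaming behavior:

```python
import io

sample = ("<mediawiki>"
          "<page><title>A</title><revision><text>alpha</text></revision></page>"
          "<page><title>B</title><revision><text>beta</text></revision></page>"
          "</mediawiki>")
for title, text in iter_pages(io.StringIO(sample)):
    print(title, text)
```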