Open Access
Efficient Sensitive Information Classification and Topic Tracking Based on Tibetan Web Pages
Author(s) - Guixian Xu, Ziheng Yu, Qi Qi
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2870122
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
The Internet is an important platform for spreading public opinion among Tibetan people, so content analysis of Tibetan Web pages is valuable for public opinion monitoring, and detecting sensitive words helps in understanding the public opinion of this minority. In this paper, we present a novel sensitive information classification algorithm and a topic tracking algorithm for Web page content. First, a text sensitive information classification method is proposed based on a vector space model and cosine similarity. The main idea is that a sensitive word is given a different degree of importance in term weight computation depending on where it occurs in the text. The sensitive word list, which is built manually, is the foundation of classification: Web texts are classified by comparing them against this sensitive thesaurus. After each text has been classified, a new topic tracking algorithm is introduced that monitors sensitive words over a period of time. The first step is to compute the weights of sensitive words within a fixed period of time and select the top 10 sensitive words. The second step is to select the top 3 of these 10 sensitive words to track. Experiments show that the sensitive information classification is very effective and that the topic tracking results meet expectations.
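
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the position weights, the all-ones reference vector, the similarity threshold, and every function name below are assumptions chosen for illustration, and the tokens are ASCII placeholders standing in for segmented Tibetan words.

```python
import math
from collections import Counter

# Assumed position weights; the abstract only states that location matters.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_vector(doc, sensitive_words):
    """Build a position-weighted term vector over the sensitive word list.

    doc maps a position name ("title", "body") to a list of tokens.
    """
    vec = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            if token in sensitive_words:
                vec[token] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_sensitive(doc, sensitive_words, threshold=0.3):
    """Classify a text by its similarity to the sensitive word list.

    The reference vector (weight 1.0 per listed word) and the threshold
    are assumptions, not values taken from the paper.
    """
    reference = {word: 1.0 for word in sensitive_words}
    return cosine(weighted_vector(doc, sensitive_words), reference) >= threshold


def top_sensitive_words(docs, sensitive_words, k=10):
    """Step 1: rank sensitive words by total weight over one time period."""
    totals = Counter()
    for doc in docs:
        totals.update(weighted_vector(doc, sensitive_words))
    return [word for word, _ in totals.most_common(k)]


def words_to_track(docs, sensitive_words, shortlist=10, tracked=3):
    """Step 2: keep the highest-ranked words from the shortlist."""
    return top_sensitive_words(docs, sensitive_words, shortlist)[:tracked]
```

Here `docs` would hold the texts collected in one fixed period, so calling `words_to_track` once per period gives the words to follow over time. Ranking the shortlist by the same summed weight is only one possible reading of the second step; the abstract does not say how the top 3 are chosen from the top 10.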
