Prospecting Information Extraction by Text Mining Based on Convolutional Neural Networks–A Case Study of the Lala Copper Deposit, China

Open Access

Author(s) - Li Shi, Chen Jianping, Xiang Jie

Publication year - 2018

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2018.2870203

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

With geological big data becoming a focus of geoscience research, the vast amount of textual geoscience data presents both opportunities and challenges for data analysis and data mining. Traditional manual reading does not seem able to meet the demands of the big data age for information extraction and knowledge acquisition. In this paper, a workflow is proposed to extract prospecting information by text mining based on convolutional neural networks (CNNs). The aim is to classify the text data and extract the prospecting information automatically. The procedure involves three parts: 1) text data acquisition; 2) text classification based on CNN; and 3) statistics and visualization. First, a large amount of available text data was acquired using geoscience big data acquisition methodologies. After text preprocessing, the CNN was used to classify the geoscience text data into four categories (geology, geophysics, geochemistry, and remote sensing), with each category consisting of three levels of text scale (word, sentence, and paragraph). Second, word frequency statistics, co-occurrence matrix statistics, and term frequency-inverse document frequency (TF-IDF) statistics were computed for words, sentences, and paragraphs, respectively, in order to obtain the key nodes and links derived from the content words. Finally, the deep semantic information mined from the relevant geoscience texts was visualized with word clouds, knowledge graphs (e.g., chord and bigram graphs), and TF-IDF statistical graphs. The Lala copper deposit in Sichuan province was taken as a test case, for which the prospecting information was extracted successfully by the developed text mining methodologies. This paper provides a strong basis for research into establishing mineral deposit prospecting models based on logical knowledge trees. In addition, it shows the great potential of this method for intelligent information extraction within geoscience big data.
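The three statistics named in the abstract (word frequency, co-occurrence counts, and TF-IDF) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy sentences, the stopword set, and the function names `content_words`, `word_frequency`, `cooccurrence`, and `tf_idf` are all hypothetical, and the TF-IDF variant shown (relative term frequency times an unsmoothed natural-log IDF) is only one common formulation. For brevity the same toy units are used for all three statistics, whereas the abstract applies them at the word, sentence, and paragraph levels respectively.

```python
import math
from collections import Counter
from itertools import combinations

# Toy, already-tokenized text units (invented for illustration only).
sentences = [
    ["copper", "ore", "hosted", "in", "schist"],
    ["copper", "anomaly", "in", "soil", "geochemistry"],
    ["magnetic", "anomaly", "near", "copper", "ore"],
]
STOPWORDS = {"in", "near"}  # assumed stopword list


def content_words(tokens):
    """Drop stopwords so only content words remain."""
    return [t for t in tokens if t not in STOPWORDS]


def word_frequency(docs):
    """Count how often each content word occurs across all units."""
    return Counter(w for d in docs for w in content_words(d))


def cooccurrence(docs):
    """Count unordered pairs of distinct content words sharing a unit."""
    pairs = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(content_words(d))), 2):
            pairs[(a, b)] += 1
    return pairs


def tf_idf(docs):
    """Per-unit TF-IDF: (count / unit length) * ln(N / document frequency)."""
    docs = [content_words(d) for d in docs]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(
            {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return scores


if __name__ == "__main__":
    print(word_frequency(sentences).most_common(3))
    print(cooccurrence(sentences).most_common(3))
    print(tf_idf(sentences)[0])
```

In a sketch like this, the frequency counts would feed a word cloud, the pair counts would form the edges of a chord or bigram graph, and the TF-IDF scores would rank terms within each unit. A word that appears in every unit (here "copper") scores zero under the unsmoothed IDF, which is why smoothed variants are often preferred in practice.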
