Prospecting Information Extraction by Text Mining Based on Convolutional Neural Networks–A Case Study of the Lala Copper Deposit, China
Author(s) -
Li Shi,
Chen Jianping,
Xiang Jie
Publication year - 2018
Publication title -
ieee access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2870203
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
With geological big data becoming a focus of geoscience research, the vast amount of textual geoscience data provides both opportunities and challenges for data analysis and data mining. In fact, it does not seem possible to meet the demands of the big data age through the traditional manual reading for information extraction and gaining knowledge. In this paper, a workflow is proposed to extract prospecting information by text mining based on convolutional neural networks (CNNs). The aim is to classify the text data and extract the prospecting information automatically. The procedure involves three parts: 1) text data acquisition; 2) text classification based on CNN; and 3) statistics and visualization. First, the large amount of available text data was acquired based on geoscience big data acquisition methodologies. After text preprocessing, the CNN was used to classify the geoscience text data into four categories (geology, geophysics, geochemistry, and remote sensing), with each category consisting of three levels of text scales (word, sentence, and paragraph). Second, the word frequency statistics, co-occurrence matrix statistics, and term frequency-inverse document frequency (TF-IDF) statistics were for words, sentences, and paragraphs, respectively, which aimed to obtain the key nodes and links derived from the content-words. Finally, the deep semantic information of the big data mining of relevant geoscience texts was visualized by word clouds, knowledge graphs (e.g., the chord and bigram graphs), and TF-IDF statistical graphs. The Lala copper deposit in Sichuan province was taken as a test case, for which the prospecting information was extracted successfully by the developed text mining methodologies. This paper provides a strong basis for research into establishing mineral deposits prospecting models based on logical knowledge trees. In addition, it shows the great potential of this method for intelligent information extraction within geoscience big data.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom