Identifying long tail term from large‐scale candidate pairs for big data‐oriented patent analysis
Author(s) - Qu Peng, Zhang Junsheng, Yao Changqing, Zeng Wen
Publication year - 2016
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3792
Subject(s) - term (time) , computer science , string (physics) , information retrieval , data mining , set (abstract data type) , scale (ratio) , rank (graph theory) , perspective (graphical) , tf–idf , inverse , artificial intelligence , mathematics , geography , combinatorics , physics , geometry , cartography , quantum mechanics , mathematical physics , programming language
Summary - Patents are a very important and valuable type of scientific and technical big data. This paper presents how to mine massive patent texts to obtain valuable information and knowledge from large‐scale candidates extracted from these patents. First, we propose a patent term extraction method that uses co‐occurrence in the abstract and first‐claim sections of patent records. It has three steps: (1) we extract candidate strings according to our definition of a term; (2) we verify whether a candidate string is a qualified term, based on an assumption about the co‐occurrence of terms in the abstract and the first claim; and (3) we use term frequency–inverse document frequency (TF‐IDF) or mutual information to rank and select candidate terms. Second, we propose a new method to obtain valuable long tail terms from patents: (1) we build long tail term–common term pairs as the candidate set; (2) we evaluate the value of each candidate pair; and (3) we demonstrate the method with an example from our results. This study provides a new perspective on extracting terms from the free text of patent records and proposes a new method to obtain valuable long tail terms to aid information analysis of massive patent texts. Copyright © 2016 John Wiley & Sons, Ltd.
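The three-step term extraction described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give their term definition, their exact co-occurrence rule, or how TF-IDF is aggregated, so the sketch fills these in with labeled assumptions. Candidates are taken to be word n-grams of 1 to 3 words that neither start nor end with a word from a hypothetical `STOPWORDS` list. A candidate qualifies only if it occurs in both the abstract and the first claim of the same patent. A term's score is its summed frequency over qualifying patents times log(N / df). The function names `candidate_strings`, `qualified_terms` and `rank_terms` are illustrative. The mutual-information alternative and the long tail term–common term pair step are not shown.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stopword list: the paper's actual term definition is not given
# in the abstract, so this set is an illustrative assumption.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for",
             "with", "by", "is", "are", "said", "wherein", "comprising", "comprises"}


def tokenize(text):
    """Lowercase word tokens made of ASCII letters, digits and hyphens."""
    return re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())


def candidate_strings(text, max_len=3):
    """Step 1 (assumed definition): count word n-grams of length 1..max_len
    that neither start nor end with a stopword."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def qualified_terms(abstract, first_claim, max_len=3):
    """Step 2 (assumed rule): keep a candidate only if it occurs in BOTH the
    abstract and the first claim; its tf is the combined count."""
    in_abstract = candidate_strings(abstract, max_len)
    in_claim = candidate_strings(first_claim, max_len)
    shared = in_abstract.keys() & in_claim.keys()
    return Counter({t: in_abstract[t] + in_claim[t] for t in shared})


def rank_terms(patents, max_len=3, top_k=10):
    """Step 3: rank qualified terms by corpus-level TF-IDF, scored here as
    (summed tf over patents) * log(N / df). `patents` is a list of
    (abstract, first_claim) pairs."""
    per_patent = [qualified_terms(a, c, max_len) for a, c in patents]
    n = len(per_patent)
    df = Counter()        # number of patents in which the term qualified
    tf_total = Counter()  # summed tf over those patents
    for terms in per_patent:
        df.update(terms.keys())
        tf_total.update(terms)
    scores = {t: tf_total[t] * math.log(n / df[t]) for t in df}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Toy (abstract, first claim) pairs, invented for illustration only.
    patents = [
        ("A lithium battery with a ceramic separator.",
         "1. A lithium battery comprising a ceramic separator and an anode."),
        ("A solar cell using a perovskite layer.",
         "1. A solar cell comprising a perovskite layer on glass."),
        ("A lithium battery with a polymer electrolyte.",
         "1. A lithium battery comprising a polymer electrolyte."),
    ]
    for term, score in rank_terms(patents, top_k=20):
        print(f"{score:.3f}  {term}")
```

In this toy run, "using" and "anode" are never selected because each appears in only one of the two sections. "lithium battery" qualifies in two of the three patents, so its IDF factor log(3/2) is smaller than the log 3 of single-patent terms. As a result, it ranks below terms such as "ceramic separator" despite having the higher total frequency.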
