Identifying long tail term from large‐scale candidate pairs for big data‐oriented patent analysis

Author(s) - Qu Peng, Zhang Junsheng, Yao Changqing, Zeng Wen

Publication year - 2016

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3792

Subject(s) - term (time) , computer science , string (physics) , information retrieval , data mining , set (abstract data type) , scale (ratio) , rank (graph theory) , perspective (graphical) , tf–idf , inverse , artificial intelligence , mathematics , geography , combinatorics , physics , geometry , cartography , quantum mechanics , mathematical physics , programming language

Summary Patents are a very important and valuable type of scientific and technical big data. This paper presents a way to mine massive patent texts and obtain valuable information and knowledge from the large‐scale candidate sets extracted from them. We first propose a patent term extraction method that uses co‐occurrence in the abstract and first‐claim sections of patent records. It has three steps: (1) we extract candidate strings according to our definition of a term; (2) we verify whether a candidate string is a qualified term, under an assumption about the co‐occurrence of terms in the abstract and first claim; and (3) we use term frequency–inverse document frequency (TF‐IDF) or mutual information to rank and select candidate terms. Second, we propose a new method to obtain valuable long tail terms from patents. To this end, (1) we build long tail term–common term pairs as the candidate set; (2) we evaluate the value of each candidate pair; and finally, (3) we demonstrate the method with an example drawn from our results. This study provides a new perspective on extracting terms from the free text of patent records and also proposes a new method for obtaining valuable long tail terms to aid information analysis with massive patent texts. Copyright © 2016 John Wiley & Sons, Ltd.
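The three extraction steps in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plain substring matching, and the textbook TF‐IDF form `tf * log(N / df)` are all assumptions made for illustration, since the summary does not specify the term definition, the matching rule, or the exact weighting.

```python
import math
from collections import Counter


def cooccurring_candidates(abstract, first_claim, candidates):
    """Keep candidates that occur in BOTH the abstract and the first claim.

    Assumption: case-insensitive substring matching stands in for the
    paper's (unspecified) term-matching rule.
    """
    a, c = abstract.lower(), first_claim.lower()
    return [t for t in candidates if t.lower() in a and t.lower() in c]


def rank_by_tfidf(records, candidates):
    """Rank qualified candidates over a list of (abstract, first_claim) pairs.

    Assumption: score = total term frequency * log(N / document frequency),
    where only records in which the term co-occurs are counted.
    """
    n_docs = len(records)
    tf, df = Counter(), Counter()
    for abstract, first_claim in records:
        text = (abstract + " " + first_claim).lower()
        for term in cooccurring_candidates(abstract, first_claim, candidates):
            tf[term] += text.count(term.lower())
            df[term] += 1
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy records, not data from the paper.
    records = [
        ("A touch screen panel with a sensor.",
         "1. A device comprising a touch screen panel."),
        ("A battery pack for a touch screen panel.",
         "1. A battery pack with a touch screen panel."),
        ("A sensor array.", "1. A sensor array."),
    ]
    candidates = ["touch screen panel", "sensor array", "battery pack"]
    for term, score in rank_by_tfidf(records, candidates):
        print(f"{term}: {score:.3f}")
```

In this toy setting, a term that co‐occurs in many records receives a lower inverse‐document‐frequency weight than one concentrated in a few records, which is the property that makes rarer (long tail) terms stand out in a ranking.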
