Patent Similarity Data and Innovation Metrics
Author(s) - Ryan Whalen, Alina Lungeanu, Leslie DeChurch, Noshir Contractor
Publication year - 2020
Publication title - Journal of Empirical Legal Studies
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.529
H-Index - 24
eISSN - 1740-1461
pISSN - 1740-1453
DOI - 10.1111/jels.12261
Subject(s) - similarity (geometry), leverage (statistics), pairwise comparison, intellectual property, computer science, raw data, data mining, code (set theory), space (punctuation), cosine similarity, vector space model, scripting language, data science, information retrieval, econometrics, machine learning, artificial intelligence, mathematics, cluster analysis, image (mathematics), set (abstract data type), programming language, operating system
We introduce and describe the Patent Similarity Dataset, comprising vector space model‐based similarity scores for U.S. utility patents. The dataset provides approximately 640 million pre‐calculated similarity scores, as well as the code and computed vectors required to calculate further pairwise similarities. In addition to the raw data, we introduce measures that leverage patent similarity to provide insight into innovation and intellectual property law issues of interest to both scholars and policymakers. Code is provided in accompanying scripts to assist researchers in obtaining the dataset, joining it with other available patent data, and using it in their research.
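To illustrate what calculating a further pairwise similarity from computed vectors can look like, the following minimal Python sketch scores toy vectors with cosine similarity, which the subject terms list alongside the vector space model. It is not the authors' code: the patent identifiers, the vector values, and the function names `cosine_similarity` and `pairwise_similarities` are hypothetical, and the dataset's actual file format and vector dimensionality are not assumed here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Score every unordered pair of documents exactly once."""
    ids = sorted(vectors)
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    }


# Hypothetical toy vectors standing in for computed patent vectors.
toy_vectors = {
    "patent_A": [1.0, 0.0, 2.0],
    "patent_B": [2.0, 0.0, 4.0],
    "patent_C": [0.0, 3.0, 0.0],
}

scores = pairwise_similarities(toy_vectors)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Scoring each unordered pair once reflects the symmetry of cosine similarity. With millions of patents, the number of pairs grows quadratically, which is one reason a set of pre-calculated scores is convenient.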