Investigating diversity and impact of the popularity metrics for ranking software packages
Author(s) -
Saini Munish,
Verma Rohan,
Singh Antarpuneet,
Chahal Kuljit Kaur
Publication year - 2020
Publication title -
Journal of Software: Evolution and Process
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.371
H-Index - 29
eISSN - 2047-7481
pISSN - 2047-7473
DOI - 10.1002/smr.2265
Subject(s) - popularity, computer science, ranking (information retrieval), software, data mining, data science, information retrieval, programming language
Context: The community-based collaborative approach of the open source software paradigm promotes the reuse of existing software packages. Several package repositories (e.g., npm) exist, and each has its own set of metrics for ranking packages.

Objective: This study explores the diversity of different popularity metrics. It also examines the relationship between popularity metrics and the development activity of packages. A further aim is to create a package popularity index by aggregating a set of noncollinear popularity metrics.

Method: Using 195K packages from different repositories, we investigated the correlation between different popularity metrics. The k-medoids algorithm helped to identify packages with different levels of popularity. The random forest method was used to create the package popularity index. Lastly, we used the scikit-learn implementation to determine feature importance in the model.

Results: Popularity metrics of the GitHub platform are very strongly correlated (R ≥ 0.85) for highly popular packages. Popular packages have high development activity. However, the number of downloads of a package is not associated with its development activity. Not all metrics are important for determining the popularity of a software package.

Conclusion: This study provides practical guidelines for understanding which metrics are important in determining the popularity of software packages. Researchers should focus on noncollinear metrics and avoid similar metrics when aggregating them to build models.
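The Method and Conclusion paragraphs describe two reusable steps. The first discards collinear popularity metrics before aggregation. The second groups packages into popularity levels with k-medoids. The Python sketch below illustrates both steps under stated assumptions. It is not the authors' pipeline.

Its assumptions are:

- Pearson's r is the correlation measure.
- 0.85, the R value reported above, serves only as an illustrative collinearity cut-off.
- k-medoids uses the simple alternating (Voronoi-iteration) variant rather than PAM.
- Medoids are initialised with the first k points.

The helper names (`pearson`, `noncollinear`, `assign`, `k_medoids`) and the toy metric values are hypothetical. The random-forest popularity index and the scikit-learn feature-importance step are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def noncollinear(metrics, threshold=0.85):
    """Greedily keep each metric whose |r| with every already-kept metric is below threshold.

    The 0.85 default is an illustrative assumption, not the paper's selection rule.
    """
    kept = []
    for name, values in metrics.items():
        if all(abs(pearson(values, metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def assign(points, medoids):
    """Label each point with the index of its nearest medoid (Euclidean distance)."""
    return [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]


def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids; medoids are indices into points."""
    medoids = list(range(k))  # assumption: initialise with the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i, lab in enumerate(labels) if lab == m]
            # pick the member with the smallest total distance to the rest of its cluster;
            # keep the old medoid if its cluster happens to be empty
            best = min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            ) if members else m
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: no medoid moved
        medoids = new_medoids
    return medoids, assign(points, medoids)


if __name__ == "__main__":
    # hypothetical toy metrics for five packages
    metrics = {
        "stars": [10, 200, 3000, 50, 800],
        "forks": [3, 41, 598, 11, 162],
        "downloads": [5000, 100, 900, 70000, 300],
    }
    print(noncollinear(metrics))  # → ['stars', 'downloads']

    # hypothetical 2-D popularity features for six packages
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, labels = k_medoids(points, k=2)
    print(medoids, labels)  # → [0, 3] [0, 0, 0, 3, 3, 3]
```

With these toy values, `forks` tracks `stars` almost linearly and is dropped, while `downloads` is kept. The six points split into two clusters with medoids at indices 0 and 3. A greedy filter like this keeps whichever correlated metric appears first, so the order of `metrics` matters. A real analysis might instead choose one representative per correlated group on domain grounds.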