The perils and pitfalls of mining SourceForge
Author(s) - James Howison
Publication year - 2004
Publication title - SURFACE: the Syracuse University Research Facility and Collaborative Environment (Syracuse University)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1049/ic:20040467
Subject(s) - computer science , software , data science , parsing , domain (mathematical analysis) , range (aeronautics) , software engineering , data mining , world wide web , artificial intelligence , engineering , programming language , mathematics , mathematical analysis , aerospace engineering
SourceForge provides abundant, accessible data from Open Source Software development projects, making it an attractive data source for software engineering research. However, it is not without theoretical peril and practical pitfalls. In this paper, we outline practical lessons gained from our spidering, parsing and analysis of SourceForge data. SourceForge can be practically difficult: projects are defunct, data from earlier systems has been dumped in, and crucial data is hosted outside SourceForge, dirtying the retrieved data. These practical issues play directly into analysis: decisions made in screening projects can reduce the range of variables, skewing data and biasing correlations. SourceForge is also theoretically perilous: it provides easily accessible data items for each project, tempting researchers to fit their theories to these limited data. Worse, few of these items are plausible dependent variables. Studies are thus likely to test the same hypotheses even if they start from different theoretical bases. To avoid these problems, analyses of SourceForge projects should go beyond project-level variables and carefully consider which variables are used for screening projects and which for testing hypotheses.
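
The claim that screening can reduce the range of a variable and bias correlations can be illustrated with a small simulation. This is only a sketch on synthetic data, not the paper's analysis: the variables `activity` and `outcome`, the screening threshold, and the noise model are all hypothetical assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_projects=5000, threshold=1.0, seed=0):
    """Compare the correlation before and after screening on `activity`.

    Assumption (hypothetical): each synthetic project has a standardized
    `activity` score, and its `outcome` is that score plus independent noise.
    """
    rng = random.Random(seed)
    activity = [rng.gauss(0, 1) for _ in range(n_projects)]
    outcome = [a + rng.gauss(0, 1) for a in activity]

    # Correlation across the full, unscreened population.
    r_all = pearson(activity, outcome)

    # Screening step: keep only projects above the activity threshold,
    # which truncates the range of the screening variable.
    kept = [(a, o) for a, o in zip(activity, outcome) if a > threshold]
    r_screened = pearson([a for a, _ in kept], [o for _, o in kept])

    return r_all, r_screened, len(kept)


if __name__ == "__main__":
    r_all, r_screened, n_kept = simulate()
    print(f"all projects:      r = {r_all:.2f}")
    print(f"screened projects: r = {r_screened:.2f} (n = {n_kept})")
```

Under these assumptions the correlation among screened projects comes out noticeably weaker than in the full population, even though the underlying relationship is identical. This is one reason to keep the variables used for screening separate from those used for hypothesis testing.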
