Automatic mining of functionally equivalent code fragments via random testing
Author(s) -
Lingxiao Jiang,
Zhendong Su
Publication year - 2009
Publication title -
singapore management university institutional knowledge (ink) (singapore management university)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/1572272.1572283
Subject(s) - computer science , equivalence (formal languages) , programming language , semantic equivalence , copying , code (set theory) , theoretical computer science , artificial intelligence , mathematics , semantic computing , set (abstract data type) , discrete mathematics , semantic web , political science , law
Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization. In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size - down to an executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely due to copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel with several days of parallel processing.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom