Abstract
Lambda expressions were introduced in Java 8 to support functional programming and enable behavior parameterization by passing functions as parameters to methods. The majority of software clones (duplicated code) are known to have behavioral differences (i.e., Type-2 and Type-3 clones). However, to the best of our knowledge, no previous work has investigated the utility of Lambda expressions for parameterizing such behavioral differences in clones. In this paper, we propose a technique that examines the applicability of Lambda expressions for the refactoring of clones with behavioral differences. Moreover, we empirically investigate the applicability and characteristics of the Lambda expressions introduced to refactor a large dataset of clones. Our findings show that Lambda expressions enable the refactoring of a significant portion of clones that could not be refactored by any other means.
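As a rough illustration of the idea, the sketch below shows two clones that differ in a single expression and a unified method in which that difference is passed as a Lambda expression. It is not taken from the paper or from the studied projects: the class, the method names, and the choice of ToIntFunction as the parameter type are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative only: every name here is hypothetical, not from the studied projects.
final class LambdaCloneSketch {

    // Clone 1: sums the length of every non-null word.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            if (word != null) {
                total += word.length();
            }
        }
        return total;
    }

    // Clone 2: identical to clone 1 except for the expression computed per word.
    static int totalVowels(List<String> words) {
        int total = 0;
        for (String word : words) {
            if (word != null) {
                total += countVowels(word);
            }
        }
        return total;
    }

    // Unified method: the differing expression becomes a functional parameter.
    static int total(List<String> words, ToIntFunction<String> measure) {
        int total = 0;
        for (String word : words) {
            if (word != null) {
                total += measure.applyAsInt(word);
            }
        }
        return total;
    }

    // Helper used by clone 2 and by the second call in main.
    static int countVowels(String word) {
        int count = 0;
        for (char c : word.toCharArray()) {
            if ("aeiou".indexOf(Character.toLowerCase(c)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("clone", null, "pair");
        // Each original clone is replaced by a call that passes its own behavior.
        System.out.println(total(words, word -> word.length()));
        System.out.println(total(words, LambdaCloneSketch::countVowels));
    }
}
```

In this sketch the two original methods could be removed, or kept as one-line wrappers around total(), so that the duplicated loop exists only once.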
Experiment data
We have provided the resulting files (input and output Excel files, HTML reports, CSV files and real code fragments) for every project and every tool separately in a .7z file. You can click on the corresponding cell to get the file.
Project | CCFinder | Deckard | CloneDR | NiCad |
---|---|---|---|---|
Apache Ant 1.7.0 | Download | Download | Download | Download |
Columba 1.4 | Download | Download | Download | Download |
EMF 2.4.1 | Download | Download | Download | Download |
JMeter 2.3.2 | Download | Download | Download | Download |
JEdit 4.2 | Download | Download | Download | Download |
JFreeChart 1.0.10 | Download | Download | Download | Download |
JRuby 1.4.0 | Download | Download | Download | Download |
Hibernate 3.3.2 | Download | Download | Download | Download |
SQuirreL SQL 3.0.3 | Download | Download | Download | Download |
* For the following projects we removed the classes corresponding to generated code, because generated code is excluded from analysis in most clone-related studies:
Testing Results
To evaluate the correctness of our approach, we ran all the unit tests for the 12602 clone pairs in the JFreeChart project that were covered by unit tests, after applying each refactoring. To measure code coverage and run the unit tests, we used the JaCoCo library. The results can be found in the following table.
Clone Detector | Results |
---|---|
CCFinder | Download |
Deckard | Download |
CloneDR | Download |
NiCad | Download |
Note: The analyzed source files are the same as in the previous table.
Tools
To obtain the latest versions of JDeodorant and the headless plug-in, please clone the following GitHub repositories from the given links:
You can read the instructions to install and run the tool here.
Note: For convenience, we have made available, for every project and every tool, the launch configuration necessary for running the tools. Inside each of the 7z files given in the previous table, there is a file with the .launch extension. You can import this file in Eclipse from File > Import > Run/Debug > Launch Configurations. However, you will need to update the absolute paths in the arguments.
R Scripts
You can download the R scripts we developed for the analysis of the experiment data from here.
The scripts and CSV files can be found inside the Scripts and CSVs folders, respectively. If you use RStudio, you can open the R project by opening the file RProject.Rproj.
The script files are:
- load.R: If you source this file, it will define the function load(), which loads all the necessary tables into the environment. By default, this function looks for a directory named CSVs in the current working directory and recursively loads all the CSV files in that directory. You can provide a different folder containing the CSV files using the argument of this function.
- descriptive-statistics.R: Contains functions for reporting descriptive statistics about clone pairs and lambda expressions. It contains the following functions:
  - getGapSizePlot(): Gives information about, and plots, the size of lambda expressions compared to clone fragments.
  - getLambdaExpressionsInfo(): Gives various statistics about block gaps, expression differences, and lambda expressions. It also plots the number of lambdas per lambdafied clone pair in the dataset.
  - getInterfaceTypeStatistics(): Gives statistics about the types of functional interfaces, along with the corresponding plot. If TRUE (or T) is passed as the argument to this function, it will show the breakdown for Custom Interfaces, relaxing the condition for Thrown Exceptions to be zero.
  - getFrequencyOfReturnTypesBlock() and getFrequencyOfReturnTypesExpressionGaps(): Print the frequency of return types for block gaps and expression differences, respectively.
  - getCloneGroupSizeStats(): Prints statistics about the size of clone groups.
- refactorability.R: Contains functions for reporting refactorability statistics. It contains the following functions:
  - refactorableUsingTemplateMethod(): Prints information about clones refactorable by the Template Method design pattern.
  - getCloneTypeAndRefactorabilityStats(): Prints information about Type-2 and Type-3 clones and their refactorability.
  - getRefactorableUsingLambda(): Prints information about clone pairs refactorable by lambda, and the ones that have compile errors or test failures.
  - getRefactorableUsingLambdaPerProject(): Prints clone pair refactorability with lambda per project.
  - testVSProductionVSLocation(): Prints clone pair refactorability with respect to location and source type (i.e., test vs. production code).
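As background for the Custom Interfaces breakdown reported by getInterfaceTypeStatistics(), the following sketch contrasts a standard functional interface with a custom one. It rests on an assumption about why thrown exceptions matter here: the interfaces in java.util.function declare no checked exceptions, so a lambda body that throws one cannot be typed with them and needs a custom interface. The interface ThrowingFunction and all method names are hypothetical and are not output of JDeodorant.

```java
import java.io.IOException;
import java.util.function.Function;

// Illustrative only: every name here is hypothetical.
final class FunctionalInterfaceSketch {

    // Custom functional interface whose single method may throw a checked exception.
    @FunctionalInterface
    interface ThrowingFunction<T, R, E extends Exception> {
        R apply(T input) throws E;
    }

    // A standard interface is enough when the lambda throws no checked exception.
    static String describe(int value, Function<Integer, String> formatter) {
        return "value=" + formatter.apply(value);
    }

    // A custom interface is needed when the lambda may throw a checked exception.
    static String describeChecked(int value,
            ThrowingFunction<Integer, String, IOException> formatter) throws IOException {
        return "value=" + formatter.apply(value);
    }

    // Behavior that declares a checked exception, so it cannot be passed as a Function.
    static String formatOrFail(Integer value) throws IOException {
        if (value < 0) {
            throw new IOException("negative value: " + value);
        }
        return Integer.toString(value);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe(7, v -> Integer.toString(v)));
        System.out.println(describeChecked(7, FunctionalInterfaceSketch::formatOrFail));
    }
}
```

Passing FunctionalInterfaceSketch::formatOrFail to describe() would not compile, because Function.apply declares no checked exceptions.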
Clone detection tools
In this experiment, we used the results of four popular clone detection tools.
CCFinder
Authors | Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue |
Publication | "CCFinder: A Multi-Linguistic Token-based Code Clone Detection System for Large Scale Source Code," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654-670, July 2002. |
Clone detection approach | Token based |
URL to download | Download |
Version we used | 10.2.7.4 |
To run CCFinder, download and install CCFinderX, a major version of CCFinder, from the given URL. (There is some documentation for installing CCFinder on Ubuntu Linux here; we installed and tested it on Windows.) Note that you will need Python 2.6 installed on your operating system (we used 2.6.5; Python 2.7 and later versions did not work). JRE version 5 or later must also be installed (we tried version 7 Update 51).
CCFinderX is a major version of CCFinder which provides a useful GUI for using CCFinder.
After installation, you can start the tool by running /bin/gemx.bat (on Windows).
From the GUI, select File > Detect Clones. Select your desired programming language, add the folders containing the source files of the project under investigation, and select Next.
In the next window, you can configure CCFinder for clone detection. As mentioned in the paper, we used these options for our analysis:
Minimum clone length | 50 |
Minimum TKS | 12 |
Shaper level | 2 - Soft shaper |
P-match application | Use P-match |
Prescreening application | Unchecked (Does not affect the clone detection process) |
Deckard
Authors | Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu |
Publication | "DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones," The 29th International Conference on Software Engineering, 2007 |
Clone detection approach | Tree based |
URL to download | Download - Alternative URL |
Version we used | 1.3 |
We compiled Deckard on Ubuntu 13.04 x86, using the steps given in the README on Deckard's GitHub page.
The running instructions are also given on the same page. Note that you only need to run Deckard as described in the first part of the Usage section on that page (for clone detection, using the file deckard.sh).
Note: We were not able to run the version of Deckard on GitHub, because it has a problem in the vector generation phase. We used the one given in the alternative URL and it worked.
To configure Deckard, you have to modify the config file in the same folder as the deckard.sh file.
You can follow the comments in the config file to make sure that you configure it correctly. In this research, we used the following values for the options in the config file:
MIN_TOKENS | 50 |
STRIDE | 0 |
SIMILARITY | 0.95 |
CloneDR
Authors | Ira D. Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant’Anna and Lorraine Bier |
Publication | “Clone detection using abstract syntax trees,” in Proceedings of the International Conference on Software Maintenance, 1998, pp. 368–377. |
Clone detection approach | AST based |
URL to download | Download |
Version we used | 2.2.12 |
We ran CloneDR on Windows 8.1. You should contact Semantic Designs to request an academic license for CloneDR, because the evaluation version reports only 10 clone groups. After installation and registration, run the DMS Project Specifier tool, configure it with the following options, select the path of the project you are going to analyze, and run the tool.
Similarity threshold | 0.95 |
Max clone parameters | 65,535 |
Min clone mass | 6 |
Characters per node | 16 |
Starting depth | 2 |
NiCad
Authors | Chanchal Kumar Roy, James R Cordy |
Publication | "NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization," The 16th IEEE International Conference on Program Comprehension, 2008 |
Clone detection approach | Text based |
URL to download | Download |
Version we used | 3.5 |
We installed NiCad on Ubuntu Linux 14.04 LTS x64. Please note that, to run NiCad, you first need to download and install TXL from here.
The instructions in the readme file in NiCad's folder are straightforward to follow for installing and running it. Please note that, because its authors have changed the name of the executable file to "nicad3", in the "Testing Nicad3" and "Using Nicad3" sections of the readme file you should run nicad3 functions java... in the terminal instead of nicad functions java...
To configure NiCad, make a copy of the default.cfg file in the config folder in NiCad's directory and change the desired parameters in it.
Below are the parameters we changed from the default config file for running NiCad in this experiment.
Note that you have to append the name of the config file (without extension) to the end of the command that runs NiCad (see the mentioned readme file for sample commands).
minsize | 5 |
maxsize | 2500 |
threshold | 0.2 |
rename | consistent |