Abstract

Lambda expressions have been introduced in Java 8 to support functional programming and enable behavior parameterization by passing functions as parameters to methods. The majority of software clones (duplicated code) are known to have behavioral differences (i.e., Type-2 and Type-3 clones). However, to the best of our knowledge, there is no previous work to investigate the utility of Lambda expressions for parameterizing such behavioral differences in clones. In this paper, we propose a technique that examines the applicability of Lambda expressions for the refactoring of clones with behavioral differences. Moreover, we empirically investigate the applicability and characteristics of the Lambda expressions introduced to refactor a large dataset of clones. Our findings show that Lambda expressions enable the refactoring of a significant portion of clones that could not be refactored by any other means.
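
As a minimal, hand-written illustration of the idea (not output of our tool, and not taken from the analyzed projects), consider two hypothetical clones that differ only in the condition they test. The differing expression can be passed in as a lambda, so both clones collapse into one method. The class name, the method names, and the choice of IntPredicate as the functional interface are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class LambdaCloneExample {

    // Hypothetical clone 1: collects the even values.
    static List<Integer> selectEven(int[] values) {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            if (v % 2 == 0) {
                result.add(v);
            }
        }
        return result;
    }

    // Hypothetical clone 2: identical except for the condition,
    // which is the behavioral difference between the two fragments.
    static List<Integer> selectPositive(int[] values) {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            if (v > 0) {
                result.add(v);
            }
        }
        return result;
    }

    // Unified method: the differing expression becomes a
    // functional-interface parameter supplied by each caller.
    static List<Integer> select(int[] values, IntPredicate condition) {
        List<Integer> result = new ArrayList<>();
        for (int v : values) {
            if (condition.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {-2, -1, 0, 1, 2, 3};
        // Each call site passes the behavior that used to be hard-coded.
        System.out.println(select(data, v -> v % 2 == 0)); // same result as selectEven(data)
        System.out.println(select(data, v -> v > 0));      // same result as selectPositive(data)
    }
}
```

After such a refactoring, the two original methods could be removed or reduced to one-line delegations to select().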

Experiment data

We have provided the resulting files (input and output Excel files, HTML reports, CSV files and real code fragments) for every project and every tool separately in a .7z file. You can click on the corresponding cell to get the file.

Results
Project CCFinder Deckard CloneDR NiCad
Apache Ant 1.7.0 Download Download Download Download
Columba 1.4 Download Download Download Download
EMF 2.4.1 Download Download Download Download
JMeter 2.3.2 Download Download Download Download
JEdit 4.2 Download Download Download Download
JFreeChart 1.0.10 Download Download Download Download
JRuby 1.4.0 Download Download Download Download
Hibernate 3.3.2 Download Download Download Download
SQuirreL SQL 3.0.3 Download Download Download Download
* For the following projects, we removed the classes corresponding to generated code, because generated code is excluded from the analysis in most clone-related studies:
  • EMF: The entire package org.eclipse.emf.codegen,
  • JEdit: Classes Parser and ParserTokenManager from package bsh,
  • JRuby: Classes DefaultRubyParser, Ruby19YyTables, Ruby19Parser and YyTables from package org.jruby.parser,
  • SQuirreL SQL: Classes Parser and Scanner from package net.sourceforge.squirrel_sql.client.session.parser.kernel.
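
The exclusion list above could be applied with a simple name-based filter such as the following sketch. This is not the code we used; the class GeneratedCodeFilter and the method isExcluded are hypothetical names. The sketch also assumes that "the entire package" includes subpackages, and it merges the per-project lists into a single set for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GeneratedCodeFilter {

    // EMF: the whole package is excluded (assumption: subpackages too).
    private static final String EXCLUDED_PACKAGE = "org.eclipse.emf.codegen";

    // Fully qualified names built from the packages and classes listed above.
    private static final Set<String> EXCLUDED_CLASSES = new HashSet<>(Arrays.asList(
            // JEdit
            "bsh.Parser",
            "bsh.ParserTokenManager",
            // JRuby
            "org.jruby.parser.DefaultRubyParser",
            "org.jruby.parser.Ruby19YyTables",
            "org.jruby.parser.Ruby19Parser",
            "org.jruby.parser.YyTables",
            // SQuirreL SQL
            "net.sourceforge.squirrel_sql.client.session.parser.kernel.Parser",
            "net.sourceforge.squirrel_sql.client.session.parser.kernel.Scanner"));

    // Returns true if the given fully qualified class name is treated as generated code.
    static boolean isExcluded(String qualifiedClassName) {
        return qualifiedClassName.startsWith(EXCLUDED_PACKAGE + ".")
                || EXCLUDED_CLASSES.contains(qualifiedClassName);
    }

    public static void main(String[] args) {
        System.out.println(isExcluded("bsh.Parser"));
        System.out.println(isExcluded("org.jfree.chart.JFreeChart"));
    }
}
```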

Testing Results

To evaluate the correctness of our approach, we ran all the unit tests for the 12602 clone pairs in the JFreeChart project that were covered by unit tests, after applying each refactoring. To obtain code coverage and run the unit tests, we used the JaCoCo library. The results can be found in the following table.

Testing results for JFreeChart
Clone Detector Results
CCFinder Download
Deckard Download
CloneDR Download
NiCad Download
Note: The analyzed source files are the same as in the previous table.

Tools

To obtain the latest versions of JDeodorant and the headless plug-in, please clone the following GitHub repositories, from the given links:

You can read the instructions to install and run the tool here.

Note: For convenience, we have made available, for every project and every tool, the launch configuration necessary for running the tools. Inside each of the given 7z files in the previous table, there is a file with .launch extension. You can import this file in Eclipse from File > Import > Run/Debug > Launch Configurations. However, you will need to update the absolute paths in the arguments.

R Scripts

You can download the R scripts we developed for the analysis of the experiment data from here. The scripts and CSV files can be found inside the Scripts and CSVs folders, respectively. If you use R Studio, you can open the R Project by opening the file RProject.Rproj. The script files are:

  • load.R If you source this file, it will define the function load(), which loads all the necessary tables into the environment. By default, this function looks for a directory named CSVs in the current working directory and recursively loads all the CSV files in that directory. To use a different folder containing the CSV files, pass it as the argument to this function.
  • descriptive-statistics.R Contains functions for reporting descriptive statistics about clone pairs and lambda expressions.
    • getGapSizePlot() Gives info and plots the size of lambda expressions compared to clone fragments.
    • getLambdaExpressionsInfo() Gives various statistics about block gaps, expression differences, and lambda expressions. Also plots the number of lambdas per lambdafied clone pair in the dataset.
    • getInterfaceTypeStatistics() Gives statistics about the type of functional interfaces, and the corresponding plot for that. If TRUE (or T) is passed as argument to this function, it will show the breakdown for Custom Interfaces, relaxing the condition for Thrown Exceptions to be zero.
    • getFrequencyOfReturnTypesBlock() and getFrequencyOfReturnTypesExpressionGaps() Print the frequency of return types for block gaps/expression diffs, respectively.
    • getCloneGroupSizeStats() Prints stats about the size of clone groups.
  • refactorability.R: Contains functions for reporting refactorability statistics. It contains the following functions:
    • refactorableUsingTemplateMethod() Prints information about clones refactorable by Template Method Design Pattern.
    • getCloneTypeAndRefactorabilityStats() Prints information about Type-2 and Type-3 clones and their refactorability.
    • getRefactorableUsingLambda() Prints information about clone pairs refactorable by lambda, and the ones that have compile errors/test failures.
    • getRefactorableUsingLambdaPerProject() Prints clone pair refactorability with lambda per project.
    • testVSProductionVSLocation() Prints clone pair refactorability w.r.t. location and source type (i.e., test vs. production code).

Clone detection tools

In this experiment, we used the results of four popular clone detection tools.

CCFinder

Authors Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue
Publication "CCFinder: A Multi-Linguistic Token-based Code Clone Detection System for Large Scale Source Code," IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654-670, (2002-7).
Clone detection approach Token based
URL to download Download
Version we used 10.2.7.4

All you need to run CCFinder is to download and install CCFinderX, a major version of CCFinder that provides a useful GUI, from the given URL. (There is some documentation for installing CCFinder on Ubuntu Linux here, while we installed and tested it on Windows.) Note that you will need Python 2.6 installed on your operating system (we used 2.6.5; Python 2.7 and later versions didn't work). JRE version 5 or later must also be installed (we tried version 7 Update 51).

After installation, you can run the tool by running /bin/gemx.bat (on Windows). From the GUI, select File > Detect Clones. Select your desired programming language, add the folders of the source files of the project under investigation, and select Next. In the next window, you can configure CCFinder for clone detection. As mentioned in the paper, we used these options for our analysis:

Minimum clone length 50
Minimum TKS 12
Shaper level 2 - Soft shaper
P-match application Use P-match
Prescreening application Unchecked (Does not affect the clone detection process)

Deckard

Authors Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu
Publication "DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones," The 29th International Conference on Software Engineering, 2007
Clone detection approach Tree based
URL to download Download - Alternative URL
Version we used 1.3

We compiled Deckard on Ubuntu 13.04 x86, using the steps given in the README on Deckard's GitHub page. The running instructions are also given on the same page. Note that you only need to run Deckard as described in the first part of the Usage section on that page (for clone detection, using the file deckard.sh).

Note: we were not able to run the version of Deckard on GitHub, because it has a problem in the vector generation phase. We used the one given in the alternative URL and it worked.

To configure Deckard, you have to modify the config file in the same folder as the deckard.sh file. You can follow the comments in the config file to make sure that you configure it correctly. In this research, we used the following values for the options in the config file:

MIN_TOKENS 50
STRIDE 0
SIMILARITY 0.95

CloneDR

Authors Ira D. Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant’Anna and Lorraine Bier
Publication “Clone detection using abstract syntax trees,” in Proceedings of the International Conference on Software Maintenance, 1998, pp. 368–377.
Clone detection approach AST based
URL to download Download
Version we used 2.2.12

We ran CloneDR on Windows 8.1. You should contact Semantic Designs to request an academic license for CloneDR, because the evaluation version reports only 10 clone groups. After installation and registration, run the DMS Project Specifier tool, configure it with the following options, select the path of the project you are going to analyze, and run the tool.

Similarity threshold 0.95
Max clone parameters 65,535
Min clone mass 6
Characters per node 16
Starting depth 2

Nicad

Authors Chanchal Kumar Roy, James R Cordy
Publication "NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization," The 16th IEEE International Conference on Program Comprehension, 2008
Clone detection approach Text based
URL to download Download
Version we used 3.5

We installed Nicad on Ubuntu Linux 14.04 LTS X64. Please note that, to run Nicad, you'll first need to download and install TXL from here.

The instructions in the readme file in Nicad's folder are straightforward for installing and running it. Please note that, because its authors have changed the name of the runnable file to "nicad3", in the "Testing Nicad3" and "Using Nicad3" sections of the readme file, instead of running nicad functions java... in the terminal, you should run nicad3 functions java....

To configure Nicad, you have to make a copy of the default.cfg file in the config folder in Nicad's directory, and change the desired parameters in it. Below are the parameters we changed from the default config file, which were used for running Nicad in this experiment. Note that you have to append the name of the config file (without extension) at the end of the command that runs Nicad (see the mentioned readme file for sample commands).

minsize 5
maxsize 2500
threshold 0.2
rename consistent