Source Code Diff Revolution (CSER 2024 Keynote)

Source Code Diff Revolution (Invited talk @ JetBrains Open Reading Club)

Code Evolution Tracking in Commit History

Refactoring Mining

Refactoring Software Clones

The unification and refactoring of software clones is a rather challenging task, especially in the case of Type-2 clones (i.e., structurally/syntactically identical fragments except for variations in identifiers, literals, types, layout and comments) and Type-3 clones (i.e., copied fragments with statements changed, added or removed in addition to variations in identifiers, literals, types, layout and comments).
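To make the clone categories above concrete, here is a minimal sketch of a token-normalization check. It is an illustration only, not the technique used in this project: it works on Python fragments using the standard `tokenize` module, and the names `normalize` and `is_type2_candidate` are hypothetical. Identifiers and literals are replaced with placeholders, and comments and layout are dropped, so two fragments that differ only in those respects compare equal.

```python
import io
import keyword
import tokenize

# Tokens that carry block structure; kept as generic markers so that
# indentation width (layout) does not matter.
STRUCTURAL = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
# Tokens that are pure layout or comments; dropped entirely.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def normalize(fragment: str) -> list[str]:
    """Map identifiers to 'ID' and literals to 'LIT'; drop comments and layout."""
    normalized = []
    for tok in tokenize.generate_tokens(io.StringIO(fragment).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in STRUCTURAL:
            normalized.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            normalized.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            normalized.append("LIT")
        else:
            normalized.append(tok.string)  # keywords and operators stay as-is
    return normalized


def is_type2_candidate(a: str, b: str) -> bool:
    """True if the fragments are identical after normalization (Type-1 or Type-2)."""
    return normalize(a) == normalize(b)


ORIGINAL = """\
def total(prices):
    result = 0
    for p in prices:
        result += p * 2
    return result
"""

# Same structure; identifiers, literals, indentation and comments differ.
RENAMED = """\
def area_sum(widths):  # renamed copy
  acc = 10
  for w in widths:
      acc += w * 3
  return acc
"""

# A statement was added, so this is a Type-3 variation of ORIGINAL.
EXTENDED = """\
def total(prices):
    result = 0
    for p in prices:
        if p > 0:
            result += p * 2
    return result
"""

print(is_type2_candidate(ORIGINAL, RENAMED))
print(is_type2_candidate(ORIGINAL, EXTENDED))
```

Because every identifier maps to the same placeholder, this sketch does not check that renamings are consistent, and it cannot align Type-3 clones at all. Those are exactly the cases where a statement mapping is needed.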

Our vision is to advance the state-of-the-art in the refactoring of software clones by:

- Visualizing the clone differences in a sophisticated and comprehensive manner
- Optimizing the mapping of clone statements, so that the number of differences is minimized
- Suggesting changes required to make clones refactorable
- Detecting sub-clones within larger clones that can be directly refactored
- Applying the most efficient refactoring strategy
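
The idea of choosing a statement mapping that minimizes differences can be sketched with an exhaustive search. This is a toy under stated assumptions, not the project's algorithm: statements are pre-tokenized strings, both fragments have the same number of statements, statement order is treated as free, and the helper names `token_differences` and `best_mapping` are hypothetical. Exhaustive search is only viable for very small fragments.

```python
from itertools import permutations


def token_differences(s1: str, s2: str) -> int:
    """Count differing token positions, plus any difference in length."""
    t1, t2 = s1.split(), s2.split()
    return sum(a != b for a, b in zip(t1, t2)) + abs(len(t1) - len(t2))


def best_mapping(clone_a: list[str], clone_b: list[str]):
    """Return (cost, mapping) for the one-to-one mapping with the fewest differences."""
    if len(clone_a) != len(clone_b):
        raise ValueError("this sketch only handles fragments of equal length")
    best_cost, best_perm = None, None
    for perm in permutations(range(len(clone_b))):
        cost = sum(token_differences(clone_a[i], clone_b[j])
                   for i, j in enumerate(perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(enumerate(best_perm))


clone_a = ["total = 0", "count = len ( items )", "log ( total )"]
clone_b = ["n = len ( values )", "sum = 0", "log ( sum )"]

cost, mapping = best_mapping(clone_a, clone_b)
print(cost, mapping)  # 4 [(0, 1), (1, 0), (2, 2)]
```

Mapping the statements in textual order would report 11 token differences here, while the best mapping reports 4, all of them identifier renamings. A realistic approach would also have to respect control and data dependencies between statements, which this sketch ignores.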

Empirical Study on Refactoring Activity

In this empirical study, we examined the refactoring activity in the history of three open-source projects, namely JUnit, Apache HTTPCore, and Apache HTTPClient.

In particular, we investigated five research questions:

RQ1: Do software developers perform different types of refactoring operations on test code and production code?
RQ2: Which developers are responsible for refactorings?
RQ3: Is there more refactoring activity before major project releases than after?
RQ4: Is refactoring activity on production code preceded by the addition or modification of test code?
RQ5: What is the purpose of the applied refactorings?

Code Smell Visualization

Research and practice have shown that the cost of performing maintenance activities highly depends on the underlying design quality of the software systems. In the past, several techniques have been developed for the detection of design problems as a means to support the improvement of design quality in software systems. However, most of these techniques lack the ability to communicate the detected problems to the developers in a comprehensible and effective way. This is one of the reasons justifying the slow and hesitant adoption of preventive maintenance (i.e., maintenance activities aiming to improve future maintainability) as a practice in the software industry.

Code Smell Explorer provides a system-level view of the design problems present in a software system. The system is visualized in a Package Map allowing the developers to find modules that require immediate refactoring, and also find hidden dependencies between code smells.

Code Smell Analyzer provides a class-level visualization of individual design problems. Each code smell instance is visualized in the form of an enriched UML Class diagram showing the dependencies between the class members involved in the code smell. This visualization allows developers to have a better understanding of the causes and severity of each code smell instance.

Past Projects

On this page you can find past projects in which I was involved as the main researcher during my Master's and Ph.D. studies, as well as during my Postdoctoral research at the University of Alberta.

JDeodorant employs a variety of novel methods and techniques to identify code smells and suggest the appropriate refactorings that resolve them. The tool identifies four kinds of bad smells, namely Feature Envy, State Checking, Long Method and God Class.

Feature Envy problems are resolved by applying appropriate Move Method refactorings.
State Checking problems are resolved by applying Replace Conditional with Polymorphism and Replace Type Code with State/Strategy refactorings.
Long Method problems are resolved by applying appropriate Extract Method refactorings.
God Class problems are resolved by applying appropriate Extract Class refactorings.
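
As an illustration of the first smell in the list, the sketch below flags a Feature Envy candidate with a deliberately simple heuristic: a method that uses more members of another class than of its own class is a candidate for Move Method. This is an assumption made for illustration, not JDeodorant's actual detection algorithm, and the function name and input format are hypothetical.

```python
from collections import Counter
from typing import Optional


def suggest_move_method(owner: str,
                        accessed_members: list[tuple[str, str]]) -> Optional[str]:
    """Return the class the method seems to 'envy', or None.

    accessed_members lists (class_name, member_name) pairs used in the method body.
    """
    per_class = Counter(cls for cls, _ in accessed_members)
    if not per_class:
        return None
    target, count = per_class.most_common(1)[0]
    # Only suggest a move when a foreign class is used strictly more than the owner.
    if target != owner and count > per_class.get(owner, 0):
        return target
    return None


# Hypothetical accesses made by a method declared in class Invoice.
accesses = [
    ("Customer", "name"),
    ("Customer", "address"),
    ("Customer", "discount"),
    ("Invoice", "total"),
]

print(suggest_move_method("Invoice", accesses))   # Customer
print(suggest_move_method("Customer", accesses))  # None
```

A production tool would also need to check preconditions, for example that the move preserves behavior and that the target class is reachable from the method.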

Differencing Object-Oriented Models

The comparison of source code versions can be useful for recovering the development process of the system, recognizing applied refactorings, and inferring high-level patterns in the history of the system. In the past, several domain-specific approaches have been applied to the problem of recovering the design evolution of object-oriented systems. In this work, we applied VTracker, a generic domain-independent tree differencing algorithm, to the problem of extracting the changes between two versions of an object-oriented software system. We evaluated VTracker by executing it over successive versions of JFreeChart, and also compared its accuracy in terms of precision and recall against a state-of-the-art domain-specific differencing algorithm, namely UMLDiff.
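
For readers unfamiliar with the evaluation measures mentioned above, precision and recall of a differencing tool can be computed as follows. The change labels are made-up examples, not data from the JFreeChart evaluation.

```python
def precision_recall(reported: set, expected: set) -> tuple[float, float]:
    """Precision: share of reported changes that are correct.
    Recall: share of expected changes that were reported."""
    true_positives = len(reported & expected)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical changes reported by a differencing tool between two versions.
reported = {"rename A.foo", "add B.baz", "remove C.qux", "move D.run"}
# Hypothetical reference set of changes that actually happened.
expected = {"rename A.foo", "add B.baz", "remove C.qux", "add E.init", "move F.stop"}

print(precision_recall(reported, expected))  # (0.75, 0.6)
```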

WebDiff

WebDiff is a web-based and generic differencing service, designed to support the comparison of various types of software artifacts. To achieve the required level of independence from the specific characteristics of the examined software artifacts, WebDiff employs a generic domain-independent tree differencing algorithm (VTracker) that is able to handle any kind of XML document representing a partially-ordered labeled tree.
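
To show what treating an XML document as a labeled tree can look like, the sketch below converts each element into a labeled path and compares the two path sets. It is a naive stand-in for illustration and is not VTracker: it ignores sibling order entirely, cannot detect moves or renames, and the element and attribute names in the sample documents are invented.

```python
import xml.etree.ElementTree as ET


def labeled_paths(xml_text: str) -> set[str]:
    """Collect one path per element; a label is the tag plus its sorted attributes."""
    paths = set()

    def walk(node, prefix):
        attrs = "".join(f"[{k}={v}]" for k, v in sorted(node.attrib.items()))
        path = f"{prefix}/{node.tag}{attrs}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def naive_diff(old_xml: str, new_xml: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) element paths between two XML documents."""
    old, new = labeled_paths(old_xml), labeled_paths(new_xml)
    return sorted(new - old), sorted(old - new)


OLD = '<class name="Chart"><method name="draw"/><method name="resize"/></class>'
NEW = '<class name="Chart"><method name="draw"/><method name="render"/></class>'

added, removed = naive_diff(OLD, NEW)
print(added)    # ['/class[name=Chart]/method[name=render]']
print(removed)  # ['/class[name=Chart]/method[name=resize]']
```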

Design Pattern Detection

The knowledge of the design pattern instances implemented in a software system provides a better understanding of its overall architecture and the design decisions made during its evolution, facilitates its extension to new requirements through pattern extension mechanisms and improves the communication among its developers through a common vocabulary of design concepts. However, finding the implemented pattern instances in a software system is not a trivial task, since they are usually not documented, they do not follow the standard naming conventions, their implementation may deviate from their standard description and their manual detection is prohibitive for large systems. To overcome all these difficulties, we proposed a technique for the structural detection of design pattern instances that is based on a graph similarity algorithm. The proposed technique is scalable to large systems, robust to pattern deviations, highly accurate and easily extensible to new pattern definitions.
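
The text above does not name the graph similarity algorithm, so the following sketch assumes one well-known option: iterative vertex-similarity scoring in the style of Blondel et al., where two vertices are similar if their neighbors are similar. The pattern graph, the system graph, and all class and role names are hypothetical. Entry `S[i][j]` scores how well system vertex `i` matches pattern role `j`.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def similarity(A, B, iterations=10):
    """Iterate S <- B S A^T + B^T S A, normalized by the Frobenius norm.

    A is the adjacency matrix of the pattern graph and B that of the system graph.
    An even number of iterations is used, as is usual for this scheme.
    """
    S = [[1.0] * len(A) for _ in range(len(B))]
    At, Bt = transpose(A), transpose(B)
    for _ in range(iterations):
        forward = matmul(matmul(B, S), At)   # similarity propagated via successors
        backward = matmul(matmul(Bt, S), A)  # similarity propagated via predecessors
        S = [[f + b for f, b in zip(fr, br)] for fr, br in zip(forward, backward)]
        norm = math.sqrt(sum(v * v for row in S for v in row))
        if norm == 0:
            return S
        S = [[v / norm for v in row] for row in S]
    return S


# Pattern graph: role 0 ("subclass") inherits from role 1 ("superclass").
PATTERN = [[0, 1],
           [0, 0]]
# System graph: Circle (0) and Square (2) both inherit from Shape (1).
SYSTEM = [[0, 1, 0],
          [0, 0, 0],
          [0, 1, 0]]
NAMES = ["Circle", "Shape", "Square"]
ROLES = ["subclass", "superclass"]

scores = similarity(PATTERN, SYSTEM)
best_roles = [ROLES[max(range(len(ROLES)), key=row.__getitem__)] for row in scores]
print(dict(zip(NAMES, best_roles)))
```

In this small example, Circle and Square score highest for the subclass role and Shape for the superclass role. A realistic detector would combine several relationship types (inheritance, association, and so on) rather than a single edge kind.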

Change Proneness Estimation

Identifying classes that are highly likely to change in the near future is essential for software maintenance, since it allows design quality improvement effort to be prioritized on the pieces of code that can potentially cause more intense ripple effects. To this end, we proposed a probabilistic model for estimating the change-proneness of classes and compared its predictive ability with models relying on metrics or historical data through logistic regression analysis.