Research Topics
Our research group focuses on the area of software maintenance, and particularly on design quality improvement and design evolution analysis.
We currently offer several interesting projects (see below) for Master's and PhD students. If you are interested in these projects or have your own ideas for projects in the
aforementioned research areas, please contact me at tsantalis [at] cse.concordia.ca. Don't forget to attach your CV and provide details about previous research projects you have worked on, and software projects you have developed.
Candidates are expected to meet the following criteria:
- Strong programming skills (especially in Java)
- Experience in Eclipse plug-in development (experience in Apache Hadoop is a plus)
- Experience in writing research papers and technical reports
It has been observed that the design quality of a software system deteriorates throughout its evolution (software aging). This happens either because changes in the requirements were not anticipated in the original design, or because new requirements were implemented with poor design decisions made under the pressure to meet deadlines.
Design quality deterioration manifests itself in the form of design defects or flaws that make the system harder to comprehend, test, extend and maintain in general.
Refactorings are source code transformations that do not alter the external behavior of a program, but improve its internal structure [1]. The cumulative effect of these code transformations can radically improve the design quality of a system and reverse software decay.
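As a minimal illustration of a behavior-preserving transformation, the sketch below shows a method before and after an Extract Method refactoring [1]. The class, the method names and the discount rule are hypothetical and serve only as an example; both versions return the same result for the same input.

```java
import java.util.List;

// Hypothetical example class; all names and the discount rule are illustrative only.
final class InvoiceExample {

    // Before: a single method mixes the summation with the discount rule.
    static double totalBefore(List<Double> prices) {
        double sum = 0.0;
        for (double price : prices) {
            sum += price;
        }
        if (sum > 100.0) {
            sum = sum * 0.9;
        }
        return sum;
    }

    // After Extract Method: the two responsibilities live in separate,
    // individually named methods, while the external behavior is unchanged.
    static double totalAfter(List<Double> prices) {
        return applyDiscount(sum(prices));
    }

    private static double sum(List<Double> prices) {
        double sum = 0.0;
        for (double price : prices) {
            sum += price;
        }
        return sum;
    }

    private static double applyDiscount(double amount) {
        return amount > 100.0 ? amount * 0.9 : amount;
    }
}
```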
Despite the wide support of refactoring application mechanics in modern IDEs, the refactoring process is not supported in its entirety, since the developers have to manually detect refactoring opportunities and assess their impact on design quality.
In recent years, several approaches have been proposed in the literature for the systematic detection of specific refactoring opportunities [2]-[8]. However, a large number of refactorings and design problem resolution strategies [1] [9] remain to be explored.
Have a look at JDeodorant, an Eclipse plug-in that supports the detection of refactoring opportunities in Java projects and has been used by several companies and organizations to improve the design quality of their software products.
JDeodorant provides a powerful infrastructure for the analysis of source code allowing the implementation of detectors for a large variety of refactoring opportunities.
Working on this project will give you experience in source code analysis techniques and major Eclipse frameworks, such as JDT (Java Development Tools) and LTK (Refactoring Language Toolkit), as well as the satisfaction of seeing your contribution used by hundreds of developers around the world. JDeodorant is available through Eclipse Marketplace and already has a large install base.
Another interesting research direction is the investigation of refactoring opportunities in dynamic languages, such as Ruby [10],
and style sheet languages, such as Cascading Style Sheets (CSS) [11] - [13].
References
- M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts, Refactoring: Improving the Design of Existing Code, Addison Wesley, Boston, MA, 1999.
- N. Tsantalis, and A. Chatzigeorgiou, "Identification of Move Method Refactoring Opportunities," IEEE Transactions on Software Engineering, vol. 35, no. 3, pp. 347-367, May/June 2009.
- N. Tsantalis, and A. Chatzigeorgiou, "Identification of Refactoring Opportunities Introducing Polymorphism," Journal of Systems and Software, vol. 83, no. 3, pp. 391-404, March 2010.
- N. Tsantalis, and A. Chatzigeorgiou, "Identification of Extract Method Refactoring Opportunities for the Decomposition of Methods," Journal of Systems and Software, vol. 84, no. 10, pp. 1757-1782, October 2011.
- G. Bavota, A. De Lucia, and R. Oliveto, "Identifying Extract Class Refactoring Opportunities Using Structural and Semantic Cohesion Measures," Journal of Systems and Software, vol. 84, no. 3, pp. 397-414, March 2011.
- H. Liu, Z. Niu, Z. Ma, and W. Shao, "Identification of Generalization Refactoring Opportunities," Automated Software Engineering, 2012.
- K. Hotta, Y. Higo, and S. Kusumoto, "Identifying, Tailoring, and Suggesting Form Template Method Refactoring Opportunities with Program Dependence Graph," pp. 53-62, 16th European Conference on Software Maintenance and Reengineering (CSMR'12), Szeged, Hungary, March 27-30, 2012.
- Y. Higo, S. Kusumoto, and K. Inoue, "A metric-based approach to identifying refactoring opportunities for merging code clones in a Java software system," Journal of Software Maintenance & Evolution, vol. 20, no. 6, pp. 435-461, November 2008.
- W. J. Brown, R. C. Malveau, H.W. McCormick III, and T. J. Mowbray, AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, John Wiley & Sons, 1998.
- W. C. Wake, and K. Rutherford, Refactoring in Ruby, Addison-Wesley, 2010.
- A. Mesbah, and S. Mirshokraie, "Automated Analysis of CSS Rules to Support Style Maintenance," pp. 408–418, 34th International Conference on Software Engineering (ICSE'12), Zurich, Switzerland, June 2-9 2012.
- M. Keller and M. Nussbaumer, "CSS Code Quality: A Metric for Abstractness; Or Why Humans Beat Machines in CSS Coding," pp. 116–121, Seventh International Conference on the Quality of Information and Communications Technology (QUATIC'10), Oporto, Portugal, September 29-October 2, 2010.
- A. Adewumi, S. Misra, and N. Ikhu-Omoregbe, "Complexity Metrics for Cascading Style Sheets," pp. 248-257, 12th international conference on Computational Science and Its Applications - Volume Part IV (ICCSA'12), Salvador de Bahia, Brazil, June 18-21, 2012.
Maintenance activity is generally divided into distinct categories:
- Corrective: Fixing software defects/bugs.
- Perfective: Enhancing performance and usability.
- Adaptive: Adapting to new technologies, environments and language features.
- Preventive: Improving future maintainability.
- Feature addition: Implementing new requirements.
- Non functional: Updating documentation, copyright, license, code format/indentation.
The research problem being investigated is the automatic classification of source code changes between successive revisions into the aforementioned maintenance categories using advanced source code analysis and differencing techniques.
The classification of source code changes can provide insight into the intention/purpose of the maintenance activities. Furthermore, it can be used to detect development phases throughout the evolution of a project as a means to a) support project management/awareness, b) reveal the development processes and practices being applied by the development team, and c) find patterns of interleaved maintenance activity types. Finally, it can also be used to detect software modules which are stable, error-prone, critical with respect to extension, or frequently refactored.
Previous research work has mainly focused on commit information (i.e., commit message, author and modified modules) in order to perform this classification [1] [2]. This approach has two main disadvantages: a) it depends on the quality and clarity of the comments provided by the developers, and b) it classifies each revision into a single maintenance category, although the changes may correspond to multiple categories.
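To make the two disadvantages concrete, here is a naive keyword-based sketch of commit message classification. It is not the technique used in [1] or [2]; the keyword lists and all names are assumptions chosen for illustration. A message that mentions several activities maps to more than one category, and an uninformative message maps to none.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CommitMessageClassifier {

    enum Category { CORRECTIVE, PERFECTIVE, ADAPTIVE, PREVENTIVE, FEATURE_ADDITION, NON_FUNCTIONAL }

    // Hypothetical keyword lists, one per maintenance category (assumption, not from the literature).
    private static final Map<Category, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put(Category.CORRECTIVE, Arrays.asList("fix", "fixed", "bug", "defect"));
        KEYWORDS.put(Category.PERFECTIVE, Arrays.asList("performance", "optimize", "usability"));
        KEYWORDS.put(Category.ADAPTIVE, Arrays.asList("upgrade", "migrate", "port"));
        KEYWORDS.put(Category.PREVENTIVE, Arrays.asList("refactor", "cleanup", "simplify"));
        KEYWORDS.put(Category.FEATURE_ADDITION, Arrays.asList("add", "implement", "feature"));
        KEYWORDS.put(Category.NON_FUNCTIONAL, Arrays.asList("javadoc", "license", "copyright", "indentation"));
    }

    // Returns every category whose keyword list shares a whole word with the message.
    static Set<Category> classify(String message) {
        List<String> words = Arrays.asList(message.toLowerCase(Locale.ROOT).split("[^a-z]+"));
        Set<Category> result = EnumSet.noneOf(Category.class);
        for (Map.Entry<Category, List<String>> entry : KEYWORDS.entrySet()) {
            for (String keyword : entry.getValue()) {
                if (words.contains(keyword)) {
                    result.add(entry.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

With these keyword lists, a message such as "Fix NPE in parser and refactor tokenizer" falls into both the corrective and the preventive category, whereas "misc changes" cannot be classified at all.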
Other research works employ program differencing in order to extract the system’s evolution profile as a sequence of change trees [3],
detect non-essential changes (such as the replacement of simple types with qualified ones, extraction of local variables, keyword modifications and local variable renames) through sophisticated source code analysis techniques (i.e., Partial Program Analysis) [4], detect the introduction of Java Generics [5], and
automatically discover and summarize systematic code changes (such as refactorings, feature additions, and updates to code clones) as logic rules [6]. However, these approaches do not fully cover all maintenance types.
This research problem is ideal for students with background or interest in source code analysis and differencing, as well as text mining techniques.
References
- Abram J. Hindle, Daniel M. German, Michael W. Godfrey, and Richard C. Holt, "Automatic Classification of Large Changes into Maintenance Categories," In Proceedings of the 2009 IEEE Intl. Conference on Program Comprehension (ICPC-09), 2009.
- Ahmed E. Hassan, "Automated classification of change messages in open source projects," In Proceedings of the 2008 ACM symposium on Applied computing (SAC '08). ACM, New York, NY, USA, 837-841.
- Zhenchang Xing, and Eleni Stroulia, "Understanding Phases and Styles of Object-Oriented Systems' Evolution," In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM '04). IEEE Computer Society, Washington, DC, USA, 242-251.
- David Kawrykow, and Martin P. Robillard, "Non-essential changes in version histories," In Proceedings of the 33rd International Conference on Software Engineering (ICSE '11), pp. 351-360, 2011.
- Chris Parnin, Christian Bird, and Emerson Murphy-Hill, "Java Generics Adoption: How New Features are Introduced, Championed, or Ignored," 8th Working Conference on Mining Software Repositories (MSR '11), pp. 3-12, 2011.
- Miryung Kim, David Notkin, Dan Grossman, and Gary Wilson Jr., "Identifying and Summarizing Systematic Code Changes via Rule Inference," IEEE Transactions on Software Engineering, 02 March 2012.
Software systems may present a very large number and variety of refactoring opportunities. As a result, the existence of conflicts and dependencies in the application of specific refactorings is quite possible.
A conflict usually takes place when two or more refactoring opportunities affect a common piece of code. In this case, the refactorings should be applied according to the scope of the code that they affect and the semantics of the involved refactoring types.
For example, let us assume that there are two refactoring opportunities for method m. The first opportunity suggests moving method m from class A to class B (because it accesses fields and/or methods of class B), while the second one suggests extracting a code fragment from the body of method m into a separate method. If the Extract Method refactoring is applied first, method m becomes more dependent on class A (since, after the refactoring, method m will also access the extracted method in class A), and thus the Move Method refactoring opportunity may become weaker (i.e., less effective) or even disappear. On the other hand, if the Move Method refactoring is applied first, then the Extract Method refactoring can be applied afterwards without any problem.
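A minimal sketch of this notion of conflict, assuming each refactoring opportunity can be summarized by the set of code elements it modifies. The `Opportunity` class and the string identifiers for code elements are hypothetical; a real detector would work on program elements rather than strings, and would also need the semantics of the refactoring types to decide the application order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class RefactoringConflicts {

    /** A refactoring opportunity together with the code elements it would modify (hypothetical model). */
    static final class Opportunity {
        final String description;
        final Set<String> affectedElements;

        Opportunity(String description, String... affectedElements) {
            this.description = description;
            this.affectedElements = new HashSet<>(Arrays.asList(affectedElements));
        }
    }

    // Two opportunities conflict when they affect at least one common code element.
    static boolean inConflict(Opportunity first, Opportunity second) {
        return !Collections.disjoint(first.affectedElements, second.affectedElements);
    }
}
```

For the example above, a Move Method opportunity affecting `A.m` and `B` conflicts with an Extract Method opportunity affecting `A.m`, because both touch method m.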
A dependency takes place when two refactoring opportunities affect different pieces of code that depend on each other. In this case, the order of refactoring application depends on the direction of the dependency.
For example, let us assume that there are two Move Method refactoring opportunities for methods x and y, respectively, and method x calls method y. The refactoring corresponding to method y should be applied first, since the move of the called method (i.e., method y) may affect the target class for the calling method (i.e., method x).
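One simple way to derive an application order from such dependencies is a topological sort. The sketch below uses Kahn's algorithm on a hypothetical "must be applied before" relation between refactoring identifiers; it is only an illustration and not the method of any of the approaches cited in this section.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RefactoringOrdering {

    // mustPrecede maps a refactoring to the refactorings that may only be applied after it.
    static List<String> applicationOrder(Collection<String> refactorings,
                                         Map<String, List<String>> mustPrecede) {
        // Count, for every refactoring, how many others must be applied before it.
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (String r : refactorings) {
            pending.put(r, 0);
        }
        for (String r : mustPrecede.keySet()) {
            pending.putIfAbsent(r, 0);
        }
        for (List<String> later : mustPrecede.values()) {
            for (String r : later) {
                pending.merge(r, 1, Integer::sum);
            }
        }

        // Start with the refactorings that do not wait for any other refactoring.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String current = ready.remove();
            order.add(current);
            for (String next : mustPrecede.getOrDefault(current, Collections.emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("Cyclic dependencies between refactorings");
        }
        return order;
    }
}
```

For the example above, declaring that the move of method y must precede the move of method x yields an order in which the refactoring for y comes first.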
Some previous research approaches to this problem proposed the representation of refactorings as graph transformations [1] [4], where the detection of conflicts and dependencies is based on critical pair analysis and sequential dependency analysis. Other approaches treat the problem of finding an optimal sequence of refactoring applications as a search problem on Deterministic Finite Automata [3], where nodes represent system states and edges represent the application of specific refactorings. Finally, there is a category of approaches that treat this problem as a scheduling problem [2] [5], where the goal is to optimize certain functions (e.g., design quality, understandability, maintainability, refactoring effort) with respect to the constraints imposed by refactoring conflicts and dependencies.
This research problem is ideal for students with background or interest in search and optimization techniques.
References
- Tom Mens, Gabriele Taentzer, Olga Runge, "Analysing Refactoring Dependencies Using Graph Transformation," Software and Systems Modeling, vol. 6, no. 3, pp. 269-285, September 2007.
- H. Liu, G. Li, Z. Y. Ma, and W. Z. Shao, "Conflict-aware schedule of software refactorings," Software IET, vol. 2, no. 5, pp. 446-460, 2008.
- Eduardo Piveta, João Araújo, Marcelo Pimenta, Ana Moreira, Pedro Guerreiro, and R. Tom Price, "Searching for Opportunities of Refactoring Sequences: Reducing the Search Space," In Proceedings of the 2008 32nd Annual IEEE International Computer Software and Applications Conference (COMPSAC '08), pp. 319-326, 2008.
- Fawad Qayum and Reiko Heckel, "Analysing refactoring dependencies using unfolding of graph transformation systems," In Proceedings of the 7th International Conference on Frontiers of Information Technology (FIT '09), 2009.
- Minhaz F. Zibran and Chanchal K. Roy, "Conflict-Aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach," In Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension (ICPC '11), pp. 266-269, 2011.
The goal of empirical studies is to investigate the processes and practices being applied in software projects and development teams as a means to propose, validate and improve models and analytical tools.
Within the context of refactoring activity, prior empirical studies investigated its relation with the number and duration of bug fixes [5], software release dates [5], testing periods [6], as well as the impact of refactorings on software metrics [7] [8]. More recent empirical studies investigated the evolution and lifespan of code smells in the history of software projects [1] [2] [4], as well as the impact of code smells on software change-proneness [3].
However, the relationship between code smells and refactoring activity has not been investigated yet. It is particularly interesting to examine the specific reasons that motivate the developers to apply refactorings and how refactoring activity is interleaved with other maintenance activities, such as bug fixing, addition of features, unit testing and design improvement.
This research problem is ideal for students with background or interest in mining software repositories (MSR) and statistical analysis.
References
- A. Chatzigeorgiou and A. Manakos, "Investigating the Evolution of Bad Smells in Object-Oriented Code," 7th International Conference on the Quality of Information and Communications Technology (QUATIC'2010), Porto, Portugal, September 29-October 2, 2010.
- R. Peters, and A. Zaidman, "Evaluating the Lifespan of Code Smells using Software Repository Mining," 16th European Conference on Software Maintenance and Reengineering (CSMR'12), Szeged, Hungary, March 27-30, 2012.
- F. Khomh, M. Di Penta and Y.-G. Guéhéneuc, "An Exploratory Study of the Impact of Code Smells on Software Change-proneness," 16th Working Conference on Reverse Engineering (WCRE'09), Lille, France, October 2009, pp. 75-84.
- S. Olbrich, D. S. Cruzes, V. Basili and N. Zazworka, "The Evolution and Impact of Code Smells: A Case Study of Two Open Source Systems", 3rd International Symposium on Empirical Software Engineering and Measurement (ESEM'09), Florida, USA, October 2009, pp. 390-400.
- M. Kim, D. Cai, and S. Kim, "An empirical investigation into the role of API-level refactorings during software evolution," In Proceedings of the 33rd International Conference on Software Engineering, pp. 151–160, 2011.
- N. Rachatasumrit, and M. Kim, "An Empirical Investigation into the Impact of Refactoring on Regression Testing," In Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM '12), 2012.
- K. Stroggylos, and D. Spinellis, "Refactoring--Does It Improve Software Quality?," In Proceedings of the 5th International Workshop on Software Quality (WoSQ'2007), 2007.
- M. Alshayeb, "Empirical investigation of refactoring effect on software quality," Information and Software Technology, vol. 51, no. 9, pp. 1319-1326, September 2009.
Source code repositories are becoming larger and larger, both with respect to their codebase and their evolution history. As a result, the analysis of software repositories has become a computation- and data-intensive task that requires parallel and distributed processing.
The goal of this research project is to investigate the use of frameworks supporting distributed applications [1] [2], such as Apache Hadoop, for the distributed analysis of source code.
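The map/reduce style of computation that such frameworks build on can be sketched on a single machine with plain Java parallel streams. The example below does not use the Apache Hadoop API; it is a local stand-in under the assumption that source files are available in memory as a map from path to content, and the module naming rule (the first path segment) is hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class LocalMapReduceSketch {

    // "Map" step: measure one source file independently of all others.
    static long nonBlankLines(String content) {
        return Arrays.stream(content.split("\n"))
                .filter(line -> !line.trim().isEmpty())
                .count();
    }

    // Hypothetical grouping key: the first path segment is treated as the module name.
    static String moduleOf(String path) {
        int slash = path.indexOf('/');
        return slash < 0 ? "(root)" : path.substring(0, slash);
    }

    // "Reduce" step: sum the per-file measurements for each module.
    static Map<String, Long> linesPerModule(Map<String, String> filesByPath) {
        return filesByPath.entrySet().parallelStream()
                .collect(Collectors.groupingBy(
                        entry -> moduleOf(entry.getKey()),
                        TreeMap::new,
                        Collectors.summingLong(entry -> nonBlankLines(entry.getValue()))));
    }
}
```

Because each file is processed independently and the partial results are combined by an associative sum, the same decomposition could in principle be distributed over many machines, which is the kind of setting this project investigates.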
This research problem is ideal for students with background or interest in parallel algorithms, distributed computing, algorithm optimization, and scalability studies.
References
- Weiyi Shang, Zhen Ming Jiang, Bram Adams, and Ahmed E. Hassan, "MapReduce as a general framework to support research in Mining Software Repositories (MSR)," In Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories (MSR '09), pp. 21-30, 2009.
- Weiyi Shang, Zhen Ming Jiang, Hadi Hemmati, Bram Adams, Ahmed E. Hassan, and Patrick Martin, "Assisting developers of big data analytics applications when deploying on hadoop clouds," In Proceedings of the 35th International Conference on Software Engineering (ICSE '13), pp. 402-411, 2013.