Research Statement
I envisage carrying out research activities on problems relevant to
Search-Based Software Engineering (SBSE). I believe that, in terms of
costs, performance and new insights, there are benefits in reformulating
many software engineering problems as optimization problems. Consistent
with my Ph.D. work, I expect my contributions to be primarily in SBSE,
and in particular the analysis of software evolution to improve software
quality.
Current Research Activity
My Ph.D. work is mainly on software evolution and investigates the
usefulness of graph matching techniques as efficient ways of retrieving
the evolution through time of software artifacts. It involves three
distinct but related aspects. First, in order to match artifacts
efficiently, I proposed and implemented a tabu search technique for
approximate graph matching problems. Second, I modeled software
artifacts, such as class diagrams, as directed graphs and applied the
developed algorithm to match subsequent diagrams. Third, based on the
obtained matchings, I proposed new design evolution metrics and assessed
their usefulness in defect prediction models. The following paragraphs
detail each of those three aspects.
Approximate Graph Matching
Many practical problems can be modeled as approximate graph matching
problems in which the goal is to find a “good” matching between two
objects represented as graphs. Existing literature is centered on image
processing and biochemistry communities with very specific graph
matching problems, benchmarks, and algorithms. As a result, researchers
and practitioners from other fields, such as software engineering, may
not find tools or algorithms suitable for their own specific graph
matching problem. Moreover, given
that most graph instances in image processing and biochemistry
literature are very small (less than 100 nodes), the proposed algorithms
have scalability issues.
To address this situation, I tackled in a generic way, the approximate
graph matching problem. My approach avoids as much as possible
assumptions about graphs to be matched and tries to make the best out of
basic graph features such as node connectivity and edge types.
Consequently, my proposal is a local search technique using new node
similarity measures derived from simple structural information.
The proposed technique was devised as follows. First, I observed and
empirically validated that initializing a local search
with a very small
subset of “correct” node matches is enough to get excellent results [3].
Instead of directly trying to correctly match all nodes and edges of a
given graph to the nodes and edges of another graph, one could focus on
correctly matching a reduced subset of nodes. Second, in order to
retrieve such subsets, I resorted to the concept of local node
similarity. My approach consists in assessing, by analyzing their
neighborhoods, how likely it is to have a pair of nodes included in a
good matching. I investigated many ways of computing similarity values
between pairs of nodes and proposed additional techniques to attach a
level of confidence to any computed similarity value. My work results in
a similarity enhanced tabu algorithm [5, 0] which advances the
state-of-art.
Software Evolution through the Matching of High Level artifacts
Long-lived Object Oriented (OO) systems often possess a rich history of
changes and design decisions that should be used to gain actionable
knowledge. However, given the size and complexity of OO systems,
retrieving and understanding the history of the design evolution is a
difficult task which requires appropriate techniques.
In my approach, a design representation, usually the class diagram, is
first recovered from the code and modeled as a directed graph; then my
graph matching algorithm is applied to gain insight on its evolution
through time. My key difference is that, unlike the other approaches
which are solely dedicated to a specific representation, my generic
graph matching algorithm can be adapted to study
different artifacts - provided that they can be represented as
directed graphs - and different
problems (by changing the optimization criterion). I tested my
approach on the evolution of class diagrams [2, 6, 7] and empirically
proved it to be both efficient and scalable. Ongoing work involves other
artifacts such as state and activity diagrams.
Design Evolution Metrics for Defect Prediction
Testing is the most widely adopted practice to ensure software quality.
However, this activity is often a compromise between the available
resources and sought software quality. In object-oriented development,
testing effort should be focused on defect-prone classes or
alternatively on classes deemed critical based on criteria such as their
connectivity or evolution profile [10, 11]. Unfortunately, the
identification of defect-prone classes is a challenging and difficult
activity on which many metrics, techniques, and models have been tried
with mixed success.
Following the retrieval of class diagrams’ evolution by my graph
matching approach, I proposed and investigated the usefulness of
elementary design evolution metrics in the identification of defective
classes [1]. The metrics include the numbers of added, deleted, and
modified attributes, methods, and relations. They are used to recommend
a ranked list of classes likely to contain defects for a system. I
evaluated the efficiency of our approach according to three criteria:
presence of defects, number of defects, and defect density in the
top-ranked classes. I conducted experiments with small to large systems
and made comparisons against known metrics (31 complexity metrics and
the Chidamber and Kemerer’s metrics). Results show that the design
evolution metrics, when used in conjunction with known metrics, improve
the identification of defective classes. In addition, they provide
evidence that design evolution metrics make significantly better
predictions of defect density than other metrics and, thus, can help in
reducing the testing effort by focusing test activity on a reduced
volume of code [1].
My Research Plan
My research plan is in line with the work I’m pursuing for my Ph.D. and
will be centered on software evolution, software quality, and
search-based software engineering. In order to broaden the range of
problems I will be able to tackle, I foresee extensions to the graph
matching tool I have been developing. The current version of my
algorithm abides to the one-to-one constraint (one node is matched to at
most one node) which is enough for most matching tasks but may fall
short in cases where nodes are expected to split or merge. To address
that issue, I’m considering the addition of new and more complex graph
edit operations able to deal with cases where a group of nodes can be
matched with another group of nodes (many-to-many matching).
I also intend to conduct experiments destined to retrieve the evolution
of different software artifacts (such as sequence diagrams or state
machines) and investigate how the extracted information can help
maintainers and managers. Moreover, I’m interested in matching artifacts
of different kinds and believe that extensions to my graph matching
approach have the potential to bring significant contributions in
software traceability.
Additionally, the investigation of metrics, techniques and models which
can be useful in the prediction of software quality is definitely part
of my agenda. For instance, I would like to pursue my work on Design
Evolution Metrics [1] and contribute to the efforts in identifying
“micro-architectures” with desirable or unwanted properties [4]. Along
related lines, software testing is a research area I am willing to
explore, given my teaching experience in the field. Problems such as
automatic test generation and
optimal integration test
orders look particularly interesting to me.
Finally, I believe that my advanced knowledge of meta-heuristics and
graph theory could result in fruitful collaborations with future
colleagues and contribute to the outreach of my institution.
Publications
Under Review
0.
Kpodjedo, S.,
Galinier, P., Antoniol, G., “Using Local Similarity Measures to
Efficiently Address Approximate Graph Matching”, submitted to
Discrete Applied Mathematics,
37 pages, October 2010
Journal papers
1.
Kpodjedo, S.,
Ricca, F., Galinier, P., Antoniol, G., Gueheneuc, Y.-G., “Design
Evolution Metrics for Defect Prediction in Object Oriented Systems”,
Empirical Software Engineering
Vol.
16, 1 pp 141-175 (2011)
2.
Kpodjedo, S.,
Ricca, F., Galinier, P., Antoniol, G., Gueheneuc, Y-G., “Studying
Software Evolution of Large Object-Oriented Software Systems Using an
ETGM Algorithm”, Journal of
Software Maintenance and Evolution,
http://dx.doi.org/10.1002/smr.519, 32 pages, September 2010
3.
Kpodjedo, S.,
Galinier, P., Antoniol, G., “On
the Use of Similarity Metrics for Approximate Graph Matching”,
Electronic
Notes in Discrete Mathematics Vol. 36, 1 pp. 687-694,
August 2010.
Peer-reviewed
Conference papers
4.
Belderrar,
A., Kpodjedo, S., Guéhéneuc, Y.-G., Antoniol, G., Galinier, P.,
“Sub-Graph Mining: Identifying Micro-Architectures in Evolving Object
Oriented Softwares”, Proceedings of the 15th
to appear in Eur. Conf. on Software
Maintenance and Reengineering (CSMR2011), March 1-4, 2011, Oldenburg,
Germany.
5.
Kpodjedo, S.,
Galinier, P., Antoniol, G., “Enhancing a Tabu Algorithm for Approximate
Graph Matching by Using Similarity Measures”, Proceedings of the 10th
Eur. Conf. on Evolutionary Computation in Combinatorial Optimization
(EvoCOP2010), April 7–9 2010, Istanbul, Turkey, pp. 119-130.
6.
Kpodjedo, S.,
Ricca, F., Antoniol, G., Galinier, P., “Recovering the Evolution Stable
Part Using an ECGM Algorithm: Is There a Tunnel in Mozilla?”,
Proceedings of the 13th Eur. Conf. on Software Maintenance
and Reengineering (CSMR2009), March 24-27, 2009,
Fraunhofer
IESE,
Kaiserslautern,
Germany, pp.
179-188.
7.
Kpodjedo, S.,
Ricca, F., Antoniol, G., Galinier, P., “Error Correcting Graph Matching
Application to Software Evolution”, Proceedings of the 15th
Working Conference on Reverse Engineering (WCRE2008),
October 15-18, 2008,
Antwerp, Belgium, pp.
289-293.
8.
Kpodjedo, S.,
Pourzandi, M., Pierre, S., “Reputation
Based Trust Management Using TCG in Mobile Ad-Hoc Networks (RTA)”,
Proceedings of the 33rd IEEE Conference on Local
Computer Networks (LCN2008), October 14-17, Montreal, Quebec, Canada,
pp. 518-519
Workshop papers
9.
Kpodjedo, S.,
“Approximate Graph Matching in Software Engineering”, Proceedings
of the 16th
Working Conference on Reverse Engineering (WCRE2009), October 13-16,
2008, Lille, France, Doctoral Symposium,
pp. 295-298
10.
Kpodjedo, S.,
Ricca, F., Antoniol, G., Galinier, P., “Evolution and Search Based
Metrics to Improve Defects Prediction”, Proceedings of the
1st International
Symposium on Search Based Software Engineering
(SSBSE2009), May 13-15, 2009,
Cumberland Lodge, Windsor, UK,
pp. 23 - 32.
11.
Kpodjedo, S.,
Ricca, F., Antoniol, G., Galinier, P., “Not all classes are created
equal: toward a recommendation system for focusing testing”, Proceedings
of the 1st International Workshop on Recommendation Systems
for Software Engineering (RSSE2008) co-located with ACM SIGSOFT 2008 /
FSE 16, November 10, 2008, Atlanta, Georgia, USA, pp. 6-10