Keynotes at WCRE 2009: Margaret-Anne Storey and Jean-Luc Hainaut
Margaret-Anne Storey keynote
Beyond the Lone Reverse Engineer: Insourcing, Outsourcing and Crowdsourcing
When one imagines a reverse engineer at work, the image that often comes to mind is that of a lone engineer using advanced tools to aid design recovery. In practice, however, the engineer may be part of a team that has to tackle the arduous task of documenting a system's design. Often such a team will be distributed and may have to work asynchronously. Moreover, sharing and combining knowledge with transient or non-team members further adds to the complexity of the task. These collaboration challenges are seldom discussed, or even mentioned, in the research literature.
In this talk, I will explore how models, theories and technologies from the disciplines of computer-supported cooperative work and social computing can improve collaboration in reverse engineering. I will briefly present several success stories of how social computing technologies have helped small teams, larger distributed teams and the crowd tackle complex intellectual tasks in other areas of science. I will also describe some of our early work investigating how Web 2.0 social computing technologies, such as tagging and feeds, facilitate collaborative software engineering. My hope is that these stories may spark ideas on how social computing might inspire new research in reverse engineering.
Dr. Margaret-Anne Storey is a professor of computer science at the University of Victoria, a Visiting Scientist at the IBM Centre for Advanced Studies in Toronto and a Canada Research Chair in Human Computer Interaction for Software Engineering. She is one of the principal investigators for CSER (Centre for Software Engineering Research in Canada) and an investigator for the National Center for Biomedical Ontology, US. Her research goal is to understand how technology can help people explore, understand and share complex information and knowledge. She applies and evaluates techniques from knowledge engineering and visual interface design to applications such as software development, program comprehension, medical ontology development, and learning in web-based environments.
Jean-Luc Hainaut keynote
Legacy and Future of Data Reverse Engineering
Data(base) reverse engineering is the process through which the missing technical and/or semantic schemas of a database (or, equivalently, of a set of files) are reconstructed. If carefully performed, this process allows legacy databases to be safely maintained, extended, migrated to modern platforms or merged with other, possibly heterogeneous, databases. Although the process is most pertinent for old databases, which are typically poorly documented, it proves highly useful for recent databases as well, inasmuch as many of them are huge and complex, yet poorly designed and insufficiently (if ever) documented.
Compared to standard software reverse engineering, database reverse engineering exhibits some interesting particularities. Firstly, its very goal is to recover the complete specification of a database in such a way that its conversion to another data model can be automated, an ability that is, so far, not achievable for procedural code. Secondly, it draws on a large variety of information sources, ranging from DDL (data definition language) code analysis to data analysis, program code analysis, program behaviour observation and ontology alignment. Finally, it quickly becomes apparent that database reverse engineering requires program understanding techniques, just as serious data-intensive program understanding requires database reverse engineering.
Historically, we can identify three periods in DBRE: discovery, deepening and widening. They more or less correspond to the last three decades.
The first period, the eighties, was mainly devoted to solving the problem of migrating CODASYL databases, IMS databases and standard files to relational technology. The techniques were based on automated DDL code interpretation augmented with trivial heuristics to elicit undeclared constraints such as implicit foreign keys. Unfortunately, this approach proved insufficient to recover complete database schemas, since it ignored the many implicit data structures and constraints that were implemented, for instance, in procedural code and in user interfaces.
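The kind of trivial heuristic used in that period can be sketched as follows: a column whose name coincides with another table's primary key is flagged as a candidate implicit foreign key. This is an illustrative sketch only; the DDL sample, table names and regular expressions are invented, and real DBRE tools of the era worked on CODASYL/IMS schemas rather than toy SQL.

```python
import re

# Invented sample DDL with no declared foreign key between the tables.
DDL = """
CREATE TABLE customer (cust_id INT PRIMARY KEY, name VARCHAR(60));
CREATE TABLE invoice (inv_id INT PRIMARY KEY, cust_id INT, total INT);
"""

def parse_tables(ddl):
    """Map each table name to its list of column names."""
    tables = {}
    for name, body in re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", ddl, re.S):
        tables[name] = [part.strip().split()[0] for part in body.split(",")]
    return tables

def primary_keys(ddl):
    """Map each table name to its columns declared PRIMARY KEY."""
    return {name: re.findall(r"(\w+)\s+\w+[^,]*PRIMARY KEY", body)
            for name, body in re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", ddl, re.S)}

def candidate_foreign_keys(ddl):
    """Name-matching heuristic: a column sharing its name with another
    table's primary key is a likely undeclared foreign key."""
    tables, pks = parse_tables(ddl), primary_keys(ddl)
    hits = []
    for table, cols in tables.items():
        for other, pk_cols in pks.items():
            if other != table:
                hits += [(table, col, other) for col in cols if col in pk_cols]
    return hits

print(candidate_foreign_keys(DDL))  # prints [('invoice', 'cust_id', 'customer')]
```

As the abstract notes, such heuristics produce only hypotheses: a name match proves nothing about the actual semantics, which is precisely why the approach proved insufficient on its own.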
The main objectives of the second period were to refine elicitation techniques to recover implicit constructs and to develop more flexible (semi-automated) methodologies to address the problem in all its complexity. In particular, sophisticated tool-based application code analysis and data analysis were designed in order to recover field and record structures, relationships, constraints and is-a hierarchies. In addition, the need for reverse engineering relational databases was recognized.
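One form the data analysis mentioned above can take is checking an inclusion dependency: if every value of a candidate referencing column also occurs among the referenced key's values, the data supports an implicit foreign key hypothesis. The sketch below uses invented in-memory rows standing in for data extracted from a legacy database; it is not drawn from any particular DBRE tool.

```python
# Invented sample data; row 12 deliberately violates the hypothesis.
customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 3}]
invoices = [{"inv_id": 10, "cust_id": 1},
            {"inv_id": 11, "cust_id": 2},
            {"inv_id": 12, "cust_id": 7}]  # no customer 7

def fk_violations(referencing, col, referenced, key):
    """Return the non-null values of `col` that do not occur among the
    values of `key`; an empty result means the data satisfies the
    inclusion dependency and supports the implicit foreign key."""
    key_values = {row[key] for row in referenced}
    return [row[col] for row in referencing
            if row[col] is not None and row[col] not in key_values]

print(fk_violations(invoices, "cust_id", customers, "cust_id"))  # prints [7]
```

Note the asymmetry with the name-matching heuristic: data analysis can refute a hypothesis (as here, where value 7 dangles) but can never prove it, since the constraint may simply hold by accident in the current database state.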
In the present decade, the scope of data(base) reverse engineering and its supporting techniques are being considerably extended. The increasing consensus on XML as a data model, the view of the web as an infinite database, the expression of data semantics through ontology technologies, the development of model-driven transformational approaches to engineering processes, the requirement of maintaining data traceability, the high cost of system (schemas + data + programs) migration, the explosion of web databases developed by unqualified developers, the increasing complexity and size of corporate databases, the need for heterogeneous database integration, the inescapable shortage of legacy database technology skills, the use of dynamic SQL in most web information systems (which makes static program analysis practically useless), the increasing use of ORM (object-relational mapping) environments that bury the database as a mere transparent persistence service: all of these facts and trends make data reverse engineering both more necessary and more complex by an order of magnitude.
The future of data(base) reverse engineering is tied to its ability to address these challenges and to contribute to solving them. Conversely, the future of information system engineering seems, to a large extent, to depend on these solutions.