DMS Software Reengineering Toolkit

From Wikipedia, the free encyclopedia
Jump to: navigation, search
DMS Software Reengineering Toolkit
Developer(s) Semantic Designs

The DMS Software Reengineering Toolkit[1] is a proprietary set of program transformation tools available for automating custom source program analysis, modification, translation or generation of software systems for arbitrary mixtures of source languages for large scale software systems.

DMS has been used to implement a wide variety of practical tools, include domain-specific languages (such as code generation for factory control), test coverage[2] and profiling tools, clone detection,[3] language migration tools, C++ component reengineering.,[4] and for research into difficult topics such as refactoring C++ reliably.[5]

The toolkit provides means for defining language grammars and will produce parsers which automatically construct abstract syntax trees (ASTs), and prettyprinters to convert original or modified ASTs back into compilable source text. The parse trees capture, and the prettyprinters regenerate, complete detail about the original source program, including source position, comments, radix and format of numbers, etc., to ensure that regenerated source text is as recognizable to a programmer as the original text modulo any applied transformations.

Many program analysis and transformation tools are limited to ASCII or Western European character sets such as ISO-8859; DMS can handle these as well as UTF-8, UTF-16, EBCDIC, Shift-JIS and a variety of Microsoft character encodings.

DMS uses GLR parsing technology, enabling it to handle all practical context-free grammars. Semantic predicates extend this capability to interesting non-context-free grammars (Fortran requires matching of multiple DO loops with shared CONTINUE statements by label; GLR with semantic predicates enables the DMS Fortran parser to produce ASTs for correctly nested loops as it parses).

DMS provides attribute grammar evaluators for computing custom analyses over ASTs, such as metrics, and including special support for symbol table construction. Other program facts can be extracted by built-in control- and data- flow analysis engines, local and global pointer analysis, whole-program call graph extraction, and symbolic range analysis by abstract interpretation.

Changes to ASTs can be accomplished by both procedural methods coded in PARLANSE and source-to-source tree transformations coded as rewrite rules using surface-syntax conditioned by any extracted program facts, using DMS's Rule Specification Language (RSL). The rewrite rule engine supporting RSL handles associative and commutative rules. A rewrite rule for C to replace a complex condition by the ?: operator be written as:

   rule simplify_conditional_assignment(v:left_hand_side,e1:expression,e2:expression,e3:expression)
   =  " if (\e1) \v=\e2; else \v=e3; " 
   -> " \v=\e1?\e2:\e3; "
   if no_side_effects(v);

Rewrite rules have names, e.g. simplify_conditional_assignment. Each rule has a "match this" and "replace by that" pattern pair separated by ->, in our example, on separate lines for readability. The patterns must correspond to language syntax categories; in this case, both patterns must be of syntax category statement also separated in sympathy with the patterns by ->. Target language (e.g., C) surface syntax is coded inside meta-quotes ", to separate rewrite-rule syntax from that of the target language. Backslashes inside meta-quotes represent domain escapes, to indicate pattern meta variables (e.g., \v, \e1, \e2) that match any language construct corresponding to the metavariable declaration in the signature line, e.g., e1 must be of syntactic category: (any) expression. If a metavariable is mentioned multiple times in the match pattern, it must match to identical subtrees; the same identically shaped v must occur in both assignments in the match pattern in this example. Metavariables in the replace pattern are replaced by the corresponding matches from the left side. A conditional clause if provides an additional condition that must be met for the rule to apply, e.g., that the matched metavariable v, being an arbitrary left-hand side, must not have a side effect (e.g., cannot be of the form of a[i++]; the no_side_effects predicate is defined by an analyzer built with other DMS mechanisms).

Achieving a complex transformation on code is accomplished by providing a number of rules that cooperate to achieve the desired effect. The ruleset is focused on portions of the program by metaprograms coded in PARLANSE.

A complete example of a language definition and source-to-source transformation rules defined and applied is shown using high school algebra and a bit of calculus as a domain-specific language.

DMS has a variety of predefined language front ends, covering most real dialects of C and C++ including C++0x, C#, Java, Python, PHP, EGL, Fortran, COBOL, Visual Basic, Verilog, VHDL and some 20 or more other languages. Predefined languages enable customizers to immediately focus on their reengineering task rather than on the details of the languages to be processed.

DMS is additionally unusual in being implemented in a parallel programming language, PARLANSE, that uses symmetric multiprocessors available on commodity workstations. This enables DMS to provide faster answers for large system analyses and conversions.

DMS was originally motivated by a theory for maintaining designs of software called Design Maintenance Systems.[6]

DMS and "Design Maintenance System" are registered trademarks of Semantic Designs.


External links[edit]