Multidimensional hierarchical toolkit

From Wikipedia, the free encyclopedia

The Multidimensional hierarchical toolkit, or Multi-Dimensional and Hierarchical (MDH) Database Toolkit, is a Linux-based, open-source toolkit of portable software that supports fast, flexible, multi-dimensional and hierarchical storage, retrieval, and manipulation of information in databases of up to 256 terabytes. The package is written in C and C++ and is available in source code form under the GNU GPL, LGPL, and Free Documentation licenses. The distribution kit contains demonstration implementations of network-capable, interactive text and sequence retrieval tools that work with very large genomic databases and illustrate the toolkit's ability to manipulate massive sets of genomic data.

Distribution

The toolkit is distributed as part of the Mumps Compiler. Versions exist for Linux, Cygwin, and Windows XP.

Origins

The toolkit addresses the problem of manipulating very large, sparse, multi-dimensional matrices indexed by character strings. It is based on MUMPS (also referred to as M), a general-purpose programming language that originated in the mid-1960s at the Massachusetts General Hospital.

Key features

The principal database feature in this project is the global array, which permits direct, efficient manipulation of multi-dimensional arrays of effectively unlimited size. A global array is a persistent, sparse, undeclared, multi-dimensional, string-indexed, disk-based data structure. A global array may appear anywhere an ordinary array reference is permitted, and data may be stored at intermediate nodes of the database array as well as at leaf nodes. The number of subscripts in an array reference is limited only by the total length of the reference with all subscripts expanded to their string values. The toolkit includes several functions to traverse the database and manipulate the arrays.
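
The properties listed above (sparse, string-indexed, data at both intermediate and leaf nodes, ordered traversal) can be illustrated with a small in-memory model. The sketch below is not the MDH API: the class name SparseArray and its methods set, get, and next are hypothetical names chosen for illustration, the structure lives in memory rather than on disk, and subscripts are ordered by plain string comparison, which may differ from the collation Mumps uses.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory model of a global array; NOT the MDH API.
// A reference such as ^gene("human","chr1") is modeled as the
// subscript list {"human", "chr1"}.
class SparseArray {
public:
    using Key = std::vector<std::string>;

    // Store a value at any node, leaf or intermediate. Nodes that were
    // never assigned occupy no storage (the array is sparse).
    void set(const Key& subscripts, const std::string& value) {
        assert(!subscripts.empty());
        nodes_[subscripts] = value;
    }

    // Return the value stored at a node, or nothing if none was stored.
    std::optional<std::string> get(const Key& subscripts) const {
        auto it = nodes_.find(subscripts);
        if (it == nodes_.end()) return std::nullopt;
        return it->second;
    }

    // Rough analogue of $order(): the subscript that follows `last`
    // directly below `parent`, or "" when there is none. Passing ""
    // as `last` yields the first subscript at that level.
    std::string next(const Key& parent, const std::string& last) const {
        Key probe = parent;
        probe.push_back(last);
        for (auto it = nodes_.upper_bound(probe); it != nodes_.end(); ++it) {
            const Key& k = it->first;
            // Keys sharing the prefix `parent` are contiguous in the map,
            // so the first key without that prefix ends the search.
            if (k.size() <= parent.size() ||
                !std::equal(parent.begin(), parent.end(), k.begin())) {
                return "";
            }
            const std::string& s = k[parent.size()];
            if (s != last) return s;  // skip descendants of `last` itself
        }
        return "";
    }

private:
    std::map<Key, std::string> nodes_;  // ordered, so traversal is sorted
};
```

In this model, calling next repeatedly with the previously returned subscript walks one level of the hierarchy in sorted order, and a subscript is reported even when only its descendants hold data.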

The toolkit makes the database and function set available as C++ classes and also permits interpretive execution of legacy Mumps scripts. To use the toolkit, install the MDH and Mumps distribution kit and related code.

Functions implemented

The toolkit implements the legacy Mumps functions and special variables $ascii(), $extract(), $find(), $horolog, $length(), $name(), $justify(), $order(), $piece(), and $test, as well as vector and matrix operations, Boyer–Moore–Gosper string search functions, a Smith–Waterman algorithm function, relational algebra operations, and access to the Perl Compatible Regular Expressions (PCRE) library.
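
For readers unfamiliar with Mumps, the following sketch shows the usual behavior of three of the string functions named above, using 1-based character positions. The function names m_extract, m_piece, and m_find are hypothetical and are not the toolkit's C++ interface. Only the common argument forms are modeled, and edge cases such as an empty delimiter are handled by assumption.

```cpp
#include <cassert>
#include <string>

// Sketch of $extract(s, from, to): characters from..to, 1-based and
// inclusive. Out-of-range bounds are clamped; an empty range gives "".
std::string m_extract(const std::string& s, long from, long to) {
    const long len = static_cast<long>(s.size());
    if (from < 1) from = 1;
    if (to > len) to = len;
    if (from > to) return "";
    assert(from >= 1 && to <= len);
    return s.substr(static_cast<std::size_t>(from - 1),
                    static_cast<std::size_t>(to - from + 1));
}

// Sketch of $piece(s, delim, n): the n-th substring of s when split on
// delim, counting from 1. Returns "" if there is no n-th piece.
// Assumption: an empty delimiter also yields "".
std::string m_piece(const std::string& s, const std::string& delim, long n) {
    if (delim.empty() || n < 1) return "";
    std::size_t start = 0;
    for (long i = 1; i < n; ++i) {
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) return "";  // fewer than n pieces
        start = pos + delim.size();
    }
    const std::size_t end = s.find(delim, start);
    return s.substr(start,
                    end == std::string::npos ? std::string::npos : end - start);
}

// Sketch of $find(s, sub): the 1-based position of the character just
// after the first occurrence of sub, or 0 if sub does not occur.
long m_find(const std::string& s, const std::string& sub) {
    const std::size_t pos = s.find(sub);
    if (pos == std::string::npos) return 0;
    return static_cast<long>(pos + sub.size()) + 1;
}
```

Under these definitions, m_piece("a^b^c", "^", 2) selects the middle field of a caret-delimited record, a common way of packing several values into one array node in Mumps code.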