Semantic matching has been proposed as a valid solution to the semantic heterogeneity problem, namely, supporting diversity in knowledge. Given any two graph-like structures, e.g. classifications, databases, XML schemas, or ontologies, matching is an operator that identifies the nodes in the two structures that semantically correspond to one another. For example, applied to file systems, it can identify that a folder labeled “car” is semantically equivalent to another folder labeled “automobile” because the two words are synonyms in English.
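As a toy illustration of the operator's input and output, the sketch below compares folder labels only and ignores the structure around them. The synonym table, `same_meaning`, and `match` are hypothetical stand-ins for a real lexical resource and matcher. They are not the technique described here.

```python
# Hypothetical synonym table standing in for a real lexical resource.
SYNONYMS = [{"car", "automobile", "auto"}, {"red", "crimson"}]

def same_meaning(a: str, b: str) -> bool:
    """True if two labels are identical or listed together as synonyms."""
    a, b = a.lower(), b.lower()
    return a == b or any(a in s and b in s for s in SYNONYMS)

def match(nodes1: list[str], nodes2: list[str]) -> list[tuple[str, str, str]]:
    """Return (node1, node2, relation) triples; only equivalence ('=') is detected."""
    return [(x, y, "=") for x in nodes1 for y in nodes2 if same_meaning(x, y)]

print(match(["car", "bike"], ["automobile", "train"]))
# [('car', 'automobile', '=')]
```

Matching labels in isolation like this misses the context a node inherits from its position in the tree. The next paragraph addresses that.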
The proposed technique works on lightweight ontologies, namely tree structures in which each node is labeled by a natural language label, for example in English. These labels are translated into formulas of a formal logical language, which, unlike natural language, is unambiguous. The formula codifies the meaning of the node, taking its position in the tree into account. For example, if the folder “car” is under the folder “red”, the meaning of “car” in this context is “red car”. This is translated into the logical formula “red AND car”.
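The path-dependent meaning can be sketched as a tree traversal that conjoins the labels from the root down to each node. This is a simplification. The sketch assumes each label is already an atomic concept, so it skips the translation of natural language labels into logic. `Node` and `concept_formulas` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)

def concept_formulas(root: Node) -> dict[str, str]:
    """Map each root-to-node path to the conjunction of the labels on that path.

    Assumption: every label is treated as an atomic concept (no parsing or
    sense disambiguation of the natural language label).
    """
    result: dict[str, str] = {}

    def visit(node: Node, path: list[str]) -> None:
        path = path + [node.label]
        result["/".join(path)] = " AND ".join(path)
        for child in node.children:
            visit(child, path)

    visit(root, [])
    return result

tree = Node("red", [Node("car")])
for path, formula in concept_formulas(tree).items():
    print(path, "->", formula)
# red -> red
# red/car -> red AND car
```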
The output of matching is a mapping, namely a set of semantic correspondences between the two graphs. Each mapping element is annotated with a semantic relation, for example equivalence. Among all possible mappings, the minimal mapping is the subset of mapping elements from which all the other elements can be computed in time linear in the size of the input graphs, and from which no element can be dropped without preventing such a computation.
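The sketch below shows why some mapping elements are redundant. In a lightweight ontology, a node's concept is the conjunction of the labels on its path, so every node is less general than its ancestors. Suppose a mapping contains "a is less general than b". Then every node under a in the first tree is also less general than b and than every node above b in the second tree.

The sketch is restricted to the "less general" relation. It assumes node names are unique within each tree, and the helper names are hypothetical. It shows which elements are derivable. It does not attempt the linear-time computation mentioned above, and other relations would need their own propagation rules.

```python
from typing import Optional

Tree = dict[str, Optional[str]]  # node -> parent (None for the root)

def ancestors_or_self(tree: Tree, node: str) -> list[str]:
    """Walk parent links from `node` up to the root."""
    out, cur = [], node
    while cur is not None:
        out.append(cur)
        cur = tree[cur]
    return out

def descendants_or_self(tree: Tree, node: str) -> list[str]:
    """A node is below `node` if `node` lies on its path to the root."""
    return [n for n in tree if node in ancestors_or_self(tree, n)]

def derive_less_general(t1: Tree, t2: Tree,
                        minimal: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Expand 'a less general than b' elements into all elements they entail."""
    derived: set[tuple[str, str]] = set()
    for a, b in minimal:
        for a2 in descendants_or_self(t1, a):      # a2 is at most as general as a
            for b2 in ancestors_or_self(t2, b):    # b2 is at least as general as b
                derived.add((a2, b2))
    return derived

t1 = {"catalog": None, "sedans": "catalog", "used": "sedans"}
t2 = {"transport": None, "road": "transport", "automobiles": "road"}
minimal = {("sedans", "automobiles")}
print(sorted(derive_less_general(t1, t2, minimal)))
# [('sedans', 'automobiles'), ('sedans', 'road'), ('sedans', 'transport'),
#  ('used', 'automobiles'), ('used', 'road'), ('used', 'transport')]
```

In this example, one stored element stands in for six. The other five can always be recomputed from it and the two trees.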
The main advantage of minimal mappings is that they minimize the number of mapping elements that must be handled in subsequent processing. This is an important feature because the number of possible mapping elements can reach n × m, where n and m are the sizes (numbers of nodes) of the two input ontologies. Minimal mappings therefore become crucial with large ontologies, e.g. DMOZ, where even relatively small (non-minimal) subsets of the possible mapping elements, potentially millions of them, are unmanageable.
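The n × m bound follows from counting candidate node pairs. With sizes chosen purely for illustration, not taken from any particular ontology, the count grows quickly:

```latex
\bigl|\{(a, b) : a \in T_1,\ b \in T_2\}\bigr| = n \cdot m,
\qquad n = m = 10^{4} \;\Rightarrow\; n \cdot m = 10^{8}.
```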
Minimal mappings also provide usability advantages. Many systems, mostly with graphical interfaces, have been developed for managing mappings, but all of them scale poorly with the number of nodes, and visualizations of large graphs quickly become cluttered. Smaller mappings are much easier and faster to maintain, and maintaining them is less error prone.