Daitch–Mokotoff Soundex
Revision by KosherJava (minor edit; summary: BM Soundex)
Revision as of 19:30, 11 November 2008
Daitch-Mokotoff Soundex (D-M Soundex) is a phonetic algorithm invented in 1985 by genealogist Gary Mokotoff, and later improved by Randy Daitch, both of the Jewish Genealogical Society. It is a refinement of the Russell and American Soundex algorithms designed to allow matching of Slavic and Yiddish surnames with similar pronunciation but differences in spelling.
Daitch-Mokotoff Soundex is sometimes referred to as "Jewish Soundex" and "Eastern European Soundex", although the authors discourage use of these nicknames for the algorithm.
Improvements
Improvements over the older Soundex algorithms include:
- Coded names are six digits long, resulting in greater search precision (traditional Soundex uses four characters)
- Coded names can be stored as numeric values, which can save space in some applications (regular Soundex encodes values as alphanumeric text)
- Several rules in the algorithm encode multi-character n-grams as single digits (American and Russell Soundex do not handle multi-character n-grams)
- Multiple possible encodings can be returned for a single name (traditional Soundex returns only one encoding, even if the spelling of a name could potentially have multiple pronunciations)
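Two of these points, numeric storage and multiple encodings per name, can be illustrated with a minimal Python sketch. It does not implement the D-M coding rules; the codes are hard-coded from this article's examples table, and the helper names `pack`, `unpack` and `may_match` are hypothetical, introduced only for illustration.

```python
def pack(code: str) -> int:
    """Store a six-digit D-M code as an integer (hypothetical helper)."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a six-digit code: {code!r}")
    return int(code)


def unpack(value: int) -> str:
    """Restore the six-digit text form, re-adding any leading zeros."""
    return f"{value:06d}"


def may_match(codes_a, codes_b) -> bool:
    """Two names are candidate matches if they share at least one encoding."""
    return not set(codes_a).isdisjoint(codes_b)


# Codes copied from the examples table in this article, not computed here.
AUERBACH = ["097500", "097400"]
UHRBACH = ["097500", "097400"]
PETERS = ["739400", "734000"]

packed = [pack(c) for c in AUERBACH]       # integers; leading zero is dropped
restored = [unpack(v) for v in packed]     # back to six-digit strings
same_sound = may_match(AUERBACH, UHRBACH)  # the two names share encodings
different = may_match(AUERBACH, PETERS)    # no encoding in common
```

Because a code such as 097500 begins with a zero, an application that stores codes as integers has to pad them back to six digits before display or text comparison.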
Examples
The following table compares the American Soundex and D-M Soundex codes for several surnames:
Surname | American Soundex | D-M Soundex |
Peters | P362 | 739400, 734000 |
Peterson | P362 | 739460, 734600 |
Moskowitz | M232 | 645740 |
Moskovitz | M213 | 645740 |
Auerbach | A612 | 097500, 097400 |
Uhrbach | U612 | 097500, 097400 |
Jackson | J250 | 154600, 454600, 145460, 445460 |
Jackson-Jackson | J252 | 154664, 454664, 145466, 445466, 154646, 454646, 145464, 445464 |
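The American Soundex column can be reproduced with a short Python sketch. It follows the commonly described American Soundex rules (keep the first letter, code the remaining consonants, collapse adjacent equal codes, let H and W pass without separating them, pad to four characters); the function name `american_soundex` is an illustrative choice, and D-M Soundex itself is not implemented here.

```python
# Letter groups of American Soundex; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def american_soundex(name: str) -> str:
    """Return the four-character American Soundex code of a surname."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


for surname in ("Peters", "Peterson", "Moskowitz", "Moskovitz",
                "Auerbach", "Uhrbach", "Jackson", "Jackson-Jackson"):
    print(surname, american_soundex(surname))
```

Moskowitz and Moskovitz receive different American Soundex codes (M232 and M213) although D-M Soundex assigns both 645740, and Auerbach and Uhrbach differ only in their first letter (A612 and U612) although their D-M codes are identical.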
Beider-Morse Phonetic Name Matching Algorithm
To reduce the large number of false positives generated by D-M Soundex, Stephen Morse and Alexander Beider created the Beider-Morse Phonetic Name Matching algorithm.[1] The newer algorithm cuts down on false positives at the expense of some false negatives. A number of sites offer B-M Soundex in addition to D-M Soundex.[2]
See also
- Where Once We Walked
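Notes
1. http://stevemorse.org/phoneticinfo.htm
2. Nu? What's New? Volume 9, Number 22. http://www.avotaynu.com/nu/V09N22.htm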
External links
- Mokotoff, Gary. "Soundexing and Genealogy." Describes the history and the motivations behind D-M Soundex.
- JewishGen. "Soundex Coding." Describes both Russell and D-M Soundex.
- Coles, Michael. "SQL 2000 DBA Toolkit, Part 3: Phonetic Matching." SQL Server-based implementation of the D-M Soundex algorithm, with source code.