MBROLA

MBROLA (Multi-Band Resynthesis OverLap Add)[1] is a speech synthesis algorithm, the software that implements it, and a worldwide collaborative project built around both. The software is distributed at no financial cost, but only in binary form and only for non-commercial use. The MBROLA project web page provides diphone databases for a large number of spoken languages.

The MBROLA software is not a complete text-to-speech system for all those languages: the text must first be transformed into phoneme and prosodic information in MBROLA's format. Separate software to do this is available for some, but not all, of MBROLA's languages, and it can require extra setup.
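
As an illustration of what "phoneme and prosodic information" can look like, the following minimal Python sketch writes lines in the style of MBROLA's `.pho` input files, assuming the commonly documented layout: a phoneme name, a duration in milliseconds, and optional pairs of position (percent of the phoneme's duration) and pitch (Hz). The helper names `pho_line` and `to_pho` are hypothetical, and the phoneme symbols in the example are placeholders, since the valid symbol set depends on the diphone database in use.

```python
from typing import Iterable, Sequence, Tuple

# One pitch target: (position in percent of the phoneme duration, pitch in Hz).
PitchPoint = Tuple[int, int]


def pho_line(phoneme: str, duration_ms: int,
             pitch_points: Sequence[PitchPoint] = ()) -> str:
    """Format one phoneme as 'name duration [position pitch]...'.

    Hypothetical helper; the field layout is an assumption based on the
    commonly described .pho format.
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    fields = [phoneme, str(duration_ms)]
    for position, pitch_hz in pitch_points:
        if not 0 <= position <= 100:
            raise ValueError("position must be a percentage between 0 and 100")
        fields += [str(position), str(pitch_hz)]
    return " ".join(fields)


def to_pho(segments: Iterable[tuple]) -> str:
    """Join segments, each (phoneme, duration) or (phoneme, duration, points)."""
    return "\n".join(pho_line(*segment) for segment in segments) + "\n"


# Placeholder symbols: "_" is used here for silence, the others are invented.
example = to_pho([
    ("_", 100),
    ("a", 120, [(0, 120), (100, 110)]),
    ("m", 80),
    ("_", 100),
])
print(example, end="")
```

A front end for a given language would be responsible for choosing the phoneme symbols, durations, and pitch targets; the sketch only covers serialising them.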

Although MBROLA is diphone-based, the quality of its synthesis is considered to be higher than that of most diphone synthesisers, because it preprocesses the diphones to impose constant pitch and harmonic phases. This enhances their concatenation while only slightly degrading their segmental quality.

MBROLA is a time-domain algorithm, like PSOLA, which implies a very low computational load at synthesis time. Unlike PSOLA, however, MBROLA does not require a preliminary marking of pitch periods. This feature has made it possible to develop the MBROLA project around the MBROLA algorithm: many speech research labs, companies, and individuals around the world have provided diphone databases for many languages and voices. The number of these is by far a world record for speech synthesis, although there are some notable omissions such as Chinese.
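
To make the low synthesis-time cost of time-domain methods concrete, here is a minimal Python sketch of generic overlap-add: windowed frames are placed a fixed number of samples apart and summed. It is not MBROLA's actual resynthesis procedure, and the function names `hann` and `overlap_add`, the frame length, and the hop size are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(frames: Sequence[Sequence[float]], hop: int) -> List[float]:
    """Sum equal-length frames whose starts are `hop` samples apart.

    Each input sample costs one addition, which is why this kind of
    time-domain synthesis is cheap.
    """
    if not frames:
        return []
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, sample in enumerate(frame):
            out[start + i] += sample
    return out


# Three Hann-windowed frames of a constant signal, overlapped by half.
frames = [hann(8) for _ in range(3)]
signal = overlap_add(frames, hop=4)
print(len(signal))
```

With a periodic Hann window and a hop of half the frame length, the overlapping halves sum to a constant in the fully overlapped region, so the frames join without amplitude ripple. Pitch-synchronous methods such as PSOLA choose frame positions from pitch marks rather than a fixed hop.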

References

  1. Dutoit, T; Leich, H (Dec 1993). "MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database". Speech Communication. 13 (3–4): 435–440. doi:10.1016/0167-6393(93)90042-J.