Loquendo

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Marco Ciaramella (talk | contribs) at 16:39, 24 November 2016 (Bibliography: Pirani + ISBN). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Loquendo
Company type: Private
Industry: Productivity applications
Founded: 1970s as a research group within CSELT; 2001 (as an independent company)
Headquarters: Torino, Italy
Key people: Chairman & CEO: Davide Franco
Products: speech synthesis, speech recognition, speaker verification, consulting
Revenue: 15 million EUR (2010)
1.5 million EUR (2010)
Number of employees: 103 (2011)
Website: www.loquendo.com

Loquendo is a multinational computer software technology corporation, headquartered in Torino, Italy, that provides speech recognition, speech synthesis, speaker verification and identification applications.[1] Loquendo, which was founded in 2001 under the Telecom Italia Lab (formerly CSELT), also has offices in the United Kingdom, Spain, Germany, France, and the United States.[2]

Loquendo's products can be found in portable and in-car navigation devices, assistive devices for people with disabilities, smartphones, ebook readers, talking ATMs, computer games, voice-controlled domestic appliances and other products. Its speech synthesis and speech recognition systems are used in an e-health application, the virtual assistant of the Health Services of Spain's Junta de Andalucía government.[3]

Loquendo's products have received several awards, including being named a Speech Technologies Speech Engine Leader in 2007, 2008, and 2009.[4] The company was rated as 'Market Leader' by Speech Technologies in 2009 and 2010.[5]

On 30 September 2011, Nuance (one of Loquendo's main competitors) announced that it had acquired Loquendo.[6]

History

Loquendo was originally a research group created in the mid-seventies by managers at IRI-STET in the CSELT laboratories in Turin before becoming a company in its own right in 2001.

Speech synthesis

45 rpm record with "Frère Jacques" sung by MUSA in 1978

Building on the recommendations of the University of Padua and applying the technique of so-called diphones (the union of a consonant and a vowel, 150 in total for Italian), the group created the first speech synthesizer with high intelligibility in 1975.[7] It was called MUSA (MUltichannel Speaking Automaton) and demonstrated what was possible with the technology of the time. The results achieved in those years were condensed into a 45 rpm audio disc, produced in thousands of copies and distributed through the mass media. Its main content was an Italian version of the song Frère Jacques, performed in polyphony by several singing voices (MUSA could manage up to 8 synthesis channels in parallel).

The evolution of this prototype, with an increase in the number of diphones (to about 1000), the refinement of the linguistic analysis tools and better waveform management, led to a marked improvement of the synthetic voice. This led to the creation of the "voice synthesizer" integrated circuit, developed internally at CSELT, which was added to the SGS catalog as a peripheral for Zilog's Z80 microprocessor (with the code M8950).

In the nineties "ELOQUENS" was born: a multi-platform speech synthesizer for various operating systems (including DOS, Windows, System 7, Unix and OS/2) and for telephone boards with very large numbers of channels, such as those used by the Italian telephone operator to build its reverse directory service (used to obtain a subscriber's identity and address from their telephone number).[8]

Towards the end of the 1990s speech synthesis took a new approach: instead of relying on diphones, it used the selection and concatenation of acoustic units of variable length, an approach made possible by the increased power of computers and especially the increasing capacity of mass storage systems. This resulted in "ACTOR" ("The human sounding voice"), which began to reach a large audience thanks to the number of telephone services and applications created by Loquendo-related companies.

In the 2000s the synthesizer left the research labs as a commercial product, including a number of editing tools to produce synthetic audio enriched with emotions. It was also released as a software library for use in various products, from small portable devices such as mobile phones, navigators and palm computers, to multichannel/multilingual telephone servers for (semi)automatic call centers.

Loquendo speech synthesis has become an Internet meme on YouTube, most commonly in Spanish-language videos, where it is used in creepypastas and parody dubbings (often with vulgar language).

Speech recognition

Shortly after the start of the research into speech synthesis, the group began research on speech recognition, and at the beginning of the eighties it produced a first prototype, able to recognize the ten digits and a few simple commands.

Applying hidden Markov models in 1984[9] led to the development of a speech recognizer which could recognize connected words and sentences, created in collaboration with ELSAG, another company in the IRI-STET group.

The need to produce speaker-independent speech recognizers for telephone applications led to the creation of speech databases with the recorded voices of hundreds of different people. In 1987 the first large database, obtained by recording the voices of more than 1000 people calling from all over Italy with an automatic procedure, was used in the creation of a purpose-built phone server at the CSELT labs.

This recorded material allowed the training of Markov models and, through the use of sophisticated algorithms, led to the development of "AURIS", the first commercial recognizer that could run on a variety of devices with digital signal processors (DSPs).

In the nineties a large cross-European collaboration began: together with a dozen other companies and universities, a very large speech database was collected throughout Europe, with the voices of more than 65000 people.[10]

This material, combined with a new mixed approach of hidden Markov models and neural networks, led to "FLEXUS", the first flexible-vocabulary speech recognizer, which allowed many varied telephone services to use automatic speech recognition in their human interfaces.

Merging "FLEXUS" and "ACTOR" into a single system created "Dialogos", allowing the creation of cutting-edge telephone services.

The birth of Loquendo as a company led to support for many languages and to the release of the recognizer as a software library for the creation of various telephony applications.

The company also introduced several systems for writing finite-state grammars and natural language models.

The speech database recording campaigns continue, having moved on from Europe to Mediterranean countries, to South, Central and North America and, finally, to countries in the Far East. Overall, countless hours of speech have been recorded by contacting hundreds of thousands of people in the listed regions. The recordings have been collected over fixed telephone networks, in moving vehicles for mobile phones, and with high-quality microphones in domestic environments for consumer applications such as video games, appliances and home automation in general.

Speaker recognition

Research into speaker recognition started more recently, in the mid-2000s, when speech databases tailored for this task became available. In collaboration with the Politecnico di Torino, the group began experiments on two different fronts: speaker "identification" and "verification".

The success of the research also pushed the company to develop products specifically for these tasks through the enabling platforms described below.

Speech coding

Research into speech coding started even before the work on speech recognition and synthesis. It aimed to build equipment such as codecs and echo cancellers in order to increase as much as possible the number of telephone conversations that can flow through a single cable (or satellite connection) without losing voice intelligibility.

In the late seventies, studies and experiments led to the creation of algorithms to encode the telephone speech signal and to the establishment of the European CCITT standard known as A-law encoding (an 8-bit logarithmic encoding following the "A" law, for audio signals sampled at 8 kHz). This standard was then used in the codecs for 64 kbit/s ISDN telephone lines.
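
The logarithmic companding behind A-law can be sketched in a few lines. The following is a minimal illustration that uses the continuous A-law curve with the parameter A = 87.6, followed by a plain uniform 8-bit quantizer. Deployed codecs use a segmented approximation of this curve, and the function names here are illustrative assumptions rather than the CSELT implementation. The final constant shows where the 64 kbit/s figure comes from, assuming 8000 samples per second.

```python
import math

A = 87.6  # compression parameter of the European A-law curve


def alaw_compress(x: float, a: float = A) -> float:
    """Map a sample in [-1, 1] to [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    if ax > 1.0:
        raise ValueError("sample must lie in [-1, 1]")
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom                    # linear segment near zero
    else:
        y = (1.0 + math.log(a * ax)) / denom  # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y: float, a: float = A) -> float:
    """Inverse of alaw_compress."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a compressed value to a signed 8-bit code
    (illustrative only; not the segmented code word layout used in practice)."""
    return max(-128, min(127, round(y * 127)))


# 8 bits per sample at an assumed 8000 samples per second
BIT_RATE = 8 * 8000
```

Quiet samples are stretched before quantization (for example, `alaw_compress(0.01)` is well above 0.01), so the 8 available bits are spent more evenly across loud and soft speech than with a purely linear scale.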

In subsequent years the group built more powerful codecs (used in telephone exchanges) and, within the pan-European GSM consortium, the codec used in second-generation mobile phones.

At the same time they built a CODEC to transmit high quality signals in spite of the 8 kHz band limit of the telephone cables, which was useful for audio and video conference applications.

Enabling platforms

In the late nineties the development of Internet in the form known today (hypertext resident on different servers that span the planet in one big network) led to the need to make these texts available in voice over the phone.

At the same time, interactive voice response (IVR) systems became increasingly widespread, and hardware and software tools for quickly developing new telephony applications became essential. It became evident that the earlier development models, which had produced complex systems such as the automated directory inquiry service and the railway information service, were too rigid and would not easily allow the development of new applications.

It was therefore felt that there was a need for enabling platforms for automatic voice telephone systems that were both scalable and easily programmable. To this end a special working group was created to develop a voice browser prototype, which was shown to the public at SMAU 2000[11] under the name "VoxNauta". It was such a success that Telecom Italia decided to close its original research labs and create Loquendo on 1 February 2001.

Over the years "VoxNauta" was further developed in various scalable forms: from small servers to large enterprise systems with thousands of lines and has been installed in hundreds of companies around the world.

The emergence of standards for writing telephone services (VoiceXML) and of protocols (MRCP) for connecting servers hosting the speech technologies to servers hosting the telephone boards led to the creation of the Speech Server, a software-only product hosting the text-to-speech and speech recognition engines from Loquendo.

This continuing research and development has led Loquendo to be one of the most widely known brands in the field of synthesis and voice recognition.

The brand

There is no definitive explanation of the origin of the name Loquendo, while the logo was created by the Telecom Italia graphic department. When displayed as an animated gif the three ripples above the "O" turn on in sequence, giving the sense of the emission of sound.

The brand has not been protected by the company, and there are other Italian companies whose names derive directly from Loquendo. This has contributed to its widespread use, even at the expense of competing brands.

Sale of the company

Over the years there have been rumors of the sale of Loquendo to other companies.[12]

The most recent were in the summer of 2011, when it was announced that two US-based multinational companies, Nuance and Avaya, were looking into the possibility of a takeover.

As Nuance was a direct competitor of the Italian company, Loquendo workers were worried about the possible dismemberment of research and development and the disappearance from Italy of an excellent brand with forty years' experience.[13]

A purchase by Avaya seemed more desirable, as its activities were complementary to Loquendo's; Avaya did not own any speech technology and could therefore have been very interested in developing it in-house rather than purchasing it from outside companies.[14]

These reports were followed with great interest by the workers, local authorities in Turin and Piedmont and the entire international scientific community.[15][16][17]

On 13 August 2011, Telecom Italia publicly announced the sale of its entire stake in Loquendo to Nuance for 53 million euros.[18][19][20]

Products

References

  1. ^ http://www.loquendo.com/en/about/loquendo-at-a-glance/
  2. ^ http://www.loquendo.com/en/about/locations/
  3. ^ http://www.speechtechmag.com/Articles/Editorial/Europe-News/Loquendo-Lends-Its-Voice-to-Virtual-Assistant-for-Government-Health-Services---67237.aspx
  4. ^ http://www.speechtechmag.com/Articles/Editorial/Cover-Story/Market-Leaders-Speech-Engine-67965.aspx
  5. ^ http://www.speechtechmag.com/BuyersGuide/Loquendo-964.aspx
  6. ^ Nuance Closes Acquisition of Loquendo, press release, Nuance, 30 September 2011
  7. ^ Roberto Billi (editor), with the following Authors from CSELT: Agostino Appendino, Giancario Babini, Paolo Baggia, Roberto Billi, Alfredo Biocca, Pier Giorgio Bosco, Franco Canavesio, Giuseppe Castagneri, Alberto Ciaramella, Morena Danieli, Fulvio Faraci, Luciano Fissore, Roberto Gemello, Elisabetta Gerbino, Egidio Giachin, Giorgio Micca, Roberto Montagna, Luciano Nebbia, Silvia Quazza, Daniele Roffinella, Luciano Rosboch, Claudio Rullent, Pier Luigi Salza, Stefano Sandri, "Tecnologie vocali per l'interazione uomo-macchina. Nuovi servizi a portata di voce", Ed. Telecom Lab 1995. ISBN 978-88-85404-09-0
  8. ^ Roberto Billi, Franco Canavesio, Alberto Ciaramella, Luciano Nebbia, "Interactive voice technology at work: The CSELT experience", Ed. Speech communication, 1995 - Elsevier
  9. ^ Pirani, Giancarlo, ed. Advanced algorithms and architectures for speech understanding. Vol. 1. Springer Science & Business Media, 2013.
  10. ^ SpeechDat family projects (from the progenitor's name)
  11. ^ (it) Corriere della Sera, Pagine web da ascoltare al telefono, 4 settembre 2000
  12. ^ (it) il Giornale, Telecom, in attesa di Sparkle vende la «piccola» Loquendo, 11 luglio 2009
  13. ^ (it) la Repubblica, Loquendo, il ministero convoca anche Bernabè, 2 agosto 2010
  14. ^ (it) la Repubblica, Loquendo, seconda offerta. I dipendenti: "Dà più garanzie", 6 agosto 2010
  15. ^ "Salviamo Loquendo!". Retrieved 10 August 2011.
  16. ^ "Un neo da estirpare", l'Informatica, cap. 1 In: Luciano Gallino, "La scomparsa dell'Italia industriale", Ed. Einaudi 2003 - ISBN 978-88-06-16628-1
  17. ^ Marina Cassi, La comunità della scienza difende Loquendo, "La Stampa", 10 agosto 2011
  18. ^ Press release, Telecom Italia sells Loquendo to Nuance for an Enterprise Value of €53 Million, "Telecom Italia", 13 August 2011
  19. ^ Press release, Nuance to Acquire Loquendo, "Nuance", 15 August 2011
  20. ^ (it) Luca Davi, Telecom Italia cede Loquendo al gruppo Nuance, "Il Sole 24 ORE", 14 agosto 2011

Bibliography

  • (it) Luigi Bonavoglia, "CSELT trent'anni", Ed. CSELT, 1994
  • (it) Roberto Billi (curator), with the following Authors of CSELT: Agostino Appendino, Giancario Babini, Paolo Baggia, Roberto Billi, Alfredo Biocca, Pier Giorgio Bosco, Franco Canavesio, Giuseppe Castagneri, Alberto Ciaramella, Morena Danieli, Fulvio Faraci, Luciano Fissore, Roberto Gemello, Elisabetta Gerbino, Egidio Giachin, Giorgio Micca, Roberto Montagna, Luciano Nebbia, Silvia Quazza, Daniele Roffinella, Luciano Rosboch, Claudio Rullent, Pier Luigi Salza, Stefano Sandri, "Tecnologie vocali per l'interazione uomo-macchina. Nuovi servizi a portata di voce", Ed. Telecom Lab 1995, ISBN 88-85404-09-X, ISBN 978-88-85404-09-0
  • (en) Pirani, Giancarlo, ed. Advanced algorithms and architectures for speech understanding. Vol. 1. Springer Science & Business Media, 2013. ISBN 978-3-540-53402-0
  • (it) Quarant'anni d'innovazione, ed. Millennium s.r.l, (supplemento al num 224 di Media Duemila, 2005)
  • (it) torinowireless.it
  • (it) smau.it
  • (it) corriere.it
  • (it) isticom.it
  • (it) deputatids.it
  • (it) h-care.eu
  • (it) Forum P.A. 17-20 maggio 2010 - Cartella Stampa AVAYA