|This article does not cite any references or sources. (January 2012)|
Audio mining is a technique by which the content of an audio signal can be automatically analysed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The audio will typically be processed by a speech recognition system in order to identify word or phoneme units that are likely to occur in the spoken content. This information may either be used immediately in pre-defined searches for keywords or phrases (a real-time "word spotting" system), or the output of the speech recogniser may be stored in an index file. One or more audio mining index files can then be loaded at a later date in order to run searches for keywords or phrases.
The results of a search will normally be in terms of hits, which are regions within files that are good matches for the chosen keywords. The user may then be able to listen to the audio corresponding to these hits in order to verify if a correct match was found.
Audio mining systems used in the field of speech recognition are often divided into two groups: those that use Large Vocabulary Continuous Speech Recognisers (LVCSR) and those that use phonetic recognition.
Musical audio mining (also known as music information retrieval) relates to the identification of perceptually important characteristics of a piece of music such as melodic, harmonic or rhythmic structure. Searches can then be carried out to find pieces of music that are similar in terms of their melodic, harmonic and/or rhythmic characteristics.