Machine listening

From Wikipedia, the free encyclopedia

Machine listening is a technique that uses software and hardware to extract meaningful information from audio signals. The engineer Paris Smaragdis, interviewed in Technology Review, describes such systems as "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents."[1]

Since audio signals are interpreted by the human ear-brain system, machine listening software must model that complex perceptual mechanism. In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification);[2] psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence);[3] acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) stand to benefit from efficient compression schemes, which discard inaudible information in the sound.[4] Computational models of music and sound perception and cognition can lead to more meaningful representations, and to more intuitive digital manipulation and generation of sound and music, in musical human-machine interfaces.
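As a small illustration of the spectrum analysis and feature extraction mentioned above, the sketch below computes the spectral centroid, a common low-level audio feature corresponding roughly to perceived brightness. This is a minimal example using NumPy's FFT, not a method described by any particular system in this article; the function name and test signal are chosen here for illustration.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Return the spectral centroid in Hz: the amplitude-weighted
    mean frequency of the signal's magnitude spectrum."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

# Example: one second of a pure 440 Hz tone. Its energy is
# concentrated at 440 Hz, so the centroid lands there too.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(tone, sr))  # ~440.0
```

Real machine listening systems typically compute such features frame by frame over short windows (e.g. 20-40 ms) rather than over a whole recording, so that the feature tracks how the sound evolves over time.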

Machine listening is a recent research field, and many research groups are currently working in this area, including the Medical Intelligence and Language Engineering Lab at the Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, and the Audio Analysis Lab at Aalborg University, Denmark.


  1. ^ Paris Smaragdis taught computers how to play more life-like music
  2. ^ Kelly, Daniel; Caulfield, Brian (Feb 2015). "Pervasive Sound Sensing: A Weakly Supervised Training Approach". IEEE Transactions on Cybernetics. doi:10.1109/TCYB.2015.2396291. Retrieved 1 July 2015.
  3. ^ Hendrik Purwins, Perfecto Herrera, Maarten Grachten, Amaury Hazan, Ricard Marxer, and Xavier Serra. "Computational models of music perception and cognition I: The perceptual and cognitive processing chain". Physics of Life Reviews, vol. 5, no. 3, pp. 151–168, 2008. [1]
  4. ^ Machine Listening Course Webpage at MIT