3D sound localization

{{Orphan|date=November 2014}}


'''3D sound localization''' refers to [[acoustical engineering]] technology used to identify the location of a sound source in [[three-dimensional space]]. Interest in sound localization is growing, driven by the need for practical solutions in audio and acoustics, such as [[hearing aid]]s and [[navigation]].
Usually the location of the source is determined by the direction of the incoming sound waves (horizontal and vertical angles) and the distance between the source and the sensors. Sound source localization is a special case of the general source localization problem; it involves both the arrangement of the [[sensors]] and the [[signal processing]] techniques applied to their outputs.


==Applications==
There are many applications of sound source localization, such as sound source separation, sound source tracking, and speech enhancement. Underwater [[sonar]] uses sound source localization techniques to identify the location of a target. Sound localization is also used in robots for effective human–robot interaction.

==Cues for sound localization==
Localization cues<ref>{{cite book|last=Goldstein|first=E. Bruce|title=Sensation and Perception|edition=8th|publisher=Cengage Learning|pages=293–297|isbn=978-0-495-60149-4}}</ref> are features that help localize sound. They include binaural and monaural cues.
*Monaural cues can be obtained by means of spectral analysis and are generally used in vertical localization.
*Binaural cues are generated by the difference in hearing between the left and right ears. These include the interaural time difference (ITD) and the interaural intensity difference (IID). Binaural cues are used mostly for horizontal localization.

==Methods==
There are many 3D sound localization methods used for various applications.
*Different techniques can be used to obtain optimal results, such as [[neural network]]s, [[maximum likelihood]] estimation, and [[multiple signal classification]] (MUSIC).
*In terms of timeliness, there are real-time methods and offline methods.
===Microphone Array Approach===
====Collocated Microphone Array<ref>{{cite journal|last1=Liang|first1=Yun|last2=Cui|first2=Zheng|last3=Zhao|first3=Shengkui|last4=Rupnow|first4=Kyle|last5=Zhang|first5=Yihao|last6=Jones|first6=Douglas L.|last7=Chen|first7=Deming|title=Real-time implementation and performance optimization of 3D sound localization on GPUs|journal=Design, Automation and Test in Europe Conference and Exhibition|date=2012|pages=832–5|issn=1530-1591}}</ref>====
Real-time sound localization can use a collocated microphone array known as an acoustic vector sensor (AVS) array. The input sound signal is first passed through a rectangular window, and each resulting segment forms a frame. Four parallel frames are obtained from the XYZO array and used for direction-of-arrival (DOA) estimation: the frames are split into small blocks of equal size, and a Hamming window and FFT then convert each block from the time domain to the frequency domain. Compared with earlier microphone arrays, this device performs well even when the aperture is small, and it can localize multiple low-frequency and high-frequency wideband sound sources simultaneously. The omnidirectional (O) sensor makes more acoustic information available, such as amplitude and time differences. Most importantly, the XYZO array delivers this performance in a small package.
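The front-end processing described above can be sketched in Python as follows; the 256-sample block size and the synthetic frames are assumptions for illustration, as the actual frame and block sizes used in the GPU implementation are not given here.

<syntaxhighlight lang="python">
import numpy as np

def frame_to_spectra(frame, block_size=256):
    """Split one frame into equal-sized blocks, apply a Hamming window,
    and FFT each block from the time domain to the frequency domain."""
    n_blocks = len(frame) // block_size
    blocks = frame[:n_blocks * block_size].reshape(n_blocks, block_size)
    return np.fft.rfft(blocks * np.hamming(block_size), axis=1)

# Four parallel frames, one per XYZO channel (synthetic data for the demo).
frames = [np.random.randn(4096) for _ in range(4)]  # X, Y, Z, O
spectra = [frame_to_spectra(f) for f in frames]
</syntaxhighlight>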
=====Acoustic Vector Array=====
[[File:XYZO array.png|thumb|right|150px|AVS array: XYZO array]]
*A collocated microphone array of four sensors is used to approximate an AVS (acoustic vector sensor) array.
*This approximated array has been widely used underwater.
*It contains three orthogonally installed acoustic particle velocity gradient microphones (the X, Y, and Z sensors shown in the figure) and one omnidirectional acoustic microphone (O).
*An offline calibration process measures and interpolates the impulse responses of the X, Y, Z, and O sensors to obtain their steering vectors (a sketch follows below).
<br style="clear:both;">
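The calibration output can be organized as frequency-domain steering vectors. The sketch below is a minimal illustration in Python, assuming the per-direction impulse responses have already been measured and interpolated; the function name, array shapes, and FFT size are illustrative assumptions, not details from the cited paper.

<syntaxhighlight lang="python">
import numpy as np

def steering_vectors(impulse_responses, nfft=256):
    """impulse_responses: (n_directions, 4, L) measured XYZO impulse
    responses. Returns (n_directions, 4, nfft//2 + 1) normalized
    frequency-domain steering vectors, one per calibrated direction."""
    H = np.fft.rfft(impulse_responses, nfft, axis=-1)
    return H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)

ir = np.random.randn(72, 4, 128)   # e.g. 72 calibrated directions (synthetic)
sv = steering_vectors(ir)          # shape (72, 4, 129)
</syntaxhighlight>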



====Multiple Microphone Array====
The sound directions are estimated by multiple arrays, and the source is located where the directions detected by different arrays cross.
=====Motivation for advanced microphone arrays=====
Sound reflections always occur in real environments, and microphone arrays<ref>{{cite journal|last1=Ishi|first1=C.T.|last2=Even|first2=J.|last3=Hagita|first3=N.|title=Using multiple microphone arrays and reflections for 3D localization of sound sources|journal=2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2013)|date=November 2013|pages=3937–42|doi=10.1109/IROS.2013.6696919}}</ref> cannot avoid observing those reflections.
=====Applying multiple microphone arrays=====
Angle uncertainty (AU) occurs when estimating direction, and the resulting position uncertainty (PU) worsens as the distance between the array and the source increases. The position uncertainty at range <math>r</math> is
:<math>PU \left(r \right)= \frac{\pm AU}{360} \times 2 \pi r </math>
where <math>r</math> is the range of the source and AU is the angle uncertainty. This measure is used to judge whether two directions cross at some location.
The minimum distance between two direction lines is
:<math>dist \left(dir_1,dir_2 \right)=\frac{ \left| \left( \overrightarrow{v_1} \times \overrightarrow{v_2} \right) \cdot \overrightarrow{p_1 p_2} \right|}{ \left| \overrightarrow{v_1} \times \overrightarrow{v_2} \right|}</math>
where <math>dir_1</math> and <math>dir_2</math> are the two detected directions, <math>\overrightarrow{v_i}</math> are vectors parallel to the detected directions, and <math>p_i</math> are the positions of the arrays.
Whether two directions cross is judged by comparing this distance with the position uncertainties: if
:<math>dist(dir_1,dir_2) < \left|PU_1(r_1)\right| + \left|PU_2(r_2)\right|</math>
the two lines are judged as crossing, and the sound source location is computed as
:<math>POS_{source} = \frac{POS_1 \cdot w_1 + POS_2 \cdot w_2}{w_1 + w_2}</math>
where <math>POS_{source}</math> is the estimated sound source position, <math>POS_n</math> are the points where each direction line meets the minimum-distance segment, and <math>w_n</math> are weighting factors.
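A minimal sketch of the crossing test and triangulation in Python; the array positions and directions in the example are made up, and equal weights are assumed by default.

<syntaxhighlight lang="python">
import numpy as np

def triangulate(p1, v1, p2, v2, w1=1.0, w2=1.0):
    """Closest points of two (non-parallel) direction lines and their
    weighted average as the source position estimate."""
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 / np.linalg.norm(v2)
    # Solve p1 + t1*v1 ≈ p2 + t2*v2 in the least-squares sense.
    A = np.stack([v1, -v2], axis=1)
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    pos1, pos2 = p1 + t[0] * v1, p2 + t[1] * v2
    dist = np.linalg.norm(pos1 - pos2)   # minimum distance between the lines
    source = (w1 * pos1 + w2 * pos2) / (w1 + w2)
    return dist, source

# Example: arrays at the origin and at (1, 0, 0) m, with nearly crossing lines.
d, pos = triangulate(np.zeros(3), np.array([1.0, 1.0, 0.2]),
                     np.array([1.0, 0.0, 0.0]), np.array([-1.0, 1.0, 0.2]))
</syntaxhighlight>

The returned <code>dist</code> is compared against the combined position uncertainties <math>|PU_1(r_1)|+|PU_2(r_2)|</math> to decide whether the two directions cross.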

===Binaural Hearing Approach===

====Learning method for binaural hearing====
[[File:Binaural robot head2.png|thumb|upright|Structure of the binaural robot dummy head]]


Binaural hearing learning<ref>{{cite conference|author=Nakasima, H. and Mukai, T.|title=3D Sound Source Localization System Based on Learning of Binaural Hearing|conference=Systems, Man and Cybernetics, IEEE 2005|volume=4|pages=3534–3539|date=Oct 2005|doi=10.1109/ICSMC.2005.1571695}}</ref> is a bionic method. The sensor is a robot dummy head with two microphones and an artificial pinna (reflector). The head has two rotation axes and can rotate horizontally and vertically. The reflector changes the spectrum of incoming white-noise sound into a direction-dependent pattern, which is used as the cue for vertical localization; the cue for horizontal localization is the ITD.

The system learns with neural networks by rotating the head around a fixed white-noise source and analyzing the spectrum. Experiments show that the system identifies the direction of the source well within a certain range of angles of arrival; it cannot identify sound coming from outside that range, where the spectral pattern of the reflector collapses.
Binaural hearing uses only two microphones and is capable of concentrating on one source among noise and other sources.


====Head-related transfer function (HRTF)====
In real sound localization, the whole head and torso play an important functional role, not only the two pinnae. This function can be described as spatial linear filtering, and the filtering is always quantified in terms of the head-related transfer function (HRTF).<ref>{{cite conference|author=Keyrouz, F. and Diepold, K.|title=An Enhanced Binaural 3D Sound Localization Algorithm|conference=Signal Processing and Information Technology, IEEE 2006|pages=662–665|date=Aug 2006|doi=10.1109/ISSPIT.2006.270883}}</ref>


HRTF also uses the robot-head sensor of the binaural hearing model, which has multiple inputs. The HRTF can be derived from various localization cues, and sound localization with HRTF amounts to filtering the input signal with a filter designed from the HRTF. Instead of neural networks, a head-related transfer function is used, and the localization is based on a simple correlation approach.
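As one illustrative correlation-style approach (an assumption for this sketch, not the exact algorithm of the cited work), a direction can be chosen by matching the observed interaural spectral ratio against the ratio implied by each measured head-related impulse response (HRIR) pair; taking the ratio cancels the unknown source spectrum.

<syntaxhighlight lang="python">
import numpy as np

def hrtf_match(left, right, hrir_bank, nfft=1024):
    """Pick the direction whose HRIR pair best matches the observed
    interaural spectral ratio. hrir_bank maps direction -> (h_left, h_right)."""
    L, R = np.fft.rfft(left, nfft), np.fft.rfft(right, nfft)
    obs = L / (R + 1e-12)               # observed interaural ratio
    best, best_err = None, np.inf
    for direction, (hl, hr) in hrir_bank.items():
        Hl, Hr = np.fft.rfft(hl, nfft), np.fft.rfft(hr, nfft)
        err = np.sum(np.abs(obs - Hl / (Hr + 1e-12)) ** 2)
        if err < best_err:
            best, best_err = direction, err
    return best
</syntaxhighlight>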
See more: [[Head-related transfer function]].


====Cross-power spectrum phase (CSP) analysis====
The CSP method<ref>{{cite conference|author=Hyun-Don Kim; Komatani, K.; Ogata, T.; Okuno, H.G.|title=Evaluation of Two-Channel-Based Sound Source Localization using 3D Moving Sound Creation Tool|conference=ICKS 2008|date=Jan 2008|doi=10.1109/ICKS.2008.25}}</ref> is also used for the binaural model. The idea is that the angle of arrival can be derived from the time delay of arrival (TDOA) between two microphones, and the TDOA can be estimated by finding the maximum of the CSP coefficients, which are derived by:
:<math>csp_{ij}(k)=IFFT\left \{ \frac{FFT[s_{i}(n)]\cdot FFT[s_{j}(n)]^*} {\left |FFT[s_{i}(n)]\right \vert \cdot \left |FFT[s_{j}(n)]\right \vert} \right \}</math>
where <math>s_{i}(n)</math> and <math>s_{j}(n)</math> are the signals entering microphones <math>i</math> and <math>j</math>, respectively. The time delay of arrival <math>\tau</math> can then be estimated by
:<math>\tau=\arg\max_{k}\left( csp_{ij}(k) \right)</math>
The sound source direction is
:<math>\theta=\sin^{-1}\left( \frac{v \cdot \tau}{f_s \cdot d_{max}} \right)</math>
where <math>v</math> is the sound propagation speed, <math>f_s</math> is the sampling frequency and <math>d_{max}</math> is the distance with maximum time delay between the two microphones.
The CSP method does not require the system impulse-response data that HRTF needs. An [[expectation-maximization algorithm]] is also used to localize several sound sources and reduce localization errors. The system is capable of identifying several moving sound sources using only two microphones.
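A minimal Python sketch of CSP-based TDOA and angle estimation (the same form as GCC-PHAT); the small regularization floor, the 0.2 m microphone spacing, and the 343 m/s sound speed are assumed values.

<syntaxhighlight lang="python">
import numpy as np

def csp_tdoa(si, sj, fs):
    """Estimate the TDOA (in seconds) between two microphone signals
    from the maximum of the CSP coefficients."""
    n = len(si) + len(sj)
    Si, Sj = np.fft.rfft(si, n), np.fft.rfft(sj, n)
    cross = Si * np.conj(Sj)
    csp = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
    k = int(np.argmax(csp))
    if k > n // 2:
        k -= n                          # wrap to negative lags
    return k / fs

def arrival_angle(tau, v=343.0, d_max=0.2):
    """Map a TDOA to an angle of arrival, clipped for numerical safety."""
    return np.degrees(np.arcsin(np.clip(tau * v / d_max, -1.0, 1.0)))
</syntaxhighlight>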


====2D sensor line array====
[[File:2D array demo3.png|thumb|upright|Demonstration of 2d line sensor array]]
To estimate the location of a source in 3D space, two line sensor arrays can be used, placed horizontally and vertically, respectively.
An example is a 2D line array used for underwater source localization.<ref>{{cite journal|author=Tabrikian, J. and Messer, H.|title=Three-Dimensional Source Localization in a Waveguide|journal=IEEE Transactions on Signal Processing|volume=44|issue=1|date=Jan 1996|doi=10.1109/78.482007|pages=1–13}}</ref> By processing the data from the two arrays with the [[maximum likelihood]] method, the direction, range, and depth of the source can be identified simultaneously.
Unlike the binaural hearing model, this method resembles a [[spectral analysis]] method. It can be used to localize a source that is far away, but the system can be much more expensive than the binaural model because it needs more sensors and power.
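The maximum-likelihood processing in the cited work is considerably more involved; purely as a toy stand-in, the sketch below grid-searches candidate source positions and scores them by the least-squares mismatch of inter-sensor delay differences, which coincides with maximum likelihood under white Gaussian noise assumptions. The sound speed, sensor layout, and grid are made-up values.

<syntaxhighlight lang="python">
import numpy as np

V = 1500.0  # assumed underwater sound speed (m/s)

def locate(obs_delays, sensors, candidates, v=V):
    """Pick the candidate position whose predicted relative delays
    across the sensors best match the observed ones."""
    best, best_err = None, np.inf
    for src in candidates:
        d = np.linalg.norm(sensors - src, axis=1) / v
        # Compare delay differences so the unknown emission time cancels.
        err = np.sum((np.diff(d) - np.diff(obs_delays)) ** 2)
        if err < best_err:
            best, best_err = src, err
    return best

# Candidate grid over range and depth at a fixed bearing (illustrative).
ranges = np.linspace(100.0, 2000.0, 50)
depths = np.linspace(5.0, 200.0, 40)
candidates = np.array([[r, 0.0, -z] for r in ranges for z in depths])
</syntaxhighlight>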


====Biologically inspired binaural sound localization based on hierarchical fuzzy artificial neural networks====
[[File:Whole system.png|thumb|right|Structure of the sound localization system]]
A hierarchical fuzzy artificial neural network<ref>{{cite journal|last1=Keyrouz|first1=Fakheredine|last2=Diepold|first2=Klaus|title=A novel biologically inspired neural network solution for robotic 3D sound source sensing|journal=Soft Computing|date=May 2008|volume=12|issue=7|pages=721–9|doi=10.1007/s00500-007-0249-9|location=Germany|issn=1432-7643}}</ref> combines ITD-based and IID-based sound localization methods to approach the localization accuracy of human ears. Used alone, IID-based and ITD-based methods share a main problem known as front–back confusion; the hierarchical neural network overcomes it by combining an IID estimate with an ITD estimate. However, the approach has been tested only on broadband sounds and combines only a few cues, so the present network cannot be used in non-stationary scenarios.

It is still not well understood how animals with two ears and pea-sized brains, such as some primitive mammals, are able to perceive 3D space and process sounds. Some animals, such as [[frogs]], have difficulty with 3D sound localization because their heads are small and the wavelength of their communication sounds may be much larger than their head diameter. The minimum audible movement angle (MAMA)<ref>{{cite journal|last1=Chandler|first1=Daniel W.|last2=Grantham|first2=D. Wesley|title=Minimum audible movement angle in the horizontal plane as a function of stimulus frequency and bandwidth, source azimuth, and velocity|journal=The Journal of the Acoustical Society of America|date=March 1992|volume=91|issue=3|pages=1624–36|doi=10.1121/1.402443}}</ref> is used to evaluate the localization capability of a given biological or electromechanical auditory system.


==See also==

==References==
{{reflist}}

==External links==