{"title": "Spectral Cues in Human Sound Localization", "book": "Advances in Neural Information Processing Systems", "page_first": 768, "page_last": 774, "abstract": null, "full_text": "Spectral Cues in Human Sound Localization \n\nCraig T. Jin \n\nDepartment of Physiology and \n\nDepartment of Electrical Engineering, \nUniv. of Sydney, NSW 2006, Australia \n\nAnna Corderoy \n\nDepartment of Physiology \n\nUniv. of Sydney, NSW 2006, Australia \n\nSimon Carlile \n\nDepartment of Physiology \n\nand Institute of Biomedical Research \nUniv. of Sydney, NSW 2006, Australia \n\nAndre van Schaik \n\nDepartment of Electrical Engineering, \nUniv. of Sydney, NSW 2006, Australia \n\nAbstract \n\nThe differential contribution of the monaural and interaural spectral \ncues to human sound localization was examined using a combined psychophysical \nand analytical approach. The cues to a sound's location \nwere correlated on an individual basis with the human localization responses \nto a variety of spectrally manipulated sounds. The spectral cues \nderive from the acoustical filtering of an individual's auditory periphery, \nwhich is characterized by the measured head-related transfer functions \n(HRTFs). Auditory localization performance was determined in virtual \nauditory space (VAS). Psychoacoustical experiments were conducted in \nwhich the amplitude spectrum of the sound stimulus was varied independently \nat each ear while preserving the normal timing cues, an impossibility \nin the free-field environment. Virtual auditory noise stimuli were generated \nover earphones for a specified target direction such that there was \na \"false\" flat spectrum at the left eardrum. Using the subject's HRTFs, \nthe sound spectrum at the right eardrum was then adjusted so that either \nthe true right monaural spectral cue or the true interaural spectral cue \nwas preserved. 
All subjects showed systematic mislocalizations in both \nthe true right and true interaural spectral conditions, which were absent in \ntheir control localization performance. The analysis of the different cues \nalong with the subjects' localization responses suggests there are significant \ndifferences in the use of the monaural and interaural spectral cues \nand that the auditory system's reliance on the spectral cues varies with \nthe sound condition. \n\n1  Introduction \n\nHumans are remarkably accurate in their ability to localize transient, broadband noise, an \nability with obvious evolutionary advantages. The study of human auditory localization has \na considerable and rich history (recent review [1]) which demonstrates that there are three \ngeneral classes of acoustical cues involved in the localization process: (1) interaural time \ndifferences, ITDs; (2) interaural level differences, ILDs; and (3) the spectral cues resulting \nfrom the auditory periphery. It is generally accepted that for humans, the ITD and ILD \ncues only specify the location of the sound source to within a \"cone of confusion\" [1], \ni.e., a locus of points approximating the surface of a cone symmetric with respect to the \ninteraural axis. It remains, therefore, for the localization system to extract a more precise \nsound source location from the spectral cues. \n\nThe utilization of the outer ear spectral cues during sound localization has been analyzed \nboth as a statistical estimation problem (e.g., [2]) and as an optimization problem, often using \nneural networks (e.g., [3]). Such computational models show that sufficient localization \ninformation is provided by the spectral cues to resolve the cone of confusion ambiguity, \nwhich corroborates the psychoacoustical evidence. 
Furthermore, it is commonly argued \nthat the interaural spectral cue, because of its natural robustness to level and spectral variations, \nhas advantages over the monaural spectral cues alone. Despite these observations, \nthere is still considerable contention as to the relative role or contribution of the monaural \nversus the interaural spectral cues. \n\nIn this study, each subject's spectral cues were characterized by measuring their head-related \ntransfer functions (HRTFs) for 393 evenly distributed positions in space. Measurements \nwere carried out in an anechoic chamber and were made for both ears simultaneously \nusing a \"blocked ear\" technique [1]. Sounds filtered with the HRTFs and played over \nearphones, which bypass the acoustical filtering of the outer ear, result in the illusion of \nfree-field sounds, which is known as virtual auditory space (VAS). The HRTFs were used to \ngenerate virtual sound sources in which the spectral cues were manipulated systematically. \nThe recorded HRTFs along with the Glasberg and Moore cochlear model [4] were also \nused to generate neural excitation patterns (frequency representations of the sound stimulus \nwithin the auditory nerve) which were used to estimate the different cues available \nto the subject during the localization process. Using this analysis, the interaural spectral \ncue was characterized and the different localization cues were correlated with each \nsubject's VAS localization responses. \n\n2  VAS Sound Localization \n\nThe sound localization performance of four normal hearing subjects was examined in VAS \nusing broadband white noise (300 - 14 000 Hz). The stimuli were filtered under three \ndiffering spectral conditions. 
(1) control: stimuli were filtered with spectrally correct left \nand right ear HRTFs for a given target location, (2) veridical interaural: stimuli at the \nleft ear were made spectrally flat with an appropriate dB sound level for the given target \nlocation, while the stimuli at the right ear were spectrally shaped to preserve the correct \ninteraural spectrum, (3) veridical right monaural: stimuli at the left ear were spectrally \nflat as in the second condition, while the stimuli at the right ear were filtered with the \ncorrect HRTF for the given target location, resulting in an inappropriate interaural spectral \ndifference. For each condition, a minimum-phase filter spectral approximation was made \nand the interaural time difference was modeled as an all-pass delay [5]. Sounds were \npresented at approximately 70 dB SPL with a duration of 150 ms (with 10 ms raised-cosine \nonset and offset ramps). Each subject performed five trials at each of 76 test positions for \neach stimulus condition. Detailed sound localization methods can be found in [1]. A short \nsummary is presented below. \n\n2.1  Sound Localization Task \n\nThe human localization experiments were carried out in a darkened anechoic chamber. \nVirtual auditory sound stimuli were presented using earphones (ER-2, Etymotic Research, \nwith a flat frequency response, within 3 dB, between 200-16 000 Hz). The perceived \nlocation of the virtual sound source was indicated by the subject pointing his/her nose in \nthe direction of the perceived source. The subject's head orientation and position were \nmonitored using an electromagnetic sensor system (Polhemus, Inc.). \n\n2.2  Human Sound Localization Performance \n\nThe sound localization performance of two subjects in the three different stimulus conditions \nis shown in Figure 1. 
The pooled data across 76 locations and five trials are presented \nfor both the left (L) and right (R) hemispheres of space from the viewpoint of an outside \nobserver. The target location is shown by a cross and the centroid of the subject's responses \nfor each location is shown by a black dot with the standard deviation indicated by an ellipse. \nFront-back confusions are plotted, although they were removed for calculating the standard \ndeviations. The subjects localized the control broadband sounds accurately (Figure 1a). In \ncontrast, the subjects demonstrated systematic mislocalizations for both the veridical interaural \nand veridical monaural spectral conditions (Figures 1b,c). There is clear pulling of \nthe localization responses to particular regions of space with evident intersubject variations. \n\n[Figure 1 panels, Subject 1 and Subject 2: (a) Broadband Control; (b) Veridical Interaural Spectrum; (c) Veridical Right Monaural Spectrum. Ellipse: Standard Deviation.] \n\nFigure 1: Localization performance for two subjects in the three sound conditions: (a) \ncontrol broadband; (b) veridical interaural; (c) veridical monaural. See text for details. 
\n\n3  Extraction of Acoustical Cues \n\nWith accurate measurements of each individual's outer ear filtering, the different acoustical \ncues can be compared with human localization performance on an individual basis. In \norder to extract the different acoustical cues in a biologically plausible manner, a model \nof peripheral auditory processing was used. A virtual source sound stimulus was prepared \nas described in Section 2 for a particular target location. The stimulus was then filtered \nusing a cochlear model based on the work of Glasberg and Moore [4]. This cochlear model \nconsisted of a set of modified rounded-exponential auditory filters. The width and shape \nof the auditory filters change as a function of frequency (and sound level) in a manner \nconsistent with the known physiological and psychophysical data. These filters were logarithmically \nspaced on the frequency axis with a total of 200 filters between 300 Hz and \n14 kHz. The cochlea's compressive non-linearity was modelled mathematically using a \nlogarithmic function. Thus the logarithm of the output energy of a given filter indicated the \namount of neural activity in that particular cochlear channel. \n\nThe relative activity across the different cochlear channels was representative of the neural \nexcitation pattern (EP) along the auditory nerve and it is from this excitation pattern \nthat the different spectral cues were estimated. For a given location, the left and right EPs \nthemselves represent the monaural spectral cues. The difference in the total energy (calculated \nas the area under the curve) between the left and right EPs was taken as a measure \nof the interaural level difference and the interaural spectral shape cue was calculated as \nthe difference between the left and right EPs. 
The fourth cue, interaural time difference, \nis a measure of the time lag between the signal in one ear as compared to the other and \ndepends principally upon the geometrical relationship between the sound source and the \nhuman subject. This time delay was calculated using the acoustical impulse responses for \nboth ears as measured during the HRTF recordings. \n\n4  Correlation of Cues and Location \n\nFor each stimulus condition and location, the acoustical cues were calculated as described \nabove for all 393 HRTF locations. Locations at which a given cue correlates well with \nthe stimulus cue for a particular target location were taken as analytical predictions of the \nsubject's response locations according to that cue. As the spectral content of the signal is \nvaried, the cue(s) available may strongly match the cue(s) normally arising from locations \nother than the target location. Therefore the aim of this analysis is to establish for which \nlocations and stimulus conditions a given response correlated most strongly with a particular cue. \n\nThe following analyses (using a Matlab toolbox developed by the authors) hinge upon the \ncalculation of \"cue correlation values\". To a large extent, these calculations follow the examples \ndescribed by [6] and are briefly described here. For each stimulus condition and \ntarget location, the subject performed five localization trials. For each of the subject's five \nresponse locations, each possible cue was estimated (Section 3) assuming a flat-spectrum \nbroadband Gaussian white noise as the stimulus. A mathematical quantity was then calculated \nwhich would give a measure of the similarity of the response location cues with the \ncorresponding stimulus cues. The method of calculation depended on the cue and several \nalternative methods were tried. 
Generally, for a given cue, these different methods demonstrated \nthe same basic pattern and the term \"cue correlation value\" has been given to the \nmathematical quantity that was used to measure cue similarity. The methods are as follows. \n\nFor the ITD cue, the negative of the absolute value of the difference between possible \nresponse location ITDs and the stimulus ITD was used as the ITD cue correlation value \n(the larger the value, the higher the correlation). The ILD cue correlation value was \ncalculated in a similar fashion. The cue correlation values for the left and right monaural \nspectral cues (in this case, the shape of the neural excitation pattern) were calculated by \ntaking the difference between the stimulus EP and the possible response location EPs and \nthen summing across frequency the variation of this difference about its mean value. For the \ninteraural spectral cue, the vector difference between the left and right EPs was calculated \nfor both the stimulus and the possible response locations. The dot product between the \nstimulus and the possible response location vectors gave the ISD cue correlation values. \n\nThe cue correlation values were normalized in order to facilitate meaningful comparisons \nacross the different acoustical cues. Following Middlebrooks [6], a \"z-score normalized\" \ncue value, for each response location corresponding to a given target location, was obtained \nby subtracting the mean correlation value (across all possible locations) and dividing by the \nstandard deviation. For these new cue values, termed the cue z-score values, a score of 1.0 \nor greater indicates good correlation. 
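The cue correlation values and their z-score normalization described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' Matlab toolbox: the function names are hypothetical, and the text does not say how the variation about the mean was measured, so the squared deviation used here (negated so that larger means more similar) is an assumption.

```python
import numpy as np

def itd_similarity(candidate_itds, stimulus_itd):
    # Negative absolute difference: values closer to zero mean a better match.
    return -np.abs(np.asarray(candidate_itds, dtype=float) - stimulus_itd)

def monaural_similarity(candidate_eps, stimulus_ep):
    # candidate_eps: (n_locations, n_channels); stimulus_ep: (n_channels,)
    diff = np.asarray(stimulus_ep, dtype=float) - np.asarray(candidate_eps, dtype=float)
    # Removing the mean makes the measure insensitive to overall level.
    deviation = diff - diff.mean(axis=1, keepdims=True)
    # ASSUMPTION: squared deviation, negated so that larger means more similar.
    return -np.sum(deviation ** 2, axis=1)

def isd_similarity(cand_left, cand_right, stim_left, stim_right):
    # Interaural spectral difference vectors (left EP minus right EP).
    cand_isd = np.asarray(cand_left, dtype=float) - np.asarray(cand_right, dtype=float)
    stim_isd = np.asarray(stim_left, dtype=float) - np.asarray(stim_right, dtype=float)
    # Dot product of each candidate ISD vector with the stimulus ISD vector.
    return cand_isd @ stim_isd

def cue_z_scores(values):
    # Normalize across all candidate locations (undefined if all values are equal).
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

# Toy usage with three candidate locations (made-up numbers, not measured data).
itd_z = cue_z_scores(itd_similarity([100.0, 300.0, 500.0], 300.0))
spec = monaural_similarity([[1, 2, 3], [2, 3, 4], [3, 2, 1]], [1, 2, 3])
```

In the toy spectral example the second candidate differs from the stimulus only by a constant level offset, so it scores as well as the exact match, which is the intended effect of measuring variation about the mean.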
\n\n5  Relationship between the ISD and the Cone-of-Confusion \n\nThe distribution of a given cue's z-score values around the sphere of space surrounding the \nsubject reveals the spatial directions for that cue that correlate best with the given stimulus \nand target location being examined. An examination of the interaural spectral cue indicated \nthat, unlike the other cues, the range of its cue z-score variation was relatively restricted on \nthe ipsilateral hemisphere of space relative to the sound stimulus (values on the ipsilateral \nside were approximately 1.0, those on the contralateral side, -1.0). This was the first indication \nof the more moderate variation of the ISD cue across space as compared with the \nmonaural spectral cues. \n\nCloser examination of the ISD cue revealed more detailed variational properties. In order \nto facilitate meaningful comparisons with the other cues, the ISD cue z-score values were \nadjusted such that all negative values (i.e., those values at locations generally contralateral \nto the stimulus) were set to 0.0 and the cue z-score values recalculated. The spatial distribution \nof the rescaled ISD cue z-score values, as compared with the cue z-score values \nfor the other cues, is shown in Figure 2. The cone of confusion described by the ITD and \nILD is clearly evident (Fig. 2a,b) and it can be seen that the ISD cue is closely aligned with \nthese cues (Fig. 2c). Furthermore, the ISD cue demonstrates significant asymmetry along \nthe front-back dimensions. These novel observations demonstrate that while previous work \n[2, 3] indicates that the ISD cue provides sufficient information to determine a sound's location \nexactly along the cone of confusion, the variation of the cue z-score values along \nthe cone is substantially less than that for the monaural spectral cues (Fig. 2d), suggesting \nperhaps that this acts to make the monaural spectral cue a more salient cue. 
\n\n[Figure 2 panels: (a) Interaural Time Difference; (b) Interaural Level Difference; (c) Interaural Spectrum; (d) Ipsilateral Monaural Spectrum. Horizontal axis: Azimuth.] \n\nFigure 2: Spatial plot of the cue z-score values for a single target location (46\u00b0 azimuth, \n20\u00b0 elevation) and broadband sound condition. Gray-scale color values indicate the cue's \ncorrelation in different spatial directions with the stimulus cue at the target location. (Z-score \nvalues for the ISD cue have been rescaled, see text.) \n\n6  Analysis of Subjects' Responses using Cue Z-score Values \n\nA given cue's z-score values for the subject's responses across all 76 test locations and \nfive trials were averaged. The mean and standard deviation are presented in a bar graph \n(Fig. 3). The subjects' response locations correlate highly with the ITD and ILD cues and \nthe standard deviation of the correlation was low (Fig. 3a,b). In other words, subjects' responses \nstayed on the cone of confusion of the target location. A similar analysis of the \nmore restricted, rescaled version of the interaural spectral cue shows that despite the spectral \nmanipulations and systematic mislocalizations, subjects were responding to locations \nwhich were highly correlated with the interaural spectral cue (Fig. 3c). The bar graphs \nfor the monaural spectral cues ipsilateral and contralateral to the target location show the \naverage correlation of the subjects' responses with these cues varied considerably with the \nstimulus condition (Fig. 3d-g) and to a lesser extent across subjects. 
\n\n[Figure 3 panels include ILD, ISD, (d) Left Contralateral Spectrum, (e) Right Contralateral Spectrum, (f) Left Ipsilateral Spectrum and (g) Right Ipsilateral Spectrum. Each panel groups the bars by stimulus condition: Control Broadband, Veridical Interaural, Veridical Right Monaural.] \n\nFigure 3: Correlation of the four subjects' (indicated by different gray bars) localization \nresponses with the different acoustical cues for each stimulus condition. The bar heights \nindicate the mean cue z-score value, while the error bars indicate standard deviation. \n\n7  Spatial Plots of Correlation Regions \n\nAs the localization responses tended to lie along the cone of confusion, the relative importance \nof the spectral cues along the cone of confusion was examined. The correlation \nvalues for the spectral cues associated with the subjects' responses were recalculated as a \nz-score value using only the distribution of values restricted to the cone of confusion. This \ndemonstrates whether the spectral cues associated with the subjects' response locations \nwere better correlated with the stimulus cues than for any random location on the cone of \nconfusion. 
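The restricted recalculation described above can be sketched as follows, again as an assumption-laden illustration rather than the authors' implementation. The text does not state how membership of the cone of confusion was decided, so the ITD tolerance used to build the mask is hypothetical, as are the function names.

```python
import numpy as np

def cone_mask(location_itds, target_itd, tolerance):
    # ASSUMPTION: a location lies on the target's cone of confusion when its
    # ITD is within a chosen tolerance of the target ITD.
    return np.abs(np.asarray(location_itds, dtype=float) - target_itd) <= tolerance

def restricted_z_score(values, on_cone, response_index):
    # values: cue correlation value at every candidate location
    # on_cone: boolean mask selecting locations on the cone of confusion
    values = np.asarray(values, dtype=float)
    subset = values[np.asarray(on_cone, dtype=bool)]
    # Mean and standard deviation come only from locations on the cone.
    return (values[response_index] - subset.mean()) / subset.std()

# Toy usage with five candidate locations (made-up numbers, not measured data).
mask = cone_mask([200.0, 210.0, 190.0, 205.0, 600.0], 200.0, 20.0)
z = restricted_z_score([0.0, 1.0, 2.0, 3.0, 10.0], mask, 3)
```

A positive value indicates that the cue at the response location matches the stimulus cue better than the average location on the same cone, which is the comparison the paragraph above describes.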
\n\nSpatial plots of the recalculated response cue z-score values for the spectral cues of one \nsubject (similar trends across subjects), obtained for each stimulus location and across the \nthree different sound conditions, are shown in Figure 4. Spatial regions of both high and \nlow correlation are evident that vary with the stimulus spectrum. The z-score values for the \nISD cue show greater bilateral correlation across space in the veridical interaural condition \n(Fig. 4d) than for the veridical monaural condition (Fig. 4g), while the right monaural \nspectral cue demonstrates higher correlation in the right hemisphere of space for the veridical \nmonaural condition (Fig. 4i) as opposed to the veridical interaural condition (Fig. 4f). \nThis result (although not surprising) demonstrates that the auditory system is extracting \ncues to source location in a manner dependent on the input sound spectrum and in a manner \nconsistent with the spectral information available in the sound spectrum. Figures 4e,h \nclearly demonstrate that the flat sound spectrum in the left ear was strongly correlated with \nand influenced the subject's localization judgements for specific regions of space. 
\n\n[Figure 4 panels (a)-(i), arranged in columns by sound condition: Broadband, Veridical Interaural, Veridical Monaural. Horizontal axis: Azimuth.] \n\nFigure 4: Spatial plot of the spectral cue z-score values for one subject's localization responses \nacross the three different sound conditions. \n\n8  Conclusions \n\nThe correlation of human sound localization responses with the available acoustical cues \nacross three spectrally different sound conditions has provided insights into the human auditory \nsystem and its integration of cues to produce a coherent percept of spatial location. \nThese data suggest an interrelationship between the interaural spectral cue and the cone \nof confusion. The ISD cue is front-back asymmetrical along the cone and its cue correlation \nvalues vary more moderately as a function of space than those of the monaural \nspectral cues. These data shed light on the relative role and importance of the interaural \nand monaural spectral cues. \n\nAcknowledgments \n\nThis research was supported by the ARC, NHMRC, and Dora Lush Scholarship to CJ. \n\nReferences \n\n[1]  S. Carlile, Virtual auditory space: Generation and applications. New York: Chapman and Hall, 1996. \n\n[2]  R. O. Duda, \"Elevation dependence of the interaural transfer function,\" in Binaural and spatial hearing in real and virtual environments (R. H. Gilkey and T. R. Anderson, eds.), ch. 3, pp. 49-75, Mahwah, New Jersey: Lawrence Erlbaum Associates, 1997. \n\n[3]  J. A. Janko, T. R. Anderson, and R. H. Gilkey, \"Using neural networks to evaluate the viability of monaural and interaural cues for sound localization,\" in Binaural and spatial hearing in real and virtual environments (R. H. Gilkey and T. R. Anderson, eds.), ch. 26, pp. 557-570, Mahwah, New Jersey: Lawrence Erlbaum Associates, 1997. \n\n[4]  B. Glasberg and B. Moore, \"Derivation of auditory filter shapes from notched-noise data,\" Hearing Research, vol. 47, no. 1-2, pp. 103-138, 1990. \n\n[5]  F. Wightman and D. 
Kistler, \"The dominant role of low-frequency interaural time differences in sound localization,\" J. Acoust. Soc. Am., vol. 91, no. 3, pp. 1648-1661, 1992. \n\n[6]  J. Middlebrooks, \"Narrow-band sound localization related to external ear acoustics,\" J. Acoust. Soc. Am., vol. 92, no. 5, pp. 2607-2624, 1992. \n", "award": [], "sourceid": 1731, "authors": [{"given_name": "Craig", "family_name": "Jin", "institution": null}, {"given_name": "Anna", "family_name": "Corderoy", "institution": null}, {"given_name": "Simon", "family_name": "Carlile", "institution": null}, {"given_name": "Andr\u00e9", "family_name": "van Schaik", "institution": null}]}