{"title": "A Novel Channel Selection System in Cochlear Implants Using Artificial Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 910, "page_last": 916, "abstract": null, "full_text": "A Novel Channel Selection System in Cochlear Implants Using Artificial Neural Network\n\nMarwan A. Jabri & Raymond J. Wang\n\nSystems Engineering and Design Automation Laboratory\n\nDepartment of Electrical Engineering\n\nThe University of Sydney\n\nNSW 2006, Australia\n\n{marwan,jwwang}@sedal.usyd.edu.au\n\nAbstract\n\nState-of-the-art speech processors in cochlear implants perform\nchannel selection using a spectral maxima strategy. This strategy\ncan lead to confusions when high frequency features are needed\nto discriminate between sounds. We present in this paper a novel\nchannel selection strategy based upon pattern recognition which\nallows \"smart\" channel selections to be made. The proposed strategy\nis implemented using multi-layer perceptrons trained on a\nmulti-speaker labelled speech database. The inputs to the network are the\nenergy coefficients of N energy channels. The outputs of the system\nare the indices of the M selected channels.\nWe compare the performance of our proposed system to that of\nthe spectral maxima strategy, and show that our strategy can produce\nsignificantly better results.\n\n1 INTRODUCTION\n\nA cochlear implant is a device used to provide the sensation of sound to those who\nare profoundly deaf by means of electrical stimulation of residual auditory neurons.
\nIt generally consists of a directional microphone, a wearable speech processor, a\nhead-set transmitter and an implanted receiver-stimulator module with an electrode\narray, which together provide an electrical representation of the speech signal to\nthe residual nerve fibres of the peripheral auditory system (Clark et al., 1990).\n\nFigure 1: A simplified schematic diagram of the cochlear implant (labels: brain; electrode array)\n\nA simplified schematic diagram of the cochlear implant is shown in Figure 1. Speech\nsounds are picked up by the directional microphone and sent to the speech processor.\nThe speech processor amplifies, filters and digitizes these signals, and then selects\nand codes the appropriate sound information. The coded signal contains\ninformation as to which electrode to stimulate and the intensity level required to generate\nthe appropriate sound sensations. The signal is then sent to the receiver/stimulator\nvia the transmitter coil. The receiver/stimulator delivers electrical impulses to the\nappropriate electrodes in the cochlea. These stimulated electrodes then directly\nactivate the hearing nerve in the inner ear, creating the sensation of sound, which\nis then forwarded to the brain for interpretation. The entire process happens in\nmilliseconds.\n\nFor multi-channel cochlear implants, the task of the speech processor is to compute\nthe spectral energy of the electrical signals it receives, and to quantise them into\ndifferent levels. The energy spectrum is commonly divided into separate bands using\na filter bank of N (typically 20) bandpass filters with centre frequencies ranging from\n250 Hz to 10 kHz. The bands of energy are allocated to electrodes in the patient's\nimplant on a one-to-one basis.\n
Usually the most-apical bipolar electrode pairs are\nallocated to the channels in tonotopic order. The limitations of implant systems\nusually require that only a selected number of the quantised energy levels be fed to\nthe implanted electrode array (Abbas, 1993; Schouten, 1992).\n\nThe state-of-the-art speech processor for multi-channel implants performs channel\nselection using the spectral maxima strategy (McDermott et al., 1992; Seligman &\nMcDermott, 1994). The maxima strategy selects the M (about 6) largest spectral\nenergies of the frequency spectrum as stimulation channels from a filter bank of\nN (typically 20) bandpass filters. It is believed that, compared to other channel\nselection techniques (F0F2, F0F1F2, MPEAK, ...), the maxima strategy increases the\namount of spectral information and improves speech perception and recognition\nperformance.\n\nHowever, the maxima strategy relies heavily on the highest energies. This often leads\nto the same levels being selected for different sounds, as the energy levels that\ndistinguish them are not high enough to be selected. For some speech signals,\nit does not cater for confusions and cannot discriminate between high frequency\nfeatures.\n\nWe present in this paper Artificial Neural Network (ANN) techniques for\nimplementing \"smart\" channel selection for cochlear implant systems. The input to the\nproposed channel selection system consists of the energy coefficients (18 in our\nexperiments) and the output consists of the indices of the selected channels (6 in our\nexperiments). The neural network based selection system is trained on a multi-speaker\nlabelled speech database and has been evaluated on a separate multi-speaker database not\nused in the training phase.\n
The most important feature of our ANN based channel\nselection system is its ability to select the channels for stimulation on the basis of\nthe overall morphology of the energy spectrum and not only on the basis of the\nmaximal energy values.\n\n2 THE PATTERN RECOGNITION BASED CHANNEL SELECTION STRATEGY\n\nSpeech is the most natural form of human communication. The speech information\nsignal can be divided into phonemes, which share some common acoustic properties\nwith one another for a short interval of time. The phonemes are typically divided\ninto two broad classes: (a) vowels, which allow unrestricted airflow in the vocal\ntract, and (b) consonants, which restrict airflow at some point and are weaker than\nvowels. Different phonemes have different morphologies in the energy spectrum.\nMoreover, for different speakers and different speech sentences, the same phonemes\nhave different energy spectrum morphologies (Kent & Read, 1992). Therefore,\nsimple methods that select some of the most important channels for all the phoneme\npatterns will not perform as well as a method that considers the spectrum in its\nentirety.\n\nThe existing maxima strategy only refers to the spectrum amplitudes found in the\nentire estimated spectrum without considering the morphology. Typically several\nof the maxima can be obtained from a single spectral peak. Therefore, for\nsome phoneme patterns, the selection result is good enough to represent the\noriginal phoneme. But for some others, some important features of the phoneme are\nlost. This usually happens to those phonemes with important features in the high\nfrequency region. Due to the low amplitude of the high frequencies in the spectrum\nmorphology, maxima methods are not capable of extracting those high frequency\nfeatures.\n
The relationship between the desired M output channels and the energy\nspectrum patterns is complex and, depending on the conditions, may be influenced\nby many factors. As mentioned in the Introduction, channel selection methods that\nmake use of local information only in the energy spectrum are bound to produce\nchannel sub-sets where sounds may be confused. The confusions can be reduced if\n\"global\" information of the energy spectrum is used in the selection process.\n\nThe channel selection approach we are proposing makes use of the overall energy\nspectrum. This is achieved by turning the selection problem into a spectrum\nmorphology pattern recognition problem; hence, we call our approach Pattern\nRecognition based Channel Selection (PRCS).\n\n2.1 PRCS STRATEGY\n\nThe PRCS strategy is implemented using two cascaded neural networks, shown in\nFigure 2:\n\n\u2022 Spectral morphological classifier: Its inputs are the spectrum energy\namplitudes of all the channels and its outputs are transformations of the\ninputs. The transformation between input and output can be seen as a\nrecognition, emphasis, and/or decaying of the inputs. The consequence is\nthat some inputs are amplified and some decayed, depending on the\nmorphology of the spectrum. The classifier performs a non-linear mapping.\n\n\u2022 M strongest of N classifier: It receives the output of the morphological classifier\nand applies an M strongest selection rule.\n\nFigure 2: The pattern recognition based channel selection architecture (blocks: spectral morphological classifier; M strongest of N classifier; labels)\n\n2.2 TRAINING AND TESTING DATA\n\nThe most difficult task in developing the proposed PRCS is to set up the labelled\ntraining and testing data for the spectral morphological classifier.\n\nThe training and testing data sets have been constructed using the process shown\nin Figure 3.\n\nFigure 3: The process of generating training and testing sets (blocks: Hamming window + 128 FFT; 18-channel quantisation & scaling; channel labelling; training & testing sets)\n\nThe sounds in the data sets are speech extracted from the DARPA TIMIT\nmulti-speaker speech corpus (Fisher et al., 1987) which contains a total of 6300 sentences,\n10 sentences spoken by each of 630 speakers. The speech signal is sampled at a 16 kHz\nrate with 16 bit precision. As the speech is nonstationary, a short-time speech\nanalysis method is used to produce the energy spectrum versus channel numbers.\n\nThe Fast Fourier Transform with an 8 ms smooth Hamming window is\napplied to yield the energy spectrum. The Hamming window has the shape of a raised\ncosine pulse:\n\n$$h(n) = \\begin{cases} 0.54 - 0.46 \\cos\\left(\\frac{2\\pi n}{N-1}\\right) & \\text{for } 0 \\le n \\le N-1 \\\\ 0 & \\text{otherwise} \\end{cases}$$\n\nThe time frame on which the speech analysis is performed is 4 ms long and\nsuccessive time frame windows overlap by 50%.\n\nUsing frequency allocations similar to those used in commercial cochlear implant\nspeech processors, the frequency range in the spectrum is divided into 18 channels\nwith centre frequencies of 250, 450, 650, 850, 1050, 1250,\n1450, 1650, 1895, 2177, 2500, 2873, 3300, 3866, 4580, 5307, 6218 and 7285 Hz\nrespectively.\n
Each energy spectrum from a time frame is quantised into these 18\nfrequency bands. The energy amplitude for each level is the sum of the amplitude\nvalues of the energy for all the frequency components in the level.\n\nThe quantised energy spectrum is then labelled using a graphics based tool, called\nLABEL, developed specially for this application. LABEL displays the spectrum\npattern including the unquantised spectrum, the signal source, speaker's name,\nspeech sentence, phoneme, signal pre-processing method and FFT results. All this\ninformation assists labelling experts in allocating a score (1 to 18) to each channel.\nThe score reflects the importance of the information provided by each of the bands.\nHence, if only six channels are to be selected, the channels with scores 1 to 6\ncan be used and are highlighted. The labelling is necessary as a supervised neural\nnetwork training method is being used.\n\nA total of 5000 energy spectrum patterns have been labelled. They are from 20\ndifferent speakers and different spoken sentences. Of the 5000 example patterns,\n4000 patterns are allocated for training and 1000 patterns for testing.\n\n3 EXPERIMENTAL RESULTS\n\nWe have implemented and tested the PRCS system as described above and our\nexperiments show that it has better performance than channel selection systems\nused in present cochlear implant processors.\n\nThe PRCS system is effectively constructed as a multi-module neural network using\nMUME (Jabri et al., 1994). The back-propagation algorithm in an on-line mode is\nused to train the MLP. The input components of the training patterns are the energy\namplitudes of the 18 channels and the teacher component consists of a \"1\" for a\nchannel to be selected and \"0\" for all others.\n
The MLP is trained for up to 2000\nepochs or until a minimum total mean squared error is reached. A learning rate \u03b7\nof 0.01 is used (no weight decay).\n\nWe show the average performance of our PRCS in Table 1, where we also show the\nperformance of a leading commercial spectral maxima strategy called SPEAK on\nthe same test set. In the first column of this table we show the number of channels\nthat matched out of the 6 desired channels. For example, the first row corresponds\nto the case where all 6 channels match the desired 6 channels in the test database,\nand so on. The percentages are cumulative: each row counts the test patterns with\nat least that many matching channels. As Table 1 shows, the PRCS produces\nsignificantly better performance than the commercial strategy on the speech test set.\n\nTable 1: Comparison of average performance between the commercial and PRCS\nsystems\n\nThe channel selections from the two different methods\n\nChannels matched  PRCS results   Commercial technique results\nFully matched     22 %           4 %\n5 matched         80 %           25 %\n4 matched         98 %           57 %\n3 matched         100 %          93 %\n2 matched         100 %          99 %\n1 matched         100 %          100 %\n\nThe selection performance for different phonemes is listed in Table 2. It clearly\nshows that the PRCS strategy can cater for the features of all the speech spectrum\npatterns.\n\nTable 2: PRCS channel selection performance on different phoneme patterns\n\nThe PRCS results for different phoneme patterns\n\nPhoneme               Fully matched   5 matched   4 matched   3 matched\nStops                 19 %            69 %        96 %        100 %\nFricatives            18 %            66 %        92 %        100 %\nNasals                14 %            66 %        96 %        100 %\nSemivowels & Glides   14 %            79 %        95 %        100 %\nVowels                25 %            84 %        98 %        100 %
\n\nTo compare the practical performance of the PRCS with the maxima strategies\nwe have developed a direct performance test system which allows us to play the\nsynthesized speech of the selected channels through a post-speech synthesizer. Our\ntest shows that the PRCS produces more intelligible speech to normal ears.\nSixteen different sentences spoken by sixteen people were tested using both the maxima\nand PRCS methods. It was found that the synthesized speech from PRCS has many\nmore high frequency features than the speech produced by the maxima\nstrategy. All listeners who were asked to take the test agreed that the quality of the\nspeech sound from PRCS is much better than that from the commercial maxima\nchannel selection system. The tape recording of the synthesized speech will be\navailable at the conference.\n\n4 CONCLUSION\n\nA pattern recognition based channel selection strategy for cochlear implants has\nbeen presented. The strategy is based on an 18-72-18 MLP strongest selector. The\nproposed channel selection strategy has been compared to a leading commercial\ntechnique. Our simulation and playback results show that our machine learning\nbased technique produces significantly better channel selections.\n\nReferences\n\nAbbas, P. J. (1993) Electrophysiology. \"Cochlear Implants: Audiological\nFoundations\" edited by R. S. Tyler, Singular Publishing Group, pp.317-355.\n\nClark, G. M., Tong, Y. C. & Patrick, J. F. (1990) Cochlear Prosthesis. Edinburgh:\nChurchill Livingstone.\n\nFisher, W. M., Zue, V., Bernstein, J. & Pallett, D. (1987) An Acoustic-Phonetic\nData Base. In 113th Meeting of Acoust Soc Am, May 1987.\n\nJabri, M. A., Tinker, E. A. & Leerink, L.\n
(1994) MUME - A Multi-Net Multi-Architecture\nNeural Simulation Environment. \"Neural Network Simulation\nEnvironments\", J. Skrzypek ed., Kluwer Academic Publishers.\n\nKent, R. D. & Read, C. (1992) The Acoustic Analysis of Speech. Whurr Publishers.\n\nMcDermott, H. J., McKay, C. M. & Vandali, A. E. (1992) A new portable sound\nprocessor for the University of Melbourne / Nucleus Limited multielectrode cochlear\nimplant. J. Acoust. Soc. Am. 91(6), June 1992, pp.3367-3371.\n\nSchouten, M. E. H. (ed.) (1992) The Auditory Processing of Speech - From Sounds\nto Words. Speech Research 10, Mouton de Gruyter.\n\nSeligman, P. & McDermott, H. (1994) Architecture of the SPECTRA 22 Speech\nProcessor. International Cochlear Implant, Speech and Hearing Symposium,\nMelbourne, October, 1994, p.254.\n", "award": [], "sourceid": 1159, "authors": [{"given_name": "Marwan", "family_name": "Jabri", "institution": null}, {"given_name": "Raymond", "family_name": "Wang", "institution": null}]}