{"title": "An Analog VLSI Model of Periodicity Extraction", "book": "Advances in Neural Information Processing Systems", "page_first": 738, "page_last": 746, "abstract": null, "full_text": "An Analog VLSI Model of Periodicity Extraction \n\nAndre van Schaik \n\nComputer Engineering Laboratory \n\nJ03, University of Sydney, NSW 2006 \n\nSydney, Australia \n\nandre@ee.usyd.edu.au \n\nAbstract \n\nThis paper presents an electronic system that extracts the \nperiodicity of a sound. It uses three analogue VLSI building \nblocks: a silicon cochlea, two inner-hair-cell circuits and two \nspiking neuron chips. The silicon cochlea consists of a cascade of \nfilters. Because of the delay between two outputs from the silicon \ncochlea, spike trains created at these outputs are synchronous only \nfor a narrow range of periodicities. In contrast to traditional band-pass \nfilters, where an increase in selectivity has to be traded off \nagainst a decrease in response time, the proposed system responds \nquickly, independent of selectivity. \n\n1 Introduction \n\nThe human ear transduces airborne sounds into a neural signal using three stages in \nthe inner ear's cochlea: (i) the mechanical filtering of the Basilar Membrane (BM), \n(ii) the transduction of membrane vibration into neurotransmitter release by the \nInner Hair Cells (IHCs), and (iii) spike generation by the Spiral Ganglion Cells \n(SGCs), whose axons form the auditory nerve. The properties of the BM are such \nthat close to the entrance of the cochlea (the base) the BM is most sensitive to high \nfrequencies and at the apex the BM responds best to low frequencies. Along the BM \nthe best frequency decreases exponentially with distance along the \nmembrane. For frequencies below a given point's best frequency the response drops \noff gradually, but for frequencies above the best frequency the response drops off \nrapidly (see Fig. 1b for examples of such frequency-gain functions). 
\nAn Inner Hair Cell senses the local vibration of a section of the Basilar Membrane. \nThe intracellular voltage of an IHC resembles a half-wave-rectified version of the \nlocal BM vibration, low-pass filtered at 1 kHz. The IHC voltage has therefore lost \nits AC component almost completely for frequencies above about 4 kHz. Well \nbelow this frequency, however, the IHC voltage has a clear temporal structure, \nwhich will be reflected in the spike trains on the auditory nerve. \n\nThese spike trains are generated by the spiral ganglion cells. These SGCs spike with \na probability roughly proportional to the instantaneous inner hair cell voltage. \nTherefore, for the lower sound frequencies, the spectrum of the input waveform is \nnot only encoded in the form of an average spiking rate of different fibers along the \ncochlea (place coding), but also in the periodicity of spiking of the individual \nauditory nerve fibers. It has been shown that this periodicity information is a much \nmore robust cue than the spatial distribution of average firing rates [1]. Some \nperiodicity information can already be detected at intensities 20 dB below the \nintensity needed to obtain a change in average rate. Periodicity information is \nretained at intensities in the range of 60-90 dB SPL, for which the average rate of \nthe majority of the auditory nerve fibers is saturated. Moreover, the positions of the \nfibers responding best to a given frequency move with changing sound intensity, \nwhereas the periodicity information remains constant. Furthermore, the frequency \nselectivity of a given fiber's spiking rate is drastically reduced at medium and high \nsound intensities. The robustness of periodicity information makes it likely that the \nbrain actually uses this information. 
\n\n2 Modelling periodicity extraction \n\nSeveral models have been proposed that extract periodicity information using the \nphase encoding of fibers connected to the same inner hair cell or that use the \nsynchronicity of firing on auditory nerve fibers connected to different inner hair \ncells (see [2] for four examples of these models). The simplest of the phase encoding \nschemes correlates the output of the cochlea at a given position with a delayed \nversion of itself. It is easy to see that for pure tones, the equality sin(2\u03c0ft) = \nsin(2\u03c0f(t - \u0394)) holds at all times t only for frequencies that are a multiple of 1/\u0394, i.e., for these \nfrequencies the signals are in perfect synchrony and thus perfectly correlated. We \ncan adapt the delay \u0394 to each cochlear output, so that 1/\u0394 equals the best frequency \nof that cochlear output. In this case higher multiples of 1/\u0394 will be suppressed due \nto the very steep cut-off of the cochlear filters for frequencies above the best \nfrequency. Each synchronicity detector will then only be sensitive to the best \nfrequency of the filter to which it is connected. If we code the direct signal and the \ndelayed signal with two spike trains, with one spike per period at a fixed phase \neach, it becomes a very simple operation to detect the synchronicity. A simple \ndigital AND operator will be enough to detect overlap between two spikes. These \nspikes will overlap perfectly when f = 1/\u0394, but some overlap will still be present for \nfrequencies close to 1/\u0394, since the spikes have a finite width. The bandwidth of the \nAND output can thus be controlled by the spike width. \nIt is possible to create a silicon implementation of this scheme using an artificial \ncochlea, an IHC circuit, and a spiking neuron circuit together with additional \ncircuits to create the delays. A chip along these lines has been developed by John \nLazzaro [3] and functioned correctly. 
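The delay-and-coincidence scheme described above can be sketched numerically. The following Python sketch is illustrative only and is not the circuit of [3]: it assumes idealized spike trains with one rectangular spike per period, and the delay of 1 ms and spike width of 0.1 ms used in the usage note are hypothetical values chosen for the example. The function name and its parameters are likewise assumptions.

```python
# Illustrative sketch under stated assumptions (not the chip of [3]):
# the direct and the delayed path each carry one rectangular spike per
# period, and a logical AND detects the time during which both are high.

def and_overlap_fraction(freq_hz, delay_s, spike_width_s, steps_per_period=10000):
    '''Return the fraction of one period during which both spike trains are high.'''
    period = 1.0 / freq_hz
    dt = period / steps_per_period
    high = 0
    for k in range(steps_per_period):
        t = k * dt
        # Direct spike: high for spike_width_s at the start of every period.
        direct = (t % period) < spike_width_s
        # Delayed spike: the same train shifted by delay_s.
        delayed = ((t - delay_s) % period) < spike_width_s
        if direct and delayed:
            high += 1
    return high / steps_per_period


if __name__ == '__main__':
    for f in (900.0, 950.0, 1000.0, 1050.0, 1200.0):
        print(f, and_overlap_fraction(f, delay_s=1e-3, spike_width_s=1e-4))
```

Under these assumptions the overlap is largest when the input frequency equals the inverse of the delay (1 kHz here), shrinks for nearby frequencies, and vanishes once the two spikes are displaced by more than their width, which is how the spike width sets the bandwidth of the AND output.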
A disadvantage of this scheme, however, is \nthe fact that the delay associated with a cochlear output has to be matched to the \ninverse of the best frequency of that cochlear output. For a cochlea whose best \nfrequency changes exponentially with filter number in the cascade from 4 kHz (the \nupper range of phase locking on the auditory nerve) to 100 Hz, we will have to \ncreate delays that range from 0.25 ms to 10 ms. In the brain, such a large variation \nin delays is unlikely to be provided by an axonal delay circuit because it would \nrequire an excessively large variation in axon length. \nA possible solution comes from the observation that the phase of a pure tone of a \ngiven frequency on the basilar membrane increases from base to apex, and the phase \nchanges rapidly around the best frequency. The silicon cochlea, which is \nimplemented with a cascade of second-order low-pass filters (Fig. 1a), also \nfunctions as a delay line, and each filter adds a phase delay of \u03c0/2 at \nthe cut-off frequency of that filter. If we assume that filter i and filter i-4 have the \nsame cut-off frequency (which is not the case), the delay between the outputs of the two \nfilters will correspond to a full period (2\u03c0) at the cut-off frequency. \n\nFigure 1: a) Part of a silicon cochlea. 
Each section contains a second-order \nlow-pass filter and a differentiator; b) accumulated gain at output i and i-4; c) \nphase curves of the individual stages between output i and output i-4; d) \nproposed implementation of the periodicity extraction model. \n\nIn reality, the filters along the cochlea will have different cut-off frequencies, as \nshown in Fig. 1. Here we show the accumulated gain at the outputs i and i-4 (Fig. 1b), \nand the delay added by each individual filter between these two outputs (Fig. 1c) \nas a function of frequency (normalized to the cut-off frequency of filter i). The \nsolid vertical line represents this cut-off frequency, and we can see that only filter i \nadds a delay of \u03c0/2, and the other filters add less. However, if we move the vertical \nline to the right (indicated by the dotted vertical line), the delay added by each filter \nwill increase relatively quickly, and at some frequency slightly higher than the cut-off \nfrequency of filter i, the sum of the delays will become 2\u03c0 (dashed line). At this \nfrequency neither filter i nor filter i-4 has maximum gain, but if the cut-off \nfrequencies of the two filters are not too different, the gain will still be high enough for \nboth filters at the correlator frequency to yield output signals with reasonable \namplitudes. \nThe improved model can be implemented using building blocks as shown in Fig. 1d. \nEach of these building blocks has previously been presented (refer to [4] for \nadditional details). The silicon cochlea is used to filter and delay the signal, and has \nbeen adjusted so that the cut-off frequency decreases by one octave every twenty \nstages, so that the cut-off frequencies of neighboring filters are almost equal. The \nIHC circuit half-wave rectifies the signal in the implementation of Fig. 1d. The \nlow-pass filtering of the biological Inner Hair Cell can be ignored for frequencies \nbelow the approximately 1 kHz cut-off frequency of the cell. 
Since we limited our \nmeasurements to this range, the low-pass filtering has not been modeled by the \ncircuit. Two chips containing electronic leaky-integrate-and-fire neurons have been \nused to create the two spike trains. In the first series of measurements, each chip \ngenerates exactly one spike per period of the input signal. In a final test, the 32 \nneurons on each chip are set to behave more like biological spiral ganglion cells, and the \neffect on periodicity extraction is shown. A digital AND gate is used to \ncompare the output spikes of the two chips, and the spike rate at the output of the \nAND gate is the measure of activity used. \n\n3 Test results \n\nThe first experiment measures the number of spikes per second at the output of the \nAND gate as a function of input frequency, using different cochlear filter \ncombinations. Twelve filter pairs have been measured, each combining a filter \noutput with the output of a filter four sections earlier in the cascade. Within each pair, \nthe lower of the two best frequencies ranged from \n200 Hz to 880 Hz. The results are shown in Fig. 2a. \n\nFigure 2: a) measured output rate at different cochlear positions, and b) \nspike rate normalized to best input frequency, plotted on a log frequency \nscale. \n\nThe maximum spike rate increases approximately linearly with frequency; this is to \nbe expected, since we will have approximately one spike per signal period. Furthermore, \nthe best response frequencies of the filters sensitive to higher frequencies are \nfurther apart, due to the exponential scaling of the frequencies along the cochlea. 
\nFinally, a given time delay corresponds to a larger phase delay for the higher \nfrequencies, so that the absolute bandwidth of the coincidence detectors, i.e., the \nrange of input frequencies to which they respond, is larger. When we normalize the \nspike rate and plot the curves on a logarithmic frequency scale, as in Fig. 2b, we see \nthat the best frequencies of the correlators follow the exponential scaling of the best \nfrequencies of the cochlear filters, and that the relative bandwidth is fairly constant. \n\nFigure 3: Frequency selectivity for different input intensities. a) pure tones \nb) AM signals. \n\nUsing the same settings as in the previous experiment, the output spike rate of the \nsystem for different input amplitudes has been measured, using the cochlear filter \npair with best frequencies of 710 Hz and 810 Hz. In principle, the amplitude of the \ninput signal should have no effect on the output of the system, since the system only \nuses phase information. However, this is only true if the spikes are always created at \nthe same phase of the output signal of the cochlear filters, for instance at the peak, \nor the zero crossing. Fig. 3 shows, however, that the resulting filter selectivity shifts \nto lower frequencies for higher intensity input signals. \nThis is a result of the way the spikes are created on the neuron chip. 
The neurons \nhave been adjusted to spike once per period, but the phase at which they spike with \nrespect to the half-wave-rectified waveform depends on the integration time of the \nneuron, which is the time needed with a given input current to reach the spike \nthreshold voltage from the zero resting voltage. This time depends on the amplitude \nof the input current, which in turn is proportional to the amplitude of the input \nsignal. Since the amplitude gain of the two cochlear filters used is not the same, the \namplitude of the current input to the two neuron chips is different. Therefore, they \ndo not spike at the same phase with respect to their respective input waveforms. \nThis causes the frequency selectivity of the system to shift to lower frequencies with \nincreasing intensity. However, this is an artifact of the spike generation used to \nsimplify the system. On the auditory nerve, spikes arrive with a probability roughly \nproportional to the half-wave rectified waveform. The most probable phase for a \nspike is therefore always at the maximum of the waveform, independent of \nintensity. In such a system, the frequency selectivity will therefore be independent \nof amplitude. A second advantage of coding (at least half of) the waveform in spike \nprobability is that it does not assume that the input waveform is sinusoidal. Coding \na waveform with just one spike per period can only code the frequency and phase of \nthe waveform, but not its shape. A square wave and a sine wave would both yield \nthe same spike train. We will discuss the \"auditory-nerve-like\" coding at the end of \nthis section. \nTo test the model with a more complex waveform, a computer-generated 930 Hz sine wave, 100% \namplitude-modulated at 200 Hz, has been used. The carrier \nfrequency was varied by playing the whole waveform a certain percentage slower or \nfaster. 
Therefore the actual modulation frequency changes by the same factor as \nthe carrier frequency. The results of this test are shown in Fig. 3b for three different \ninput amplitudes. Compared to the measurements in Fig. 3a, we see that the filter is \nless selective and centered at a higher input frequency. The shift towards a higher \nfrequency can be explained by the fact that the average amplitude of a half-wave \nrectified amplitude-modulated signal is lower than for a half-wave rectified pure \ntone with the same maximum amplitude. Furthermore, the amplitude of the positive \nhalf-cycle of the output of the IHC circuit changes from cycle to cycle because of \nthe amplitude modulation. We have seen that the amplitude of the input signal \nchanges the frequency for which the two spike trains are synchronous, which means \nthat the frequency which yields the best response changes from cycle to cycle with a \nperiodicity equal to the modulation frequency. This introduces a sort of \"roaming\" \nof the frequencies in the input signal, effectively reducing the selectivity of the \nfilters. Finally, because of the 100% depth of the amplitude modulation, the \namplitude of the input will be too low during some cycles to create a spike, which \ntherefore reduces the total number of spikes which can coincide. \n\nFig. 3b shows that this model detects periodicity and not spectral content. The \nspectrum of a 930 Hz pure tone 100% amplitude modulated at 200 Hz contains, \napart from a 930 Hz carrier component, components both at 730 Hz and 1130 Hz, \nwith half the amplitude of the carrier component. When the speed of the waveform \nplayback is varied so that the carrier frequency is either 765 Hz or 1185 Hz, one of \nthese spectral side bands will be at 930 Hz, but the system does not respond at these \ncarrier frequencies. 
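The side-band arithmetic above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, it assumes ideal sinusoidal amplitude modulation (side bands at the carrier frequency plus and minus the modulation frequency), and it assumes that changing the playback speed multiplies every frequency in the signal by the same factor.

```python
# Illustrative arithmetic under stated assumptions: ideal sinusoidal AM and
# playback-speed scaling that multiplies all frequencies by one factor.

def sideband_frequencies(carrier_hz, modulation_hz):
    '''Return the (lower, upper) side-band frequencies of an AM signal.'''
    return carrier_hz - modulation_hz, carrier_hz + modulation_hz


def carriers_placing_sideband_at(target_hz, carrier_hz, modulation_hz):
    '''Return the scaled carrier frequencies at which the upper and the
    lower side band, respectively, land on target_hz.'''
    lower, upper = sideband_frequencies(carrier_hz, modulation_hz)
    # The speed factor target/upper moves the upper side band onto the target;
    # the same factor is applied to the carrier (and likewise for the lower one).
    return carrier_hz * target_hz / upper, carrier_hz * target_hz / lower


if __name__ == '__main__':
    print(sideband_frequencies(930.0, 200.0))
    print(carriers_placing_sideband_at(930.0, 930.0, 200.0))
```

With the values from the text (930 Hz carrier, 200 Hz modulation), the sketch gives side bands at 730 Hz and 1130 Hz and scaled carrier frequencies of roughly 765 Hz and 1185 Hz, matching the figures quoted above.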
The absence of a response at these carrier frequencies is explained by the fact that the periodicity of the zero \ncrossings, and thus of the positive half cycles of the IHC output, is always equal to \nthe carrier frequency. \n\nTraditional band-pass filters with a very high quality factor (Q) can also yield a \nnarrow pass-band, but their step response takes about 1.5Q cycles at the center \nfrequency to reach steady state. The periodicity selectivity of the synchronicity \ndetector shown in Fig. 3a corresponds to a quality factor of 14; a traditional band-pass \nfilter would take about 21 cycles of the 930 Hz input signal to reach 95% of its \nfinal output value. Fig. 4 shows the temporal aspect of the synchronicity detection \nin our system. The top trace in this figure shows the output of the cochlear filter \nwith the highest best frequency (index i-4 in Fig. 1) and the spikes generated based \non this output. The second trace shows the same for the output of the cochlear filter \nwith the lower best frequency (index i in Fig. 1). The third trace shows the output of \nthe AND gate for the inputs above, whose frequency is slightly above the detector's best periodicity. \nCoincidences are detected at the onset of the tone, even when it is not of the correct \nperiodicity, but only for the first one or two cycles. The bottom trace shows the \noutput of the AND gate for an input at best frequency. The system thus responds to \nthe presence of a pure tone of the correct periodicity after only a few cycles, \nindependent of the filter's selectivity. \n\nFigure 4: Oscilloscope traces of the temporal aspect of synchronicity \ndetection. The vertical scale is 20 mV per square for the cochlear outputs; \nthe spikes are 5 V in amplitude. \n\nTo show this more dramatically, we have reduced the spike width to 10 \u00b5s, to obtain \na high periodicity selectivity as shown in Fig. 5a. 
The bandwidth of this filter is \nonly 20 Hz at 930 Hz, equivalent to a quality factor of 46.5. A traditional filter with \nsuch a quality factor would settle only about 70 cycles after the onset of the signal, \nwhereas the periodicity detector still settles after the first few cycles, as shown in \nFig. 5b. We can compare this result with the response of a classic RLC band-pass \nfilter with a 930 Hz center frequency and a quality factor of 46.5 as shown in Fig. 6. \nAfter 18 cycles of the input signal, the output of the band-pass filter has only \nreached 65% of its final value. Thresholding the RLC output could signal the \npresence of a periodicity faster, but it would then still respond very slowly to the \noffset of the tone as the RLC filter will continue ringing after the offset. \n\nFigure 5: a) Frequency selectivity with a 10 \u00b5s spike width. b) Cochlear \noutput (top, 40 mV scale) and coincidences (bottom) for a signal at best \nfrequency. \n\nFigure 6: Simulated response of the RLC band-pass filter. a) frequency \nselectivity, b) transient response (scale units are 40 mV). \n\nIn the previous experiments we simplified the model to use one spike per period in \norder to understand the principle behind the periodicity detection. However, we \nhave seen that this implementation leads to a shift in best periodicity with changing \namplitude, because the phase at which the 'single neuron' spikes changes with \nintensity. 
Now, we will change the settings to be more realistic, so that no single one of the \n32 neurons spikes at every period, and we will reduce the output gain of the \nIHC circuit so that the neurons receive less signal current, and thus have a lower \ninput SNR. The resulting spike distribution is a better simulation of the spike \ndistribution on the auditory nerve. This is shown in Fig. 7a for a group of 32 neurons \nstimulated by an IHC circuit connected to a single cochlear output. The bottom \ntrace shows the sum of spikes over the 32 neurons on an arbitrary scale. When we \nuse this spike distribution and repeat the pure-tone detection experiment of Fig. 3a \nat different input intensities, we obtain the curves of Fig. 7b. Indeed, in this case, the \nbest periodicity does not change; the curves are remarkably independent of input \nintensity. The selectivity curve is about twice as wide at the base as the \nones in Fig. 3a, and its slopes rise and fall much more \ngradually. This means that we can easily increase the selectivity of these curves by \nsetting a higher threshold, e.g., discarding spike rates below 70 spikes per second. \nBecause of the steep slopes in Fig. 3a such an operation would hardly increase the \nselectivity for that case. \n\nFigure 7: a) Cochlear output (top) and population average of the auditory \nnerve spikes (bottom); b) periodicity selectivity with auditory-nerve-like \nspike distribution. \n\n4 Conclusions \n\nIn this paper we have presented a neural system for periodicity detection \nimplemented with three analogue VLSI building blocks. 
The system uses the delay \nbetween the outputs at two points along the cochlea and synchronicity of the spike \ntrains created from these cochlear outputs to detect the periodicity of the input \nsignal. An especially useful property of the cochlea is that the delay between two \npoints a fixed distance apart corresponds to a full period at a frequency that scales in \nthe same way as the best frequency along the cochlea, i.e., decreases exponentially. \n\nIf we always create spikes at the same phase of the output signal at each filter, or \nsimply have the highest spiking probability for the maximum instantaneous \namplitude of the output signal, then both outputs will only have synchronous spikes \nfor a certain periodicity, and we can easily detect this synchronicity with \ncoincidence detectors. This system offers a way to obtain very selective filters using \nspikes. Even though they react to a very narrow range of periodicities, these filters \nare able to react after only a few periods. Furthermore, the range of periodicities the \nsystem responds to can be made independent of input intensity, which is not the case with \nthe cochlear output itself. This clearly demonstrates the advantages of using spikes \nin the detection of periodicity. \n\nAcknowledgements \n\nThe author thanks Eric Fragniere, Eric Vittoz and the Swiss NSF for their support. \n\nReferences \n\n[1] Evans, \"Functional anatomy of the auditory system,\" in Barlow and Mollon (editors), \nThe Senses, Cambridge University Press, pp. 251-306, 1982. \n\n[2] Seneff, Shamma, Deng, & Ghitza, Journal of Phonetics, Vol. 16, pp. 55-123, 1988. \n\n[3] Lazzaro, \"A silicon model of an auditory neural representation of spectral shape,\" IEEE \nJournal of Solid-State Circuits, Vol. 26, No. 5, pp. 772-777, 1991. \n\n[4] van Schaik, \"An Analogue VLSI Model of Periodicity Extraction in the Human Auditory \nSystem,\" to appear in Analog Integrated Circuits and Signal Processing, Kluwer, 2000. 
", "award": [], "sourceid": 1781, "authors": [{"given_name": "Andr\u00e9", "family_name": "van Schaik", "institution": null}]}