{"title": "Distributional Population Codes and Multiple Motion Models", "book": "Advances in Neural Information Processing Systems", "page_first": 174, "page_last": 182, "abstract": null, "full_text": "Distributional Population Codes and Multiple Motion Models \n\nRichard S. Zemel \n\nUniversity of Arizona \n\nzemel@u.arizona.edu \n\nPeter Dayan \n\nGatsby Computational Neuroscience Unit \n\ndayan@gatsby.ucl.ac.uk \n\nAbstract \n\nMost theoretical and empirical studies of population codes assume that a unique and unambiguous value of an encoded quantity underlies the neuronal activities. However, population activities can contain additional information, such as multiple values of the quantity or uncertainty about it. We have previously suggested a method to recover this extra information by treating the activities of the population of cells as coding for a complete distribution over the coded quantity rather than just a single value. We now show how this approach bears on psychophysical and neurophysiological studies of population codes for motion direction in tasks involving transparent motion stimuli. We show that, unlike standard approaches, it is able to recover multiple motions from population responses, and also that its output is consistent with both correct and erroneous human performance on psychophysical tasks. \n\nA population code can be defined as a set of units whose activities collectively encode some underlying variable (or variables). The standard view is that population codes are useful for accurately encoding the underlying variable when the individual units are noisy. 
Current statistical approaches to interpreting population activity reflect this view, in that they determine the optimal single value that explains the observed activity pattern given a particular model of the noise (and possibly a loss function). \n\nIn our work, we have pursued an alternative hypothesis, that the population encodes additional information about the underlying variable, including multiple values and uncertainty. The Distributional Population Coding (DPC) framework finds the best probability distribution across values that fits the population activity (Zemel, Dayan, & Pouget, 1998). \n\nThe DPC framework is appealing since it makes clear how extra information can be conveyed in a population code. In this paper, we use it to address a particular body of experimental data on transparent motion perception, due to Treue and colleagues (Hol & Treue, 1997; Rauber & Treue, 1997). These transparent motion experiments provide an ideal test of the DPC framework, in that the neurophysiological data reveal how the population responds to multiple values in the stimuli, and the psychophysical data describe how these values are actually decoded, putatively from the population response. We investigate how standard methods fare on these data, and compare their performance to that of DPC. \n\n[Figure 1: four panels, for $\\Delta\\theta$ = 30\u00b0, 60\u00b0, 90\u00b0 and 120\u00b0; x-axis from -180\u00b0 to 180\u00b0.] \n\nFigure 1: Each of the four plots depicts a single MT cell response (spikes per second) to a transparent motion stimulus of a fixed directional difference ($\\Delta\\theta$) between the two motion directions. The x-axis gives the average direction of stimulus motion relative to the cell's preferred direction (0\u00b0). From Treue, personal communication. \n\n1 RESPONSES TO MULTIPLE MOTIONS \n\nMany investigators have examined neural and behavioral responses to stimuli composed of two patterns sliding across each other. These often create the impression of two separate surfaces moving in different directions. The general neurophysiological finding is that an MT cell's response to these stimuli can be characterized as the average of its responses to the individual components (van Wezel et al., 1996; Recanzone et al., 1997). As an example, Figure 1 shows data obtained from single-cell recordings in MT to random dot patterns consisting of two distinct motion directions (Treue, personal communication). Each plot is for a different relative angle ($\\Delta\\theta$) between the two directions. A plot can equivalently be viewed as the response of a population of MT cells having different preferred directions to a single presentation of a stimulus containing two directions. If $\\Delta\\theta$ is large, the activity profile is bimodal, but as the directional difference shrinks, the profile becomes unimodal. The population response to a $\\Delta\\theta$ = 30\u00b0 motion stimulus is merely a wider version of the response to a stimulus containing a single direction of motion. 
However, this transition from a bimodal to a unimodal profile in MT does not apparently correspond to subjects' percepts; subjects can reliably perceive both motions in superimposed transparent random patterns down to an angle of 10\u00b0 (Mather & Moulden, 1983). If these MT activities play a determining role in motion perception, the challenge is to understand how the visual system can extract both motions from such unimodal (and bimodal) response profiles. \n\nFigure 2: (A) The standard Bayesian population coding framework assumes that a single value is encoded in a set of noisy neural activities. (B) The distributional population coding framework shows how a distribution over $\\theta$ can be encoded and then decoded from noisy population activities. From Zemel et al. (1998). \n\n2 ENCODING & DECODING \n\nStatistical population code decoding methods begin with the knowledge, collected over many experimental trials, of the tuning function $f_i(\\theta)$ for each cell $i$, determined using simple stimuli (e.g., ones containing uni-directional motion). Figure 2A cartoons the framework used for standard decoding. Starting on the bottom left, encoding consists of taking a value $\\theta$ to be coded and representing it by the noisy activities $r_i$ of the elements of a population code. 
In the simulations described here, we have used a population of 200 model MT cells, with tuning functions defined by random sampling within physiologically-determined ranges for the parameters: baseline $b$, amplitude $a$ and width $\\sigma$. The encoding model comes from the MT data: for a single motion, $\\langle r_i | \\theta \\rangle = f_i(\\theta) = b_i + a_i \\exp[-(\\theta - \\theta_i)^2 / 2\\sigma_i^2]$, while for two motions, $\\langle r_i | \\theta_1, \\theta_2 \\rangle = \\frac{1}{2}[f_i(\\theta_1) + f_i(\\theta_2)]$. The noise is taken to be independent and Poisson. \n\nStandard Bayesian decoding starts with the activities $\\mathbf{r} = \\{r_i\\}$ and generates a distribution $P[\\theta|\\mathbf{r}]$. Under the model with Poisson noise, \n\n$$P[\\theta|\\mathbf{r}] \\propto P[\\theta] \\prod_i P[r_i|\\theta] \\propto P[\\theta] \\exp\\left(\\sum_i r_i \\log f_i(\\theta)\\right)$$ \n\nThis method thus provides a multiplicative kernel density estimate, tending to produce a sharp distribution for a single motion direction $\\theta$. A single estimate $\\hat{\\theta}$ can be extracted from $P[\\theta|\\mathbf{r}]$ using a loss function. \n\nFor this method to decode successfully when there are two motions in the input ($\\theta_1$ and $\\theta_2$), the extracted distribution must at least have two modes. Standard Bayesian decoding fails to satisfy this requirement. First, if the response profile $\\mathbf{r}$ is unimodal (cf. the 30\u00b0 plot in Figure 1), convolution with unimodal kernels $\\{\\log f_i(\\theta)\\}$ produces a unimodal $\\log P[\\theta|\\mathbf{r}]$, peaked about the average of the two directions. The additive kernel density estimate, an alternative distributional decoding method proposed by Anderson (1995), suffers from the same problem, and also fails to be adequately sharp for single value inputs. \n\nSurprisingly, the standard Bayesian decoding method also fails on bimodal response profiles. If the baseline response $b_i = 0$, then $P[\\theta|\\mathbf{r}]$ is Gaussian, with mean $\\sum_i r_i \\theta_i / \\sum_i r_i$ and variance $1 / \\sum_i (r_i / \\sigma_i^2)$ (Snippe, 1996; Zemel et al., 1998). 
\nIf $b_i > 0$, then, for the extracted distribution to have two modes in the appropriate positions, $\\log\\left(P[\\theta_1|\\mathbf{r}] / P[\\theta_2|\\mathbf{r}]\\right)$ must be small. However, the variance of this quantity is $\\sum_i \\langle r_i \\rangle \\left(\\log[f_i(\\theta_1) / f_i(\\theta_2)]\\right)^2$, which is much greater than 0 unless the tuning curves are so flat as to be able to convey only little information about the stimuli. Intuitively, the noise in the rates causes $\\sum_i r_i \\log f_i(\\theta)$ to be greater around one of the two values, and exponentiating to form $P[\\theta|\\mathbf{r}]$ selects out this one value. Thus the standard method can only extract one of the two motion components from the population responses to transparent motion. \n\nThe distributional population coding method (Figure 2B) extends the standard encoding model to allow $\\mathbf{r}$ to depend on general $P[\\theta]$: \n\n$$\\langle r_i \\rangle = \\int P[\\theta] f_i(\\theta) \\, d\\theta \\qquad (1)$$ \n\nBayesian decoding takes the observed activities $\\mathbf{r}$ and produces probability distributions over probability distributions over $\\theta$, $P[P(\\theta)|\\mathbf{r}]$. For simplicity, we decode using an approximate form of maximum likelihood in distributions over $\\theta$, finding the $P^r(\\theta)$ that maximizes $L[P(\\theta)|\\mathbf{r}] \\sim \\sum_i r_i \\log[f_i(\\theta) * P(\\theta)] - \\alpha g[P(\\theta)]$, where the smoothness term $g[\\cdot]$ acts as a regularizer. \n\nThe distributional encoding operation in Equation 1 is quite straightforward, by design, since it represents an assumption about what neural processing prior to (in this case) MT performs. However, the distributional decoding operation that we have used (Zemel et al., 1998) involves complicated and non-neural operations. The idea is to understand what information in principle may be conveyed by a population code under this interpretation, and then to judge actual neural operations in the light of this theoretical optimum. 
DPC is a statistical cousin of so-called line-element models, which attempt to account for subjects' performance in cases like transparency using the output of some fixed number of direction-selective mechanisms (Williams et al., 1991). \n\n3 DECODING MULTIPLE MOTIONS \n\nWe have applied our model to simulated MT response patterns $\\mathbf{r}$ generated via the DPC encoding model (Equation 1). For multiple motion stimuli, with $P(\\theta) = (\\delta(\\theta - \\theta_1) + \\delta(\\theta - \\theta_2))/2$, this encoding model produces the observed neurophysiological response: each unit's expected activity is the average of its responses to the component motions. For bimodal response patterns, DPC matches the generating distribution (Figure 3). For unimodal response patterns, such as those generated by double motion stimuli with $\\Delta\\theta$ = 30\u00b0, DPC also consistently recovers the generating distribution. The bimodality of the reconstructed distribution begins to break down around $\\Delta\\theta$ = 10\u00b0, which is also the point at which subjects are unable to distinguish two motions from a single broader band of motion directions (Mather & Moulden, 1983). \n\nIt has been reported (Treue, personal communication) that for angles $\\Delta\\theta$ < 10\u00b0, subjects can tell that all points are not moving in parallel, but are uncertain whether they are moving in two discrete directions or within a directional band. Our model qualitatively captures this uncertainty, reconstructing a broad distribution with two small peaks for directional differences between 7\u00b0 and 10\u00b0. \n\n[Figure 3: four panels. Panels A and C plot the population response against preferred direction (deg); panels B and D plot distributions against direction (deg).] \n\nFigure 3: (A) On a single simulated trial, the population response forms a bimodal activity profile when $\\Delta\\theta$ = 120\u00b0. (B) The reconstructed (darker) distribution closely matches the true input distribution for this trial. (C) As $\\Delta\\theta \\to$ 10\u00b0, the population response is no longer bimodal and instead has a noisy unimodal profile, and (D) the reconstructed distribution no longer has two clear modes. \n\nDPC also matches psychophysical performance on metameric stimuli. Rauber and Treue (1997) asked human subjects to report the directions in moving dot patterns consisting of 2, 3 or 5 directions of motion. The motion directions were -40\u00b0 and +40\u00b0; -50\u00b0, 0\u00b0 and +50\u00b0; and -50\u00b0, -30\u00b0, 0\u00b0, +30\u00b0, and +50\u00b0, respectively, but the proportions of dots moving in each direction were adjusted so that the population responses produced by an encoding model similar to Equation 1 would all be the same. 
Subjects reported the same two motion directions, at -40\u00b0 and +40\u00b0, to all three types of stimuli. \n\nDPC, like any reasonably deterministic decoding model, takes these (essentially identical) patterns of activity and, metamerically, reports the same answer for each case. Unlike most models, its answer, that there are two motions at roughly \u00b140\u00b0, matches human responses. The fact of metamerization is not due to any kind of prior in the model as to the number of directions to be recovered. However, that the actual report in each case includes just two motions (when clearly three or five motions would be equally consistent with the input) is a consequence of the smoothness prior. We can go further with DPC and predict how changing the proportion of dots moving in the central of three directions would lead to different percepts, from a single motion to two as this proportion decreases. \n\n[Figure 4: x-axis $\\Delta\\theta$ (deg), from 0 to 60.] \n\nFigure 4: The average relative error $E$ in direction judgments (Equation 2) for the DPC model (top curve) and for a model with the correct prior for this particular input set. \n\nWe can further evaluate the performance of DPC by comparing the quality of its reconstruction to that obtained by fitting the correct model of the input distribution, a mixture of delta functions. We simulated MT responses to motion stimuli composed of two evenly-weighted directions, with 100 examples for each value of $\\Delta\\theta$ in a range from 5\u00b0 to 60\u00b0. 
We fit a mixture of two delta functions to each population response, and measured the average relative error in direction judgments based on this fitted distribution versus the two true directions, $\\theta_1$ and $\\theta_2$, on that example $t$: \n\n[Equation 2, defining the average relative error $E$, is missing from the extracted text.] (2) \n\nWe then applied the DPC model to the same population codes. To measure the average error, we first fit the general distribution $P^r(\\theta)$ produced by DPC with a pair of equal-weighted Gaussians, and determined the estimates $\\hat{\\theta}_1$ and $\\hat{\\theta}_2$ from the appropriate mean and variance. As can be seen in Figure 4, the DPC model, which only has a general smoothness prior over the form of the input distribution, preserves the information in the observed rates nearly as well as the model with the correct prior. \n\n4 CONCLUSIONS \n\nTransparent motion provides an ideal test of distributional population coding, since the encoding model is determined by neural activity and the decoding model by the behavioral data. Two existing kernel density estimate models, involving additive (Anderson, 1995) and multiplicative (standard Bayesian decoding) combination, perform poorly in this paradigm. DPC, a model in which neuronal responses and the animal's judgments are treated as being sensitive to the entire distribution of an encoded value, has been shown to be consistent with both single-cell responses and behavioral decisions, even matching subjects' threshold behavior. \n\nWe are currently applying this same model to several other motion experiments, including one in which subjects had to determine whether a motion stimulus consisted of a number of discrete directions or a uniform distribution (Williams et al., 1991). We are investigating whether our model can explain the nonmonotonic relationship between the number of directions and the judgments. 
We have also applied DPC to a notorious puzzle for population coding: that single MT cells are just as accurate as the whole monkey, so that one cell's output could directly support inference of the same quality as the monkey's. Our approach provides an alternative explanation for part of this apparent inefficiency to that of the noisy pooling model of Shadlen et al. (1996). Finally, experiments showing the effect of target uncertainty on population responses (Basso & Wurtz, 1998; Bastian et al., 1998) are also handled naturally by the DPC approach. \n\nThe current model is intended to describe the information available at one stage in the processing stream. It does not address the precise mechanism of motion encoding, i.e., how responses in MT arise. We also have not considered the neural decoding and decision mechanisms. These could likely involve a layer of units that reaches decisions through a pattern of feedforward and lateral connections, as in the model proposed by Grunewald (1996) for the detection of transparent motion. \n\nOne critical issue that remains is normalization. It is not clear how to distinguish ambiguity about a single value for the encoded variable from the existence of multiple values of that variable (as in transparency for motion). Various factors are likely to be important, including the degree of separation of the modes and also prior expectations about the possibility of equivalents of transparency. \n\nAcknowledgements: This work was funded by ONR Young Investigator Award N00014-98-1-0509 to RZ, and NIMH grant 1R29MH5541-01, and grants from the Surdna Foundation and the Gatsby Charitable Foundation to PD. 
We thank Stefan Treue for providing us with the data plot and for informative discussions of his experiments; Alexandre Pouget and Charlie Anderson for useful discussions of distributed coding and the standard model; and Zoubin Ghahramani and Geoff Hinton for helpful conversations about reconstruction in the log probability domain. \n\nReferences \n\n[1] Anderson, C. H. (1995). Unifying perspectives on neuronal codes and processing. In XIX International Workshop on Condensed Matter Theories. Caracas, Venezuela. \n\n[2] Basso, M. A. & Wurtz, R. H. (1998). Modulation of neuronal activity in superior colliculus by changes in target probability. Journal of Neuroscience, 18(18), 7519-34. \n\n[3] Bastian, A., Riehle, A., Erlhagen, W., & Sch\u00f6ner, G. (1998). Prior information preshapes the population representation of movement direction in motor cortex. Neuroreport, 9(2), 315-319. \n\n[4] Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (1992). The analysis of visual motion: A comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12), 4745-4765. \n\n[5] Grunewald, A. (1996). A model of transparent motion and non-transparent motion aftereffects. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8 (pp. 837-843). Cambridge, MA: MIT Press. \n\n[6] Hol, K. & Treue, S. (1997). Direction-selective responses in the superior temporal sulcus to transparent patterns moving at acute angles. Society for Neuroscience Abstracts 23 (p. 179:11). \n\n[7] Mather, G. & Moulden, B. (1983). Thresholds for movement direction: two directions are less detectable than one. Quarterly Journal of Experimental Psychology, 35, 513-518. \n\n[8] Rauber, H. J. & Treue, S. (1997). Recovering the directions of visual motion in transparent patterns. Society for Neuroscience Abstracts 23 (p. 
179:10). \n\n[9] Recanzone, G. H., Wurtz, R. H., & Schwarz, U. (1997). Responses of MT and MST neurons to one and two moving objects in the receptive field. Journal of Neurophysiology, 78(6), 2904-2915. \n\n[10] Shadlen, M. N., Britten, K. H., Newsome, W. T., & Movshon, J. A. (1996). A computational analysis of the relationship between neuronal and behavioral responses to visual motion. Journal of Neuroscience, 16(4), 1486-510. \n\n[11] Snippe, H. P. (1996). Theoretical considerations for the analysis of population coding in motor cortex. Neural Computation, 8(3), 29-37. \n\n[12] van Wezel, R. J., Lankheet, M. J., Verstraten, F. A., Maree, A. F., & van de Grind, W. A. (1996). Responses of complex cells in area 17 of the cat to bi-vectorial transparent motion. Vision Research, 36(18), 2805-13. \n\n[13] Williams, D., Tweten, S., & Sekuler, R. (1991). Using metamers to explore motion perception. Vision Research, 31(2), 275-286. \n\n[14] Zemel, R. S., Dayan, P., & Pouget, A. (1998). Probabilistic interpretation of population codes. Neural Computation, 10, 403-430. \n", "award": [], "sourceid": 1556, "authors": [{"given_name": "Richard", "family_name": "Zemel", "institution": null}, {"given_name": "Peter", "family_name": "Dayan", "institution": null}]}