{"title": "An Application of the Principle of Maximum Information Preservation to Linear Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 186, "page_last": 194, "abstract": null, "full_text": "186 \n\nAN APPLICATION OF THE PRINCIPLE OF \nMAXIMUM INFORMATION PRESERVATION \n\nTO LINEAR SYSTEMS \n\nIBM T. J.  Watson Research Center, Yorktown Heights, NY 10598 \n\nRalph Linsker \n\nABSTRACT \n\nThis paper addresses the problem of determining the weights for a \nset  of  linear  filters  (model  \"cells\")  so  as  to  maximize  the \nensemble-averaged information that the cells' output values jointly \nconvey about their input values,  given  the  statistical properties of \nthe ensemble of input vectors.  The quantity that is maximized is the \nShannon  information  rate,  or  equivalently  the  average  mutual \ninformation between input and output.  Several models for the role \nof processing noise are analyzed, and the biological motivation for \nconsidering them is described.  For simple models in which nearby \ninput  signal  values  (in  space  or  time)  are  correlated,  the  cells \nresulting  from  this  optimization  process  include  center-surround \ncells and cells sensitive to temporal variations in input signal. \n\nINTRODUCTION \n\nI  have  previously  proposed  [Linsker,  1987,  1988]  a  principle  of  \"maximum \ninformation preservation,\" also called the \"infomax\" principle, that may account for \ncertain aspects of the organization of a  layered perceptual network.  The principle \napplies  to a  layer L  of cells  (which may be the input layer or an intermediate layer \nof the  network)  that provides  input  to  a  next layer M.  The mapping of the input \nsignal  vector  L  onto  an  output  signal  vector  M,  f:L  ~ M,  is  characterized  by  a \nconditional  probability  density  function  (\"pdf\")  p(MI L).  The  set  S  of  allowed \nmappings I  is specified.  The input pdf PL(L) is also given.  (In the cases considered \nhere,  there  is  no  feedback  from  M  to  L.)  The  infomax  principle  states  that  a \nmapping I  should  be  chosen  for  which  the  Shannon  information  rate  [Shannon, \n1949] \n\nR(j) ==  f dL PL(L) f dM p(MI L) 10g[P(MI L)/PM(M)] \nis a  maximum (over allIin the set S).  Here PM(M)  ==  fdLPL(L)P(MIL) is  the pdf \nof  the  output  signal  vector  M.  R  is  identical  to  the  average  mutual  information \nbetween Land M. \n\n(1) \n\n\fMaximum Infonnation Preservation to Linear Systems \n\n187 \n\nTo understand better how the info max principle may be applied to biological systems \nand  complex  synthetic  networks,  it  is  useful  to  solve  the  infomax  optimization \nproblem explicitly for simpler systems whose properties are nonetheless biologically \nmotivated.  This  paper therefore  deals  with  the  practical  computation of  infomax \nsolutions for cases in which the mappings! are constrained to be linear. \n\nINFOMAX SOLUTIONS FOR A SET OF LINEAR FILTERS \n\nWe  consider the  case  of linear model  \"neurons\"  with  multivariate  Gaussian  input \nand additive Gaussian noise.  There are  N  input  (L)  cells  and N'  output  (M)  cells. \nThe  input  column  vector  L  = (Lt,~, ... ,LNF  is  randomly  selected  from  an \nN-dimensional Gaussian distribution having mean zero.  That is, \n\n(2) \n\nwhere  QL  is  the  covariance  matrix  of  the  input  activities,  Q6 = J dL PL(L)LjLj \n(Superscript T denotes the matrix transpose.) 
To specify the set S of allowed mappings f: L \to M, we define a processing model that includes a description of (i) how noise enters during processing, (ii) the independent variables over which we are to maximize R, and (iii) any constraints on their values. Figure 1 shows several such models. We shall analyze the simplest, then explain the motivation for the more complex models and analyze them in turn.

Model A -- Additive noise of constant variance

In Model A of Fig. 1 the output signal value of the nth M cell is:

    M_n = \sum_i C_{ni} L_i + \nu_n    (3)

The noise components \nu_n are independently and identically distributed ("i.i.d.") random variables drawn from a Gaussian distribution having a mean of zero and variance B.

Each mapping f: L \to M is characterized by the values of the {C_{ni}} and the noise parameter B. The elements of the covariance matrix of the output activities are (using Eqn. 3)

    Q^M_{nm} = \sum_{i,j} C_{ni} Q^L_{ij} C_{mj} + B \delta_{nm}    (4)

where \delta_{nm} = 1 if n = m and 0 otherwise.

Evaluating Eqn. 1 for this processing model gives the information rate:

    R(f) = (1/2) \ln \mathrm{Det}\, W(f)    (5)

where W_{nm} = Q^M_{nm}/B. (R is the difference of two entropy terms. See [Shannon, 1949], p. 57, for the entropy of a Gaussian distribution.)

If the components C_{ni} of the C matrix are allowed to be arbitrarily large, then the information rate can be made arbitrarily large, and the effects of noise become arbitrarily small. One way to limit C is to impose a "resource constraint" on each M cell. An example of such a constraint is \sum_i C_{ni}^2 = 1 for all n. One can then attempt directly, using numerical methods, to maximize Eqn. 5 over all allowed C for given B. However, when some additional conditions (below) are satisfied, further analytical progress can be made.

Suppose the N L-cells are uniformly spaced along the line interval [0,1] with periodic boundary conditions, so that cell N is next to cell 1. [The analysis can be extended to a two- (or higher-) dimensional array in a straightforward manner.] Suppose also that (for given N) the covariance Q^L_{ij} of the input values at cells i and j is a function Q^L(s_{ij}) only of the displacement s_{ij} from i to j. (We deal with the periodicity by defining s_{ab} = b - a - \gamma_{ab} N, choosing the integer \gamma_{ab} such that -N/2 \le s_{ab} < N/2.) Then Q^L is a Toeplitz matrix, and its eigenvalues {\lambda_k} are the components of the discrete Fourier transform ("F.T.") of Q^L(s):

    \lambda_k = \sum_s Q^L(s) \exp(-2\pi i k s / N),  -N/2 \le k < N/2.    (6)

We now impose two more conditions: (1) N' = N. This simplifies the resulting expressions, but is otherwise inessential, as we shall discuss. (2) We constrain each M cell to have the same arrangement of C-values relative to the M cell's position. That is, C_{ni} is to be a function C(s_{ni}) only of the displacement s_{ni} from n to i. This constraint substantially reduces the computational demands. We would not expect it to hold in general in a biologically realistic model -- since different M cells should be allowed to develop different arrangements of weights -- although even then it could be used as an Ansatz to provide a lower bound on R. The section "Temporally-correlated input patterns" deals with a situation in which it is biologically plausible to impose this constraint.

Figure 1. Four processing models (A)-(D): Each diagram shows a single M cell (indexed by n) having output activity M_n. Inputs {L_i} may be common to many M cells. All noise contributions (dotted lines) are uncorrelated with one another and with {L_i}. GC = gain control (see text).
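For this linear Gaussian case, Eqn. 5 is directly computable, and under the Toeplitz conditions the eigenvalues of Eqn. 6 are a single FFT. A minimal sketch (assuming NumPy; here C is the full N' x N weight matrix and QL_row is Q^L(s) for s = 0, ..., N-1 on the ring -- the function names are illustrative, not from the paper):

import numpy as np

def rate_model_A(C, QL, B):
    # Eqn. 4: Q^M = C Q^L C^T + B I;  Eqn. 5: R = (1/2) ln Det(Q^M / B)
    QM = C @ QL @ C.T + B * np.eye(C.shape[0])
    return 0.5 * np.linalg.slogdet(QM / B)[1]

def circulant_eigenvalues(QL_row):
    # Eqn. 6: for a circulant Q^L, the eigenvalues lambda_k are the DFT of one row
    return np.fft.fft(QL_row).real

Using slogdet avoids the overflow that Det itself would produce for large N, and the .real merely discards floating-point residue, since the DFT of the real, even function Q^L(s) is real.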
Under these conditions, Q^M is also a Toeplitz matrix. Its eigenvalues are the components of the F.T. of Q^M(s_{nm}). For N' = N these eigenvalues are (B + \lambda_k z_k), where z_k = |c_k|^2 and c_k \equiv \sum_s C(s) \exp(-2\pi i k s / N) is the F.T. of C(s). [This expression for the eigenvalues is obtained by rewriting Eqn. 4 as Q^M(s_{nm}) = B \delta_{n-m,0} + \sum_{i,j} C(s_{ni}) Q^L(s_{ij}) C(s_{mj}), and taking the F.T. of both sides.] Therefore

    R = (1/2) \sum_k \ln[1 + \lambda_k z_k / B].    (7)

We want to maximize R subject to \sum_s C(s)^2 = 1, which is equivalent to \sum_k z_k = N. Using the Lagrange multiplier method, we maximize \Lambda \equiv R + \mu (\sum_k z_k - N) over all nonnegative {z_k}. Solving \partial \Lambda / \partial z_k = 0 and requiring z_k \ge 0 for all k gives the solution:

    z_k = \max[(-1/2\mu) - (B/\lambda_k), 0],    (8)

where (given B) \mu is chosen such that \sum_k z_k = N.

Note that while the optimal {z_k} are uniquely determined, the phases of the {c_k} are completely arbitrary [except that since the {C(s)} are real, we must have c_k^* = c_{-k} for all k]. The {C(s)} values are therefore not uniquely determined. Fig. 2a shows two of the solutions for an example in which Q^L(s) = \exp[-(s/s_0)^2] with s_0 = 6, N = N' = 64, and B = 1. Both solutions have z_0, z_{\pm 1}, ..., z_{\pm 6} = 5.417, 5.409, 5.378, 5.306, 5.134, 4.689, 3.376, and all other z_k = 0. Setting all c_k phases to zero yields the solid curve; a particular random choice of phases yields the dotted curve. We shall later see that imposing locality conditions on the {C(s)} (e.g., penalizing nonzero C(s) for large |s|) can remove the phase ambiguity.

Our solution (Eqn. 8) can be described in terms of a so-called "water-filling" analogy: If one plots B/\lambda_k versus k, then z_k is the depth of "water" at k when one "pours" into the "vessel" defined by the B/\lambda_k curve a total quantity of "water" that corresponds to \sum_k z_k = N and brings the "water level" to (-1/2\mu).
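Eqn. 8 is one line of code wrapped around a search for the water level. A minimal sketch (assuming NumPy; lam holds the eigenvalues \lambda_k of Eqn. 6, e.g. from circulant_eigenvalues above, and theta stands for the level -1/2\mu):

import numpy as np

def water_fill(lam, B, N, iters=200):
    # Eqn. 8: z_k = max(theta - B/lam_k, 0), with theta chosen so that sum_k z_k = N
    lam = np.maximum(lam, 1e-12)           # guard against fp-negative eigenvalues
    floors = B / lam                       # the 'vessel' profile B/lambda_k
    lo, hi = floors.min(), floors.min() + N
    for _ in range(iters):                 # bisect on the water level theta
        theta = 0.5 * (lo + hi)
        if np.maximum(theta - floors, 0.0).sum() > N:
            hi = theta
        else:
            lo = theta
    return np.maximum(theta - floors, 0.0)

Since \sum_k z_k grows monotonically with the level, bisection suffices. With the Fig. 2a parameters (N = 64, s_0 = 6, B = 1) this should recover the seven distinct nonzero z_k quoted above; any choice of phases for the c_k consistent with c_k^* = c_{-k} then yields an optimal C(s) by inverse FFT.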
Let us contrast this problem with two other problems to which the "water-filling" analogy has been applied in the information-theory literature. In our notation, they are:

1. Given a transfer function {C(s)} and the noise variance B, how should a given total input signal power \sum_k \lambda_k be apportioned among the various wavenumbers k so as to maximize the information rate R [Gallager, 1968]? Our problem is complementary to this: we fix the input signal properties and seek an optimal transfer function subject to constraints.

2. Rate-distortion (R-D) calculation [Berger, 1971]: Given a distortion measure (that defines a "distance" between the actual input signal and an estimate of it that can be reconstructed from the channel's output), and the input power spectrum {\lambda_k}, what choice of {z_k} minimizes the average distortion for given information rate (or minimizes the required rate for given distortion)? In the R-D problem there is a process of reconstruction, and a given measure for assessing the "goodness" of reconstruction. In contrast, in our network there is no reconstruction of the input signal, and no criterion of the "goodness" of such a hypothetical reconstruction is provided.

Note also that infomax optimization is not the same as computing which channel (that is, which mapping f: L \to M) selected from an allowed set has the maximum information-theoretic capacity. In that problem, one is free to encode the inputs before transmission so as to make optimal use of (i.e., "achieve the capacity of") the channel. In our case, there is no such pre-encoding; the input ensemble is prescribed (by the environment or by the output of an earlier processing stage) and we need to maximize the channel rate for that ensemble.

The simplifying condition that N' = N (above) is unnecessarily restrictive. Eqn. 7 can be easily generalized to the case in which N is a multiple of N' and the N' M cells are uniformly spaced on the unit interval. Moreover, in the limit that 1/N' is much smaller than the correlation length scale of Q^L, it can be shown that R is unchanged when we simultaneously increase N' and B by the same factor. (For example, two adjacent M cells each having noise variance 2B jointly convey the same information about L as one M cell having noise variance B.) For biological applications we are mainly interested in cases in which there are many L cells [so that C(s) can be treated as a function of a continuous variable] and many M cells (so that the effect of the noise process is described by the single parameter B/N').

Figure 2. Example infomax solutions C(s) for locally-correlated inputs: (a) Model A; region of nonnegligible C(s) extends over all s; phase ambiguity in c_k yields nonunique C(s) solutions, two of which are shown. See text for details. (b) Models C (solid curve) and D (dotted curve) with Gaussian g(s) favoring short connections; shows center-surround receptive fields, more pronounced in Model D. (c) "Temporal receptive field" using Model D for temporally correlated scalar input to a single M cell; C(s) is the weight applied to the input signal that occurred s time steps ago. Spacing between ordinate marks is 0.1; \sum_s C(s)^2 = 1 in each case.

The analysis so far shows two limitations of Model A. First, the constraint \sum_i C_{ni}^2 = 1 is quite arbitrary. (It certainly does not appear to be a biologically natural constraint to impose!) Second, for biological applications we are interested in predicting the favored values of {C(s)}, but the phase ambiguity prevents this.
In the next section we show that a modified noise model leads naturally, without arbitrary constraints on \sum_i C_{ni}^2, to the same results derived above. We then turn to a model that favors local connections over long-range ones, and that resolves the phase ambiguity issue.

Model B -- Independent noise on each input line

In Model B of Fig. 1 each input L_i to the nth M cell is corrupted by i.i.d. Gaussian noise \nu_{ni} of mean zero and variance B. The output is

    M_n = \sum_i C_{ni} (L_i + \nu_{ni}).    (9)

Since each \nu_{ni} is independent of all other noise terms (and of the inputs {L_i}), we find

    Q^M_{nm} = \sum_{i,j} C_{ni} Q^L_{ij} C_{mj} + B \delta_{nm} \sum_i C_{ni}^2.    (10)

We may rewrite the last term as B \delta_{nm} (\sum_i C_{ni}^2)^{1/2} (\sum_j C_{mj}^2)^{1/2}. The information rate is then R = (1/2) \ln \mathrm{Det}\, W where

    W_{nm} = \delta_{nm} + (\sum_{i,j} C'_{ni} Q^L_{ij} C'_{mj}) / B,  with  C'_{ni} \equiv C_{ni} (\sum_k C_{nk}^2)^{-1/2}.    (11)

Note that this is identical (except for the replacement C \to C') to the expression following Eqn. 5, in which Q^M was given by Eqn. 4. By definition, the {C'_{ni}} satisfy \sum_i C'^2_{ni} = 1 for all n. Therefore, the problem of maximizing R for this model (with no constraints on \sum_i C_{ni}^2) is identical to the problem we solved in the previous section.

Model C -- Favoring of local connections

Since the arborizations of biological cells tend to be spatially localized in many cases, we are led to consider constraints or cost terms that favor localization. There are various ways to implement this. Here we present a way of modifying the noise process so that the infomax principle itself favors localized solutions, without requiring additional terms unrelated to information transmission.

Model C of Fig. 1 is the same as Model B, except that now the longer connections are "noisier" than the shorter ones. That is, the variance of \nu_{ni} is \langle \nu_{ni}^2 \rangle = B_0 g(s_{ni}), where g(s) increases with |s|. [Equivalently, one could attenuate the signal on the (i \to n) line by g(s_{ni})^{1/2} and have the same noise variance B_0 on all lines.]

This change causes the last term of Eqn. 10 to be replaced by B_0 \delta_{nm} \sum_i g(s_{ni}) C_{ni}^2. Under the conditions discussed earlier (Toeplitz Q^L and Q^M, and N' = N), we derive

    R = (1/2) \sum_k \ln[1 + \lambda_k z_k / (B_0 \sum_s g(s) C(s)^2)].    (12)

Recall that the {c_k} are related to {C(s)} by a Fourier transform (see just before Eqn. 7). To compute which choice of {C(s)} maximizes R for a given problem, we used a gradient ascent algorithm several times, each time using a different random set of initial {C(s)} values. For the problems whose solutions are exhibited in Figs. 2b and 2c, multiple starting points usually yielded the same solution to within the error tolerance specified for the algorithm [apart from an arbitrary factor by which all of the C(s)'s can be multiplied without affecting R], and that solution had the largest R of any obtained for the given problem. That is, a limitation sometimes associated with gradient ascent algorithms -- namely, that they may yield multiple "solutions" that are local, but far from global, maxima -- did not appear to be a difficulty in these cases.

Fig. 2b (solid curve) shows the infomax solution for an example having Q^L(s) = \exp[-(s/s_0)^2] and g(s) = \exp[(s/s_1)^2] with s_0 = 4, s_1 = 6, N' = N = 32, and B_0 = 0.1. There is a central excitatory peak flanked by shallow inhibitory sidelobes (and weaker additional oscillations). (As noted, the negative of this solution, having a central inhibitory region and excitatory sidelobes, gives the same R.) As B_0 is increased (a range from 0.001 to 20 was studied), the peak broadens, the sidelobes become shallower (relative to the peak), and the receptive fields of nearby M cells increasingly overlap. This behavior is an example of the "redundancy-diversity" tradeoff discussed in [Linsker, 1988].
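Because the overall scale of C drops out of Eqn. 12, a plain gradient ascent needs no constraint; normalization can be applied afterward. A minimal finite-difference sketch in the spirit of the procedure just described (assuming NumPy; the function names, step sizes, and the crude numerical gradient are illustrative choices, not the paper's algorithm):

import numpy as np

def rate_model_C(C, lam, g, B0):
    # Eqn. 12: the noise floor B0 * sum_s g(s) C(s)^2 is the same for every mode k
    z = np.abs(np.fft.fft(C)) ** 2
    return 0.5 * np.sum(np.log1p(lam * z / (B0 * np.sum(g * C ** 2))))

def ascend(rate, C, steps=3000, lr=1e-2, eps=1e-6):
    I = np.eye(len(C))
    for _ in range(steps):
        base = rate(C)
        grad = np.array([(rate(C + eps * I[i]) - base) / eps for i in range(len(C))])
        C = C + lr * grad
    return C / np.linalg.norm(C)   # rescaling C leaves R of Eqn. 12 unchanged

N, s0, s1, B0 = 32, 4.0, 6.0, 0.1
s = (np.arange(N) + N // 2) % N - N // 2          # signed displacements on the ring
lam = np.fft.fft(np.exp(-(s / s0) ** 2)).real     # lambda_k, Eqn. 6
g = np.exp((s / s1) ** 2)
rng = np.random.default_rng(1)
C = ascend(lambda c: rate_model_C(c, lam, g, B0), 0.1 * rng.standard_normal(N))

Restarting from several random initial C, as described above, guards against the (here apparently benign) possibility of local maxima.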
Model D -- Bounded output variance

Our previous models all produce output values M_n whose variance is not explicitly constrained. More biologically realistic cells have limited output variance. For example, a cell's firing rate must lie between zero and some maximum value. Thus, the output of a model nonlinear cell is often taken to be a sigmoid function of (\sum_i C_{ni} L_i).

Within the context of linear cell models, we can capture the effect of a bounded output variance by using Model D of Fig. 1. We pass the intermediate output \sum_i C_{ni} (L_i + \nu_{ni}) through a gain control GC that normalizes the output variance to unity, then we add a final (i.i.d. Gaussian) noise term \nu'_n of variance B_1. That is,

    M_n = [V(C)]^{-1/2} \sum_i C_{ni} (L_i + \nu_{ni}) + \nu'_n.    (13)

Without the last term, this model would be identical to Model C, since multiplying both the signal and the \nu_{ni} noise by the same factor GC would not affect R. The last term in effect fixes the number of output values that can be discriminated (i.e., not confounded with each other by the noise process \nu'_n) to be of order B_1^{-1/2}.

The information rate for this model is derived to be (cf. Eqn. 12):

    R = (1/2) \sum_k \ln[1 + \lambda_k z_k / (B_0 \sum_s g(s) C(s)^2 + B_1 V(C))]    (14)

where V(C) is the variance of the intermediate output before it is passed through GC:

    V(C) = \sum_{i,j} C(s_{ni}) Q^L(s_{ij}) C(s_{nj}) + B_0 \sum_i g(s_{ni}) C(s_{ni})^2.    (15)

Fig. 2b (dotted curve) shows the infomax solution (numerically obtained as above) for the same Q^L(s) and g(s) functions and parameter values as were used to generate the solid curve (for Model C), but with the new parameter B_1 = 0.4. The effect of the new B_1 noise process in this case is to deepen the inhibitory sidelobes (relative to the central peak). The more pronounced center-surround character of the resulting M cell dampens the response of the cell to differences (between different input patterns) in the spatially uniform component of the input pattern. This response property allows the L \to M mapping to be infomax-optimal when the dynamic range of the cells' output response is constrained. (A competing effect can complicate the analysis: if B_1 is increased much further, for example to 50 in the case discussed, the sidelobes move to larger s and become shallower. This behavior resembles that discussed at the end of the previous section for the case of increasing B_0; in the present case it is the overall noise level that is being increased when B_1 increases and B_0 is kept constant.)
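A minimal sketch of the Model D rate as reconstructed in Eqns. 14-15 (assuming NumPy and reusing lam and g from the previous sketch; for a circulant Q^L the intermediate signal variance is (1/N) \sum_k \lambda_k z_k):

def rate_model_D(C, lam, g, B0, B1):
    z = np.abs(np.fft.fft(C)) ** 2
    line_noise = B0 * np.sum(g * C ** 2)       # weighted input-line noise
    V = np.sum(lam * z) / len(C) + line_noise  # Eqn. 15, intermediate variance
    return 0.5 * np.sum(np.log1p(lam * z / (line_noise + B1 * V)))   # Eqn. 14

The same ascend routine applies unchanged, e.g. ascend(lambda c: rate_model_D(c, lam, g, B0, 0.4), 0.1 * rng.standard_normal(N)); note that R is again invariant under rescaling of C, since both noise terms in the denominator scale with C^2 exactly as the signal term does.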
Temporally-correlated input patterns

Let us see how infomax can be used to extract regularities in input time series, as contrasted with the spatially-correlated input patterns discussed above. We consider a single M cell that, at each discrete time denoted by n, can process inputs {L_i} from earlier times i \le n (via delay lines, for example). We use the same Model D as before. There are two differences. First, we want g(s) = \infty for all s > 0 (input lines from future times are "infinitely noisy"). [A technical point: Our use of periodic boundary conditions, while computationally convenient, means that the input value that will occur s time steps from now is the same value that occurred (N - s) steps ago. We deal with this by choosing g(s) to equal 1 at s = 0, to increase as s \to -N/2 (going into the past), and to increase further as s decreases from +N/2 to 1, corresponding to increasingly remote past times. The periodicity causes no unphysical effects, provided that we make g(s) increase rapidly enough (or make N large enough) so that C(s) is negligible for time intervals comparable to N.] Second, the fact that C_{ni} is a function only of s_{ni} is now a consequence of the constancy of the connection weights C(s) of a single M cell over time, rather than merely a convenient Ansatz to facilitate the infomax computation for a set of many M cells (as it was in previous sections).

The infomax solution is shown in Fig. 2c for an example having Q^L(s) = \exp[-(s/s_0)^2]; g(s) = \exp[-t(s)/s_1] with t(s) = s for s \le 0 and t(s) = s - N for s \ge 1; s_0 = 4, s_1 = 6, N = 32, B_0 = 0.1, and B_1 = 0.4. The result is that the "temporal receptive field" of the M cell is excitatory for recent times, and inhibitory for somewhat more remote times (with additional weaker oscillations). The cell's output can be viewed approximately as a linear combination of a smoothed input and a smoothed first time derivative of the input, just as the output of the center-surround cell of Fig. 2b can be viewed as a linear combination of a smoothed input and a smoothed second spatial derivative of the input. As in Fig. 2b, setting B_1 = 0 (not shown) lessens the relative inhibitory contribution.
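The one-sided g(s) of the technical point above is easy to build on the periodic grid. A minimal sketch (assuming NumPy and the conventions of the earlier sketches, with the exponential form and parameters quoted for Fig. 2c):

N, s1 = 32, 6.0
s = (np.arange(N) + N // 2) % N - N // 2     # signed displacement, -N/2 <= s < N/2
t = np.where(s <= 0, s, s - N)               # t(s) = s for s <= 0; t(s) = s - N for s >= 1
g = np.exp(-t / s1)                          # g(0) = 1, growing toward the remote past

Here g is largest at s = +1, the most remote past time under the periodic wrap, which stands in for the g(s) = \infty exclusion of future inputs; this g can be passed directly to rate_model_D above.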
SUMMARY

To gain insight into the operation of the principle of maximum information preservation, we have applied the principle to the problem of the optimal design of an array of linear filters under various conditions. The filter models that have been used are motivated by certain features that appear to be characteristic of biological networks. These features include the favoring of short connections and the constrained range of output signal values. When nearby input signals (in space or time) are correlated, the infomax-optimal solutions for the cases studied include (1) center-surround cells and (2) cells sensitive to temporal variations in input. The results of the mathematical analysis presented here apply also to arbitrary input covariance functions of the form Q^L(|i - j|). We have also presented more general expressions for the information rate, which can be used even when Q^L is not of this form. The cases discussed illustrate the operation of the infomax principle in some relatively simple but instructive situations. The analysis and results suggest how the principle may be applied to more biologically realistic networks and input ensembles.

References

T. Berger, Rate Distortion Theory (Prentice-Hall, Englewood Cliffs, N.J., 1971), chap. 4.

R. G. Gallager, Information Theory and Reliable Communication (John Wiley and Sons, N.Y., 1968), p. 388.

R. Linsker, in: Neural Information Processing Systems (Denver, Nov. 1987), ed. D. Z. Anderson (Amer. Inst. of Physics, N.Y.), pp. 485-494.

R. Linsker, Computer 21(3), 105-117 (March 1988).

C. E. Shannon and W. Weaver, The Mathematical Theory of Communication (Univ. of Illinois Press, Urbana, 1949).", "award": [], "sourceid": 102, "authors": [{"given_name": "Ralph", "family_name": "Linsker", "institution": null}]}