{"title": "Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs", "book": "Advances in Neural Information Processing Systems", "page_first": 744, "page_last": 750, "abstract": null, "full_text": "Maximum-Likelihood Continuity Mapping \n\n(MALCOM): An Alternative to HMMs \n\nDavid A. Nix \n\ndnix@lanl.gov \n\nJohn E. Hogden \n\nhogden@lanl.gov \n\nComputer Research &  Applications \n\nComputer Research &  Applications \n\nCIC-3, MS B265 \n\nCIC-3, MS B265 \n\nLos Alamos National Laboratory \n\nLos Alamos National Laboratory \n\nLos Alamos, NM 87545 \n\nLos Alamos, NM 87545 \n\nAbstract \n\nWe describe Maximum-Likelihood Continuity Mapping (MALCOM), an \nalternative to  hidden Markov models (HMMs) for processing sequence \ndata such as speech.  While HMMs have a discrete \"hidden\" space con(cid:173)\nstrained by  a fixed  finite-automaton architecture, MALCOM has a con(cid:173)\ntinuous hidden space-a continuity map-that is  constrained only  by  a \nsmoothness requirement on paths through the space.  MALCOM fits  into \nthe same probabilistic framework for speech recognition as  HMMs, but \nit represents  a  more  realistic  model  of the  speech  production  process. \nTo  evaluate the extent to which MALCOM captures speech production \ninformation, we  generated continuous speech continuity maps  for three \nspeakers  and  used  the  paths  through  them  to  predict measured  speech \narticulator data.  The median  correlation  between  the MALCOM paths \nobtained from  only the  speech  acoustics  and  articulator measurements \nwas 0.77 on an  independent test set not used  to train MALCOM or the \npredictor.  This  unsupervised  model  achieved  correlations  over speak(cid:173)\ners and articulators only 0.02 to 0.15 lower than those obtained using an \nanalogous supervised method which used articulatory measurements as \nwell as acoustics .. \n\n1 \n\nINTRODUCTION \n\nHidden Markov models (HMMs) are generally considered to be the state of the art in speech \nrecognition (e.g., Young, 1996). The strengths of the HMM framework include a rich math(cid:173)\nematical foundation, powerful training and recognition algorithms for large speech corpora, \nand a probabilistic framework that can incorporate statistical phonology and syntax (Mor(cid:173)\ngan  &  Bourlard,  1995).  However,  HMMs are  known  to  be  a poor model  of the  speech \nproduction process.  While  speech  production  is  a  continuous,  temporally  evolving  pro(cid:173)\ncess,  HMMs  treat  speech  production as  a  discrete,  finite-state  system  where  the  current \nstate  depends  only  on  the  immediately  preceding  state.  Furthermore,  while  HMMs  are \ndesigned to  capture temporal information as  state transition probabilities, Bourlard et al., \n\n\fMaximum-Likelihood Continuity Mapping (MALCOM) : An Alternative to HMMs \n\n745 \n\n(1995) suggest that when the transition probabilities are replaced by constant values, recog(cid:173)\nnition results do not significantly deteriorate. That is, while transitions are often considered \nthe most perceptually relevent component of speech, the conventional HMM framework is \npoor at capturing transition information. \n\nGiven these deficiencies, we are considering alternatives to the HMM approach that main(cid:173)\ntain  its  strengths  while  improving  upon  its  weaknesses.  This  paper describes  one  such \nmodel  called Maximum-Likelihood Continuity  Mapping  (MALCOM). We  first  review  a \ngeneral statistical framework for speech recognition so that we can compare the HMM and \nMALCOM formulations.  Then  we  consider what the  abstract hidden  state represents  in \nMALCOM,  demonstrating empirically that the  paths  through MALCOM's hidden  space \nare closely related to the movements of the speech production articulators. \n\n2  A GENERAL FRAMEWORK FOR SPEECH RECOGNITION \n\nConsider an unknown speech waveform that is converted by a front-end signal-processing \nmodule into a sequence of acoustic vectors X.  Given a space of possible utterances,  W, \nthe task of speech recognition is to return the most likely utterance W * given the observed \nacoustic sequence X . Using Bayes' rule this corresponds to \n\n(1) \n\nIn recognition,  P(X) is  typically ignored because it is constant over all  W,  and  the pos(cid:173)\nterior P(WIX) is estimated as the product of the prior probability of the word sequence, \nP(W),  and  the  probability  that  the  observed  acoustics  were  generated  by  the  word  se(cid:173)\nquence, P(XI W) . The prior P(W) is estimated by a language model, while the production \nprobability P(X IW) is estimated by an acoustic model. In continuous speech recognition, \nthe product of these terms must be maximized over W; however, in this paper, we will re(cid:173)\nstrict our attention to the form of the  acoustical model only. Every candidate utterance W \ncorresponds to a sequence of word/phone models M w such that P(XIW)  =  P(XIMw), \nand each M w considers all  possible paths through  some \"hidden\"  space.  Thus,  for each \ncandidate utterance, we must calculate \n\nP(X IM w) = i P(XIY , Mw )P(YIMw )dY , \n\n(2) \n\nwhere Y  is some path through the hidden space. \n\n2.1  HIDDEN MARKOV MODELS \n\nBecause  HMMs  are  finite-state  machines  with  a  given  fixed  architecture,  the  path  Y \nthrough the hidden space corresponds to  series  of discrete states, simplifying the integral \nof Eq. (2) to a sum.  However, to avoid computing the contribution of all possible paths, the \nViterbi  approximation-considering only  the  single  path  that maximizes Eq.  (2)-is fre(cid:173)\nquently used without much loss in recognition performance (Morgan &  Bourlard,  1995). \nThus, \n\nP(XIMw) ~ ar!?\"ymaxP(XIY, Mw)P(YIM w). \n\n(3) \n\nThe first term corresponds to the product of the emission probabilities of the acoustics given \nthe  state  sequence  and  is  typically  estimated  by  mixtures  of high-dimensional Gaussian \ndensities.  The second term corresponds to the product of the state transition probabilities. \nHowever,  because Bourlard et al.  (1995) found  that  this  second term  contributes little to \nrecognition performance,  the  modeling power of the  conventional  HMM must reside  in \nthe  first  term.  Training  the  HMM  system  involves estimating  both  the  emission and  the \n\n\f746 \n\nD. A. Nix and J.  E. Hogden \n\ntransition  probabilities from  real  speech data.  The Baum-Welchlforward-backward algo(cid:173)\nrithm (e.g.,  Morgan &  Scofield,  1994) is  the standard computationally efficient algorithm \nfor iteratively estimating these distributions. \n\n2.2  MAXIMUM-LIKELIHOOD CONTINUITY MAPPING (MALCOM) \n\nIn contrast to HMMs, the multi-dimensional MALCOM hidden space is continuous-there \nare  an infinite number states  and paths through them.  While the HMM is  constrained by \na  fixed  architecture,  MALCOM  is  constrained by  the  notion of continuity  of the  hidden \npath. That is,  the path must be smooth and continuous: it may not carry any energy above \na  given cutoff frequency.  Unlike  the  discrete  path  in  an  HMM,  the  smooth hidden  path \nin  MALCOM attempts to emulate the motion of the speech articulators in  what we call  a \ncontinuity map (CM). \n\nUnless  we  know how to evaluate the  integral  of Eq.  (2)  (which we  currently do not),  we \nmust  also  make  the  Viterbi  approximation  and  approximate  P(XIM w )  by  considering \nonly the single path that maximizes the likelihood of the acoustics X  given the utterance \nmodel M w ,  resulting in  Eq. (3) once again.  Analogously,  the first  term,  P(XIY, M w ), \ncorresponds to the acoustic  generation probability  given  the hidden path,  and  the  second \nterm  corresponds  to  the  probability  of the  hidden  path  given  the  utterance model.  This \npaper focuses  on the first  term  because this  is  the  term that produces conventional HMM \nperformance. 1 \n\nCommon to all  Mw is a set of N  probability density functions (pdfs) <I>  that define the CM \nhidden  space, modeling  the  likelihood  of Y  given  X  for  an  N -code vector quantization \n(VQ) of the acoustic space.  Because these pdfs are defined over the low-dimensional CM \nspace instead of the high-dimensional acoustic space (e.g., 6 vs. 40+), MALCOM requires \nmany fewer parameters to be estimated than the corresponding HMM. \n\n3  THE MALCOM ALGORITHM \n\nWe  now  turn  to  developing an  algorithm  to  estimate both  the CM  pdfs  <I>  and  the  corre(cid:173)\nsponding paths Y  that together maximize the likelihood of a given time series of acoustics, \nC  =  P(X I Y , <1\u00bb\n.  This is an extension of the method first proposed by  Hogden (1995), in \nwhich he  instead maximized P(YIX, <1\u00bb  using vowel data from a single speaker.  Starting \nwith random but smooth Y , the MALCOM training algorithm generates a CM by iterating \nbetween the following two steps: (1) Given Y, reestimate <I>  to maximize C;  and (2) Given \n<1>,  reestimate smooth paths Y  to maximize \u00a3. \n\n3.1  LOG LIKELIHOOD FUNCTION \n\nTo  specify the log likelihood function C, we  make two dependence claims and one inde(cid:173)\npendence assumption. First we claim that Yt  depends (to at least some small extent) on all \nother Y in  the  utterance, an  expression  of the  continuity constraint described above.  We \nmake another natural claim that Xt  depends on Yt,  that the path configuration at time t in(cid:173)\nfluences the corresponding acoustics.  However, we do make the conditional independence \nassumption that \n\nNote that Eq. (4) does not assume that each Xt is independent OfXt-l (as is often assumed \nin data modeling); it only  assumes that the conditioning of Xt  on Yt  is  independent from \n\nIHowever,  we are currently developing a model of P(YIM w )  to replace the corresponding (and \n\nuseless) term in the conventional  HMM  formulation as well (Hogden et al. , 1998). \n\nC = P(XIY, <1\u00bb  = IT P(XtIYt , <1\u00bb. \n\nn \n\nt=l \n\n(4) \n\n\fMaximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs \n\n747 \n\nt-l to t. For example, because Xt  depends on Yt, Yt depends on all othery (the smoothness \nconstraint), and Xt-1  depends on Yt-1, Xt  is not assumed to be independent of all other xs \nin the utterance. \n\nWith a log transformation and  an  invocation of Bayes' rule, we obtain the MALCOM log \nlikelihood function: \n\nn \n\nInL:  = L [InP(Ytlxt, <1\u00bb  + InP(xt) -InP(Ytl<l\u00bb]\u00b7 \n\nt=1 \n\n(5) \n\nWe model each P(Yt IXt, <1\u00bb  by a probability density function (pdf) p[Ytlxt, <l>j (Xt)], where \nthe particular model <l>j  depends on  which of the  N  VQ codes Xt  is assigned to.  Here we \nuse a simple multi-dimensional Gaussian for each pdf, but we are currently exploring the \nuse  of multi-modal  mixtures  of Gaussians  to  represent the  pdfs  for  sounds  such  as  stop \nconsonants for  which  the  inverse map  from  acoustics  to  articulation  may  not  be  unique \n(Nix,  1998).  Next,  we  need an  estimate of P(Ytl<l\u00bb,  which can be obtained by  summing \n::::::  L~=l p(YtIXj, <l>j)P(Xj).  We  estimate  P(Xj)  by \nover  all  VQ  partitions:  P(Ytl<l\u00bb \ncalculating the relative frequency of each acoustic code in the VQ codebook. \n\n3.2  PDF ESTIMATION \n\nFor step (1) of training, we use gradient-based optimization to reestimate the means of the \nGaussian pdfs for each acoustic partition, where the gradient of Eq.(5) with respect to  the \nmean of pdf i  is \n\nV' JL.In L:  =  L  E;l(Yt -JLi) _ t {L~=1 p[Y~Xj, <I>(Xj)]P(Xj)Ej 1(Yt  - JLj)} \n\ntEx(t)=x. \n\nt=1 \n\nLj=l p[Ytlxj, <I>(Xj)]P(Xj) \n\n(6) \nwhere E  is the covariance matrix for each pdf.  For the results in this paper, we use a com(cid:173)\nmon radially symmetric covariance matrix for all pdfs and reestimate the covariance matrix \nafter each path optimization step.2  In doing the optimization, we employ the following al(cid:173)\ngorithm: \n\n1.  Make  an  initial guess  of each  JLi  as  the  means  of the path  configurations corre(cid:173)\n\nsponding to the observed acoustics X  E  Xt . \n\n2.  Construct V' JL  In L:  by considering Eq. (6) over all  N  acoustic partitions. \n3.  Determine a search  direction for  the  optimization using,  for example,  conjugate \n\ngradients and perform a line search along this direction (Press et al.,  1988). \n\n4.  Repeat steps [2]-[3] until convergence. \n\nTo  avoid  potential degenerate solutions,  after each pdf optimization step,  the  dimensions \nof the CM  are  orthogonalized.  Furthermore,  because  the  scale  of the  continuity  map  is \nmeaningless (only its topological arrangement matters), the N  pdfmeans are scaled to zero \nmean, unit variance before each path optimization step. \n\n3.3  PATH ESTIMATION \n\nFor  step  (2) of training,  we  use  gradient-based optimization  to  reestimate  Y,  where the \ngradient of the log likelihood function with respect to a specific Yt  is given by \nV'y  InL:  =  V'y,p[Ytlxt,<I>(xt)]  _  V'y, F~=IP[YtIXj,<I>(Xj)]P(Xj) \n\n(7) \n\n, \n\np[YtIXt, <I>(Xt)] \n\nLj=1 p[Ytlxj, <I>(Xj)]P(Xj) \n\n2However,  we  are currently exploring the effects of individual and diagonal covariance matrices. \n\n\f748 \n\nD.  A.  Nix and J.  E.  Hogden \n\nIn doing the optimization, we employ the following gradient-based algorithm: \n\n1.  Make an  initial  guess  of the  path  yO  as  the means  of the pdfs corresponding to \n\nthe observed acoustic sequence X. \n\n2.  Low pass filter yo. \n3.  Construct \\1y InC by  considering Eq. (7) over all t. \n4.  Determine a search direction for  the  optimization using,  for example, conjugate \n\ngradients (Press et at.,  1988). \n\n5.  Low-pass filter this search direction using the same filter as in step [2]. \n6.  Perform a line search along the filtered direction (Press et at.,  1988). \n7.  Repeat steps [3]-[6] until convergence. \n\nBecause neither the line search direction nor the initial estimate yO contains energy above \nthe cutoff frequency of the low-pass filter,  their linear addition-the next estimate of Y -\nwill not contain energy above the cutoff frequency either. Thus, steps [2] and [5] implement \nthe desired smoothness constraint. \n\n4  COMPARNG MALCOM PATHS TO SPEECH ARTICULATION \n\nTo evaluate our claim that MALCOM paths are topologically related to articulator motions, \nwe construct a regression predictor from Y  to measured articulator data using the training \ndata and test the quality of this predictor on  an independent test set. \n\nOur speech corpus consists of data from two male and one female native speakers of Ger(cid:173)\nman.  This data was obtained from Dr.  Igor Zlokarnik and recorded at the Technical Uni(cid:173)\nversity of Munich, Germany using electro-magnetic articulography (EMA) (Perkell et al., \n1992). Each speaker's articulatory measurements and acoustics were recorded for the same \n108 sentences, where each sentence was about 4 seconds long. \n\nThe acoustics  were  recorded using  a room-placed microphone and  sampled  using  16-bit \nresolution at  16 kHz.  Prior to receiving the data from  Munich, the data were resampled at \n11025 Hz.  To  represent the acoustic  signal in  compact vector time-series,  we used  256-\nsample (23.2 msec) Hamming-windowed frames, with a new frame starting every 5.8 msec \n(75% overlap). We  transform each frame into a 13th-order LPC-cepstral coefficient vector \nat  (12  cepstral  features  plus  log  gain-see Morgan& Scofield,  1994).  A  full  acoustical \nfeature vector Xt consists of a window of seven frames such that Xt  is made up of the frames \n{at-6,at-4,at-2,at,at+2, at+4,at+6}.  To VQ the acoustic space we  used the classical  k(cid:173)\nmeans algorithm  (e.g.,  Bishop,  1995),  but  we  used  512 codes  to  model  the  vowel  data, \nand 256 codes each to model the stop consonants, the fricatives, the nasals, and the liquids \n(1536 codes combined).3 \nThe articulatory data consist of the (x, y) coordinates of 4 coils along the tongue and the y(cid:173)\ncoordinates of coils on the jaw and lower lip.  Figure  1 illustrates the approximate location \nof each coil.  The data were originally sampled at 250 Hz but were resampled to  172.26 Hz \nto match one articulatory sample for each 75%-overlapping acoustic frame of 256 samples. \nThe articulatory data were subsequenpy low-pass filtered at 15 Hz to remove measurement \nnoise. \n\nSentences 1-90 were used as  a training set,  and sentences 91-108 were withheld for eval(cid:173)\nuation.  A  separate CM  was  generated for each  speaker using  the  training data.  We  used \nan  8  Hz  cutoff frequency  because  the  measured  articulatory  data  had  very  little  energy \nabove 8 Hz,  and  a 6-dimensional continuity map was  used because the first  six principal \ncomponents capture 99% of the variance of the corresponding articulator data (Nix, 1998). \n\n3This  acoustic  representation  and  VQ scheme  were  determined  to  work  well  for  modeling  real \n\narticulator data (Nix,  1998), so they were used here as well. \n\n\fMaximum-Likelihood Continuity Mapping (MALCOM) : An Alternative to  HMMs \n\n749 \n\n\\ , /  ~Tongue mIddle (T2) \n\n/ \n\nTongue dorsum (T3) \n\n( \n\n\u2022  -_ _ \n\nTongue back (T4) \n\nTOngUetIP~1~j~t;;\"-- -~ \n\n-t \\ \n--i l---------.  J  ,c- \"I \n\\i~~ \n\\ \n, I \nI :'-'-~ ( \nLower lop (LL) ~ .,  '~\"\"~ \nI  .' ~ __ \\ \n(/ /\"\nI , -- - ,~ \n,~\\ ,.~~:,~ \n'~ \\ \n\\ \n\\(, .. ~~J \n' -\n\n~ \nLower jaw (LJ) ,X  t,:',': \n\ni \n\n'\n\n----\n\nFigure 1:  Approximate positions of EMA coils for speech articulation measurements. \n\nBecause the  third  term  in  Eq.  (5)  is  computationally complex,  we  approximated  Eq.  (5) \nby  only  its  first  term (the  second  term  is  constant during  training)  until  In C,  calculated \nat  the  end of each  iteration  using  all  terms, started  to  decrease.  At this  point we  started \nusing  both  the  first  and  third  terms  of Eq.  (5).  In  each  pdf and  path  optimization  step, \nour convergence criterion  was  when  the  maximum  movement  of a  mean  or a  path  was \n<  10- 4 .  Our convergence criterion  for  the entire  algorithm was  when  the  correlation of \nthe paths from one full  iteration of pdf and path optimization to another was> 0.99 in all \ndimensions.  This usually took about 30 iterations. \n\nTo  evaluate  the  extent to  which  MALCOM  hidden  paths  capture  information  related  to \narticulation,  we  used  the  same  training  set  to  estimate  a  non-linear  regression  function \nfrom  the  output generated by  MALCOM  to  the corresponding measured articulator data. \nWe  used  an  ensemble of 10  single-hidden-Iayer,  32-hidden  unit, multi-layer perceptrons \ntrained  on  different 2/3-training,  1/3-early  stopping  partitions  of the  training  set,  where \nthe  results  of the  ensemble on  the  test set  were  averaged  (e.g.,  Bishop,  1995).  A  linear \nregression produced results approximately 10% worse than those we report here. \n\nTo contrast with the unsupervised MALCOM method, we also tested a supervised method \nin which the articulatory data was available for training as well as evaluation. This involved \nonly the pdf optimization step of MALCOM because the paths were fixed as the articulator \nmeasurements. The resulting pdfs were then used in the path optimization step to determine \npaths for  the test data acoustics.  We  could then  measure what fraction  of this  supervised \nperformance the unsupervised MALCOM attained. \n\n5  RESULTS AND CONCLUSIONS \n\nThe results of this regression on the test set are plotted in Figure 2.  The MALCOM paths \nhad  a  median  correlation  of 0.77  with  the  actual  articulator data,  compared  to  0.84  for \nthe  comparable supervised  method.  Thus,  using  only  the  speech  acoustics,  MALCOM \ngenerated continuity maps with correlations to real  articulator measurements only 0.02 to \n0.15 lower than the corresponding supervised model which used articulatory measurements \nas well as acoustics. \n\nGiven that (1) MALCOM fits  into the same probabilistic framework for speech recognition \nas  HMMs and  (2) MALCOM's  hidden  paths capture considerable information  about the \nspeech production process, we believe that MALCOM will prove to be a viable alternative \nto the HMM for speech processing tasks.  Our current work emphasizes developing a word \nmodel to  complete the MALCOM  formulation  and  test  a full  speech recognition system. \nFurthermore, MALCOM  is  applicable to  any  other task  to  which  HMMs can  be  applied, \n\n\f750 \n\nD.  A.  Nix and 1.  E.  Hogden \n\n0.8 \n\n0.2 \n\nOL-~ __ ~~~~ __ ~~~~~-L~~~ __ -L~ __ L J  __  -L~ __ L-~ \n\nT1x \n\nT1y \n\nT2x \n\nT2y \n\nT3x \n\nT3y \n\nT4x \n\nT4y \n\nLLy \n\nArticulator dimension \n\nUy \n\nFigure 2:  Correlation between estimated and actual articulator trajectories on the indepen(cid:173)\ndent test set averaged across speakers.  Each full  bar is  the performance of the supervised \nanalogy to MALCOM, and the horizontal line on each bar is the performance of MALCOM \nitself. \n\nincluding fraud detection (Hogden, 1997) and text processing. \n\nAcknowledgments \nWe would like to thank James Howse and Mike Mozer for their helpful comments on this manuscript \nand Igor Zlokarnik for sharing his data  with us.  This work  was performed under the auspices of the \nU.S. Department of Energy. \n\nReferences \nBishop, C.M. (1995).  Neural Networks for Pattern Recognition, NY:  Oxford University Press, Inc. \nBourlard, H.  Konig, Y.,  &  Morgan, N. (1995).  \"REMAP: Recursive estimation and maximization of \na posteriori probabilities, application to transition-based connectionist speech recognition,\" \nInternational Computer Science Institute Technical  Report TR-94-064. \n\nHogden, J. (1995).  \"Improving on hidden Markov  models:  an  articulatorily constrained, maximum(cid:173)\nlikelihood approach to speech recognition and speech coding,\" Los Alamos National Lab(cid:173)\noratory Technical Report, LA-UR-96-3945. \n\nNational Laboratory Technical Report, LA-UR-97-992. \n\nHogden,  J.  (1997).  \"Maximum  likelihood  continuity  mapping  for  fraud  detection,\"  Los  Alamos \nHogden,  1.,  Nix,  D.A.,  Gracco, v.,  &  Rubin,  P.  (1998).  \"Stochastic  word  nodels  for  articulatorily \nconstrained speech recognition and synthesis,\" submitted to Acoustical Society of America \nConference,  1998. \n\nMorgan,  N.  &  Bourlard, H.A.  (1995).  \"Neural  Networks for  Statistical Recognition of Continuous \n\nSpeech,\" Proceedings of the IEEE, 83(5), 742-770. \n\nMorgan,  D.P.,  &  Scofield,  c.L.  (1992).  Neural  Networks  and Speech  Processing,  Boston,  MA: \n\nKluwer Academic Publishers. \n\nNix, D.A. (1998).  Probabilistic methods for inferring vocal-tract articulation from  speech acoustics, \n\nPh.D. Dissertation, U.  of CO at Boulder, Dept. of Computer Science, in preparation. \n\nPerkell, 1.S., Cohen,  M.H., Svirsky,  M.A., Matthies,  M.L., Garabieta, I., &  Jackson,  M.T.T. (1992). \n\"Electromagnetic  midsagittal  articulo meter  systems  for  transducing  speech  articulatory \nmovements,\" Journal of the Acoustical Society of America, 92(6), 3078-3096. \n\nPress,  W.H.,  Teukolsky,  S.A.,  Vetterling,  w.T.,  &  Flannery,  B.P. (1988).  Numerical  Recipes in  C \n\nCambridge University Press. \n\nYoung,  SJ.  (1996).  \"A  review  of large-vocabulary  continuous  speech  recognition,\"  IEEE Signal \n\nProcessing Magazine,  September, 45-57. \n\n\f", "award": [], "sourceid": 1634, "authors": [{"given_name": "David", "family_name": "Nix", "institution": null}, {"given_name": "John", "family_name": "Hogden", "institution": null}]}