{"title": "Extracting and Learning an Unknown Grammar with Recurrent Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 317, "page_last": 324, "abstract": null, "full_text": "Extracting and Learning an Unknown Grammar with \n\nRecurrent Neural Networks \n\nC.L.Gnes\u00b7, C.B. Miller \nNEC Research Institute \n4 Independence Way \nPrinceton. NJ. 08540 \ngiles@research.nj.nec.COOl \n\nD. Chen, G.Z. Sun, B.H. Chen, V.C. Lee \n*Institute for Advanced Computer Studies \n\nDept of Physics and Astronomy \n\nUniversity of Maryland \nCollege pm, Mel 20742 \n\nAbstract \n\nSimple secood-order recurrent netwoIts are shown to readily learn sman brown \nregular grammars when trained with positive and negative strings examples. We \nshow that similar methods are appropriate for learning unknown grammars from \nexamples of their strings. TIle training algorithm is an incremental real-time, re(cid:173)\ncurrent learning (RTRL) method that computes the complete gradient and updates \nthe weights at the end of each string. After or during training. a dynamic clustering \nalgorithm extracts the production rules that the neural network has learned.. TIle \nmethods are illustrated by extracting rules from unknown deterministic regular \ngrammars. For many cases the extracted grammar outperforms the neural net from \nwhich it was extracted in correctly classifying unseen strings. \n\nINTRODUCTION \n\n1 \nFor many reasons, there has been a long interest in \"language\" models of neural netwoIts; \nsee [Elman 1991] for an excellent discussion. TIle orientation of this work is somewhat dif(cid:173)\nferent TIle focus here is on what are good measures of the computational capabilities of \nrecurrent neural networks. Since currently there is little theoretical knowledge, what prob(cid:173)\nlems would be \"good\" experimental benchmarks? 
For discrete inputs, a natural choice would be the problem of learning formal grammars - a "hard" problem even for regular grammars [Angluin, Smith 1982]. Strings of grammars can be presented one character at a time and strings can be of arbitrary length. However, the strings themselves would be, for the most part, feature independent. Thus, the learning capabilities would be, for the most part, feature independent and, therefore, insensitive to feature extraction choice.

The learning of known grammars by recurrent neural networks has shown promise, for example [Cleeremans, et al 1989], [Giles, et al 1990, 1991, 1992], [Pollack 1991], [Sun, et al 1990], [Watrous, Kuhn 1992a,b], [Williams, Zipser 1988]. But what about learning unknown grammars? We demonstrate in this paper that not only can unknown grammars be learned, but it is possible to extract the grammar from the neural network, both during and after training. Furthermore, the extraction process requires no a priori knowledge about the grammar, except that the grammar's representation can be regular, which always holds for a grammar of bounded string length, as is the case for the grammatical "training sample."

317

318
Giles, Miller, Chen, Sun, Chen, and Lee

2 FORMAL GRAMMARS

We give a brief introduction to grammars; for a more detailed explanation see [Hopcroft & Ullman, 1979]. We define a grammar as a 4-tuple (N, V, P, S) where N and V are nonterminal and terminal vocabularies, P is a finite set of production rules and S is the start symbol. All grammars we discuss are deterministic and regular. For every grammar there exists a language - the set of strings the grammar generates - and an automaton - the machine that recognizes (classifies) the grammar's strings. For regular grammars, the recognizing machine is a deterministic finite automaton (DFA).
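Such a DFA is easy to state concretely. The sketch below uses a hypothetical toy grammar (binary strings with an even number of 1's) as a stand-in for the paper's randomly generated grammars; `make_dfa` and `classify` are illustrative names, not the authors' code.

```python
def make_dfa():
    # Toy DFA (an assumption for illustration): accepts strings over
    # {0,1} containing an even number of 1's. States are integers,
    # transitions are a table keyed by (state, symbol).
    delta = {(0, '0'): 0, (0, '1'): 1,
             (1, '0'): 1, (1, '1'): 0}
    start, accept = 0, {0}
    return delta, start, accept

def classify(delta, start, accept, string):
    """Run the DFA over `string`; return True iff the string is accepted."""
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in accept
```

Each entry of `delta` is exactly one production rule in the (node, arc, node) sense: current state, input symbol, next state.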
There exists a one-to-one mapping between a DFA and its grammar. Once the DFA is known, the production rules are the ordered triples (node, arc, node).

Grammatical inference [Fu 1982] is defined as the problem of finding (learning) a grammar from a finite set of strings, often called the training sample. One can interpret this problem as devising an inference engine that learns and extracts the grammar, see Figure 1.

UNKNOWN GRAMMAR --(labelled strings)--> INFERENCE ENGINE (NEURAL NETWORK) --(extraction process)--> INFERRED GRAMMAR

Figure 1: Grammatical inference

For a training sample of positive and negative strings and no knowledge of the unknown regular grammar, the problem is NP-complete (for a summary, see [Angluin, Smith 1982]). It is possible to construct an inference engine that consists of a recurrent neural network and a rule extraction process that yields an inferred grammar.

3 RECURRENT NEURAL NETWORK

3.1 ARCHITECTURE

Our recurrent neural network is quite simple and can be considered as a simplified version of the model by [Elman 1991]. For an excellent discussion of recurrent networks full of references that we don't have room for here, see [Hertz, et al 1991].

A fairly general expression for a recurrent network (which has the same computational power as a DFA) is:

S^{t+1} = F(S^t, I^t; W)

where F is a nonlinearity that maps the state neurons S^t and the input neurons I^t at time t to the next state S^{t+1} at time t+1. The weight matrix W parameterizes the mapping and is usually learned (however, it can be totally or partially programmed). A DFA has an analogous mapping but does not use W. For a recurrent neural network we define the mapping F and order of the mapping in the following manner [Lee, et al 1986].
For a first-order recurrent net:

S_i^{t+1} = σ( Σ_{j=1}^{N} W_ij S_j^t + Σ_{j=1}^{L} Y_ij I_j^t )

where N is the number of hidden state neurons and L the number of input neurons; W_ij and Y_ij are the real-valued weights for, respectively, the state and input neurons; and σ is a standard sigmoid discriminant function. The values of the hidden state neurons S^t are defined in the finite N-dimensional space [0,1]^N. Assuming all weights are connected and the net is fully recurrent, the weight space complexity is bounded by O(N^2 + NL). Note that the input and state neurons are not the same neurons. This representation has the capability, assuming sufficiently large N and L, to represent any state machine. Note that there are non-trainable unit weights on the recurrent feedback connections.

The natural second-order extension of this recurrent net is:

S_i^{t+1} = σ( Σ_{j,k} W_ijk S_j^t I_k^t )

where certain state neurons become input neurons. Note that the weights W_ijk modify a product of the hidden S_j and input I_k neurons. This quadratic form directly represents the state transition diagrams of a state automata process -- (input, state) => (next-state) -- and thus makes the state transition mapping very easy to learn. It also permits the net to be directly programmed to be a particular DFA. Unpublished experiments comparing first- and second-order recurrent nets confirm this ease-in-learning hypothesis. The space complexity (number of weights) is O(LN^2). For L << N, both first- and second-order are of the same complexity, O(N^2).

3.2 SUPERVISED TRAINING & ERROR FUNCTION

The error function is defined by a special recurrent output neuron which is checked at the end of each string presentation to see if it is on or off. By convention this output neuron should be on if the string is a positive example of the grammar and off if negative.
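The second-order state update of Section 3.1 can be sketched in numpy as follows; the function name and tensor layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid discriminant function.
    return 1.0 / (1.0 + np.exp(-x))

def second_order_step(W, s, i):
    """One state update: S_i^{t+1} = sigmoid( sum_{j,k} W_ijk S_j^t I_k^t ).

    W : weights, shape (N, N, L)
    s : current state vector S^t, shape (N,)
    i : one-hot input vector I^t, shape (L,)
    """
    return sigmoid(np.einsum('ijk,j,k->i', W, s, i))
```

Because the input is one-hot, each symbol k effectively selects its own N-by-N state transition matrix W[:, :, k], which is why the quadratic form maps so directly onto DFA transition tables.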
In practice an error tolerance decides the on and off criteria; see [Giles, et al 1991] for detail. [If a multiclass recognition is desired, another error scheme using many output neurons can be constructed.] We define two error cases: (1) the network fails to reject a negative string (the output neuron is on); (2) the network fails to accept a positive string (the output neuron is off). This accept or reject occurs at the end of each string - we define this problem as inference versus prediction. There is no prediction of the next character in the string sequence. As such, inference is a more difficult problem than prediction. If knowledge of the classification of every substring of every string exists and alphabetical training order is preserved, then the prediction and inference problems are equivalent.

The training method is real-time recurrent learning (RTRL). For more details see [Williams, Zipser 1988]. The error function is defined as:

E = (1/2) (Target - S_0^f)^2

where S_0^f is the output neuron value at the final time step t = f when the final character is presented and Target is the desired value of (1, 0) for (positive, negative) examples. Using gradient descent training, the weight update rule for a second-order recurrent net becomes:

ΔW_lmn = -α ∇_{lmn} E = α (Target - S_0^f) ∂S_0^f/∂W_lmn

where α is the learning rate. From the recursive network state equation we obtain the relationship between the derivatives of S^t and S^{t+1}:

∂S_i^{t+1}/∂W_lmn = σ' · [ δ_il S_m^t I_n^t + Σ_{j,k} W_ijk I_k^t ∂S_j^t/∂W_lmn ]

where σ' is the derivative of the discriminant function. This permits on-line learning with partial derivatives calculated iteratively at each time step. Let ∂S^{t=0}/∂W_lmn = 0. Note that the space complexity is O(L^2 N^4), which can be prohibitive for large N and full connectivity.
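The RTRL recursion above (the full gradient, propagated at every time step) can be sketched in numpy; `rtrl_step` and the chosen tensor layout are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rtrl_step(W, s, i, dS):
    """Advance the state and the RTRL sensitivity table by one time step.

    W  : weights W_ijk, shape (N, N, L)
    s  : current state S^t, shape (N,)
    i  : one-hot input I^t, shape (L,)
    dS : partial dS_j^t / dW_lmn, shape (N, N, N, L); zeros at t = 0
    """
    net = np.einsum('ijk,j,k->i', W, s, i)
    s_next = sigmoid(net)
    sp = s_next * (1.0 - s_next)               # sigma'(net) for the logistic
    # delta_il * S_m^t * I_n^t term
    direct = np.einsum('il,m,n->ilmn', np.eye(len(s)), s, i)
    # sum_{j,k} W_ijk I_k^t dS_j^t/dW_lmn term
    carried = np.einsum('ijk,k,jlmn->ilmn', W, i, dS)
    dS_next = sp[:, None, None, None] * (direct + carried)
    return s_next, dS_next
```

At the end of a string, the weight update is then α(Target - S_0^f) times the output-neuron slice of the final sensitivity table; the O(N^4 L^2)-ish cost of updating `dS` at every step is what the text flags as prohibitive for large N.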
\nIt is important to note that for all training discussed here, the full gradient is calculated as \ngiven above. \n3.3 PRESENTATION OF TRAINING SAMPLES \nThe training data consists of a series of stimulus-response pairs, where the stimulus is a \nstring ofO's and 1 's, and the response is either \"I\" for positive examples or \"0\" for negative \nexamples. The positive and negative strings are generated by an unknown source grammar \n(created by a program that creates random grammars) prior to training. At each discrete \ntime step, one symbol from the string activates one input neuron, the other input neurons \nare zero (one-hot encoding). Training is on-line and occurs after each string presentation; \nthere is no total error accumulation as in batch learning; contrast this to the batch method \nof [Watrous, Kuhn 1992]. An extra end symbol is added to the string alphabet to give the \nnetwork more power in deciding the best final neuron state configuration. This requires an(cid:173)\nother input neuron and does not increase the complexity of the DFA (only N2 more \nweights). The sequence of strings presented during training is very important and certainly \ngives a bias in learning. We have perfonned many experiments that indicate that training \nwith alphabetical order with an equal distribution of positive and negative examples is \nmuch faster and converges more often than random order presentation. \nTIle training algorithm is on-line, incremental. A small portion of the training set is pre(cid:173)\nselected and presented to the network. The net is trained at the end of each string presenta(cid:173)\ntion. Once the net has learned this small set or reaches a maximum number of epochs (set \nbefore training, 1000 for experiments reported), a small number of strings (10) classified \nincorrectly are chosen from the rest of the training set and added to the pre-selected set. 
This small string increment prevents the training procedure from driving the network too far towards any local minima that the misclassified strings may represent. Another cycle of epoch training begins with the augmented training set. If the net correctly classifies all the training data, the net is said to converge. The total number of cycles that the network is permitted to run is also limited, usually to about 20.

4 RULE EXTRACTION (DFA GENERATION)

As the network is training (or after training), we apply a procedure we call dynamic state partitioning (dsp) for extracting the network's current conception of the DFA it is learning or has learned. The rule extraction process has the following steps: 1) clustering of DFA states, 2) constructing a transition diagram by connecting these states together with the alphabet-labelled transitions, 3) putting these transitions together to make the full digraph - forming cycles, and 4) reducing the digraph to a minimal representation. The hypothesis is that during training, the network begins to partition (or quantize) its state space into fairly well-separated, distinct regions or clusters, which represent corresponding states in some DFA. See [Cleeremans, et al 1989] and [Watrous, Kuhn 1992a] for other clustering methods. A simple way of finding these clusters is to divide each neuron's range [0,1] into q partitions of equal size. For N state neurons this gives q^N partitions. For example, for q=2, values of S^t >= 0.5 map to 1 and values of S^t < 0.5 map to 0, and there are 2^N regions with 2^N possible values. Thus for N hidden neurons, there exist q^N possible regions.
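The quantization step above can be sketched in a few lines; `quantize` is our name for it (assuming numpy), not the paper's.

```python
import numpy as np

def quantize(state, q):
    """Map a state vector in [0,1]^N to one of q^N partitions.

    Each neuron's range [0,1] is split into q equal bins; for q = 2 this
    reduces to thresholding each neuron at 0.5, as in the text.
    """
    bins = np.minimum((np.asarray(state) * q).astype(int), q - 1)
    return tuple(bins)
```

The returned tuple of bin indices serves as the label of the DFA state the network is currently in.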
The DFA is constructed by generating a state transition diagram -- associating an input symbol with the set of hidden neuron partitions that it is currently in and the set of neuron partitions it activates. This ordered triple is also a production rule. The initial partition, or start state of the DFA, is determined from the initial value of S^{t=0}. If the next input symbol maps to the same partition we assume a loop in the DFA. Otherwise, a new state in the DFA is formed. This constructed DFA may contain a maximum of q^N states; in practice it is usually much less, since not all neuron partition sets are ever reached. This is basically a tree pruning method and different DFA could be generated based on the choice of branching order. The extracted DFA can then be reduced to its minimal size using standard minimization algorithms (an O(N^2) algorithm where N is the number of DFA states) [Hopcroft, Ullman 1979]. [This minimization procedure does not change the grammar of the DFA; the unminimized DFA has the same time complexity as the minimized DFA. The process just rids the DFA of redundant, unnecessary states and reduces the space complexity.] Once the DFA is known, the production rules are easily extracted.

Since many partition values of q are available, many DFA can be extracted. How is the q that gives the best DFA chosen? Or, viewed in another way, using different q, what DFA gives the best representation of the grammar of the training set? One approach is to use different q's (starting with q=2), different branching orders, different runs with different numbers of neurons and different initial conditions, and see if any similar sets of DFA emerge. Choose the DFA whose similarity set has the smallest number of states and appears most often - an Occam's razor assumption.
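The transition-diagram construction can be sketched as a traversal over quantized states; `step` stands in for the trained network's state update, `quantize` for the partitioning above, and the breadth-first branching order is one arbitrary choice among those the text mentions.

```python
from collections import deque

def extract_dfa(step, s0, alphabet, quantize):
    """Build the extracted DFA's transition table (illustrative sketch).

    step(state_vec, symbol) : the network's state update map
    s0                      : the initial state vector S^{t=0}
    quantize(state_vec)     : maps a state vector to its partition label
    Returns (transitions, start) with transitions[(partition, symbol)].
    """
    start = quantize(s0)
    rep = {start: s0}          # one representative state vector per partition
    trans = {}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for a in alphabet:
            s_next = step(rep[p], a)
            p_next = quantize(s_next)
            trans[(p, a)] = p_next          # one (node, arc, node) rule
            if p_next not in rep:           # unseen partition: new DFA state
                rep[p_next] = s_next
                queue.append(p_next)
    return trans, start
```

Only partitions actually reached from S^{t=0} become states, which is why the constructed DFA is usually far smaller than the q^N bound.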
Define this guess of the DFA as DFA_g. This method seems to work fairly well. Another is to see which of the DFA gives the best performance on the training set, assuming that the training set is not perfectly learned. We have little experience with this method since we usually train to perfection on the training set. It should be noted that this DFA extraction method may be applied to any discrete-time recurrent net, regardless of network order or number of hidden layers. Preliminary results on first-order recurrent networks show that the same DFA are extracted as for second-order, but the first-order nets are less likely to converge and take longer to converge than second-order.

5 SIMULATIONS - GRAMMARS LEARNED

Many different small (< 15 states) regular known grammars have been learned successfully with both first-order [Cleeremans, et al 1989] and second-order recurrent models [Giles, et al 1991] and [Watrous, Kuhn 1992a]. In addition, [Giles, et al 1990 & 1991] and [Watrous, Kuhn 1992b] show how corresponding DFA and production rules can be extracted. However, for all of the above work, the grammars to be learned were already known. What is more interesting is the learning of unknown grammars.

In figure 2b is a randomly generated minimal 10-state regular grammar created by a program in which the only inputs are the number of states of the unminimized DFA and the alphabet size p. (A good estimate of the number of possible unique DFA is n 2^n n^{pn} / n! [Alon, et al 1991], where n is the number of DFA states.) The shaded state is the start state, filled and dashed arcs represent 1 and 0 transitions, and all final states have a shaded outer circle. This unknown (honestly, we didn't look) DFA was learned with both 6 and 10 hidden state neuron second-order recurrent nets using the first 1000 strings in alphabetical training order (we could ask the unknown grammar for strings).
Of two runs for both 10 and 6 neurons, both of the 10 and one of the 6 converged in less than 1000 epochs. (The initial weights were all randomly chosen in [-1, 1] and the learning rate and momentum were both 0.5.) Figure 2a shows one of the unminimized DFA that was extracted for a partition parameter of q=2. The minimized 10-state DFA, figure 2b, appeared for q=2 for one 10 neuron net and for q=2,3,4 of the converged 6 neuron net. Consequently, using our previous criteria, we chose this DFA as DFA_g, our guess at the unknown grammar. We then asked the program what the grammar was and discovered we were correct in our guess. The other minimized DFA for different q's were all unique and usually very large (number of states > 100).

Figures 2a & 2b. Unminimized and minimized 10-state random grammar.

The trained recurrent nets were then checked for generalization errors on all strings up to length 15. All made a small number of errors, usually less than 1% of the total of 65,535 strings. However, the correct extracted DFA was perfect and, of course, makes no errors on strings of any length. Again, [Giles, et al 1991, 1992], the extracted DFA outperforms the trained neural net from which the DFA was extracted.

In figures 3a and 3b, we see the dynamics of DFA extraction as a 4 hidden neuron neural network is learning, as a function of epoch and partition size. This is for grammar Tomita-4 [Giles, et al 1991, 1992] - a 4-state grammar that rejects any string which has three or more 0's in a row. The number of states of the extracted DFA starts out small, then increases, and finally decreases to a constant value as the grammar is learned. As the partition q of the neuron space increases, the number of minimized and unminimized states increases.
\nWhen the grammar is learned, the number of minimized states becomes constant and, as \nexpected, the number of minimized states, independent of q, becomes the number of states \nin the grammar's DFA - 4. \n6 CONCLUSIONS \nSimple recurrent neural networks are capable ofleaming small regular unknown grammars \nrather easily and generalize fairly well on unseen grammatical strings. The training results \nare fairly independent of the initial values of the weights and numbers of neurons. For a \nwell-trained neural net, the generalization perfonnance on long unseen strings can be per(cid:173)\nfect. \n\n\fExtracting and Learning an Unknown Grammar with Recurrent Neural Networks \n\n323 \n\nUnminbnlzed \n\nMinimized \n\ntriangles q=4 \n\n3S \n\n30 \n\nfIJ \n\n11 \n25 \n~ 20 \n~ \n] \n.. \nCol e \n\n15 \n\n10 \n\n~ \nr;iI \n\n3S \n\n30 \n\n25 \n\n20 \n\n15 \n\n10 \n\nJ fIJ \n~ \n~ \n] \n\nCol \nII \n\n~ \nr;iI \n\n5 \no \n\no 10 20 30 40 SO 60 70 \n\n5 \n\n04-~--~-r~r-'-~--~ \n\n0 \n\n10 20 30 40 SO 60 70 \n\nE~b \n\nE~b \n\nFigures 3a & 3b. Size of number of states (unmioimized and minimized) ofDFA \nversus training epoch for different partition parameter q. The correct state size is 4. \n\nA heuristic algorithm called dynamic state partitioning was created to extract detenninistic \nfinite state automata (DFA) from the neural network, both during and after training. Using \na standard DFA minimization algorithm, the extracted DFA can be reduced to an equivalent \nminimal-state DFA which has reduced space (not time) complexity. When the source or \ngenerating grammar is unknown, a good guess of the unknown grammar DFAg can be ob(cid:173)\ntained from the minimal DFA that is most often extracted from different runs WIth different \nnumbers of neurons and initial conditions. From the extracted DF A, minimal or not, the \nproduction rules of the learned grammar are evident. \nThere are some interesting aspects of the extracted DFA. 
Each of the unminimized DFA seems to be unique, even those with the same number of states. For recurrent nets that converge, it is often possible to extract DFA that are perfect, i.e., DFA that recognize exactly the language of the unknown source grammar. For these cases all unminimized DFA whose minimal sizes have the same number of states constitute a large equivalence class of neural-net-generated DFA and have the same performance on string classification. This equivalence class extends across neural networks which vary both in size (number of neurons) and initial conditions. Thus, the extracted DFA gives a good indication of how well the neural network learns the grammar. In fact, for most of the trained neural nets, the extracted DFA_g outperforms the trained neural networks in classification of unseen strings. (By definition, a perfect DFA will correctly classify all unseen strings.) This is not surprising due to the possibility of error accumulation as the neural network classifies long unseen strings [Pollack 1991]. However, when the neural network has learned the grammar well, its generalization performance can be perfect on all strings tested [Giles, et al 1991, 1992]. Thus, the neural network can be considered as a tool for extracting a DFA that is representative of the unknown grammar. Once DFA_g is obtained, it can be used independently of the trained neural network.

The learning of small DFA using second-order techniques and the full gradient computation reported here and elsewhere [Giles, et al 1991, 1992], [Watrous, Kuhn 1992a, 1992b] gives a strong impetus to using these techniques for learning DFA. The question of DFA state capacity and scalability is unresolved.
Further work must show how well these approaches can model grammars with large numbers of states and establish a theoretical and experimental relationship between DFA state capacity and neural net size.

Acknowledgments

The authors acknowledge useful and helpful discussions with E. Baum, M. Goudreau, G. Kuhn, K. Lang, L. Valiant, and R. Watrous. The University of Maryland authors gratefully acknowledge partial support from AFOSR and DARPA.

References

N. Alon, A.K. Dewdney, and T.J. Ott, Efficient Simulation of Finite Automata by Neural Nets, Journal of the ACM, Vol 38, p. 495 (1991).

D. Angluin, C.H. Smith, Inductive Inference: Theory and Methods, ACM Computing Surveys, Vol 15, No 3, p. 237 (1983).

A. Cleeremans, D. Servan-Schreiber, J. McClelland, Finite State Automata and Simple Recurrent Networks, Neural Computation, Vol 1, No 3, p. 372 (1989).

J.L. Elman, Distributed Representations, Simple Recurrent Networks, and Grammatical Structure, Machine Learning, Vol 7, No 2/3, p. 91 (1991).

K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, Ch. 10 (1982).

C.L. Giles, G.Z. Sun, H.H. Chen, Y.C. Lee, D. Chen, Higher Order Recurrent Networks & Grammatical Inference, Advances in Neural Information Processing Systems 2, D.S. Touretzky (ed), Morgan Kaufmann, San Mateo, CA, p. 380 (1990).

C.L. Giles, D. Chen, C.B. Miller, H.H. Chen, G.Z. Sun, Y.C. Lee, Grammatical Inference Using Second-Order Recurrent Neural Networks, Proceedings of the International Joint Conference on Neural Networks, IEEE 91CH3049-4, Vol 2, p. 357 (1991).

C.L. Giles, C.B. Miller, D. Chen, H.H. Chen, G.Z. Sun, Y.C. Lee, Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks, Neural Computation, accepted for publication (1992).

J. Hertz, A. Krogh, R.G.
Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, Ch. 7 (1991).

J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA (1979).

Y.C. Lee, G. Doolen, H.H. Chen, G.Z. Sun, T. Maxwell, H.Y. Lee, C.L. Giles, Machine Learning Using a Higher Order Correlational Network, Physica D, Vol 22-D, No 1-3, p. 276 (1986).

J.B. Pollack, The Induction of Dynamical Recognizers, Machine Learning, Vol 7, No 2/3, p. 227 (1991).

G.Z. Sun, H.H. Chen, C.L. Giles, Y.C. Lee, D. Chen, Connectionist Pushdown Automata that Learn Context-Free Grammars, Proceedings of the International Joint Conference on Neural Networks, Washington D.C., Lawrence Erlbaum Pub., Vol 1, p. 577 (1990).

R.L. Watrous, G.M. Kuhn, Induction of Finite-State Languages Using Second-Order Recurrent Networks, Neural Computation, accepted for publication (1992a), and these proceedings (1992b).

R.J. Williams, D. Zipser, A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, Vol 1, No 2, p. 270 (1989).
", "award": [], "sourceid": 555, "authors": [{"given_name": "C. L.", "family_name": "Giles", "institution": null}, {"given_name": "C. B.", "family_name": "Miller", "institution": null}, {"given_name": "D.", "family_name": "Chen", "institution": null}, {"given_name": "G. Z.", "family_name": "Sun", "institution": null}, {"given_name": "H. H.", "family_name": "Chen", "institution": null}, {"given_name": "Y. C.", "family_name": "Lee", "institution": null}]}