{"title": "Combined Neural Network and Rule-Based Framework for Probabilistic Pattern Recognition and Discovery", "book": "Advances in Neural Information Processing Systems", "page_first": 444, "page_last": 451, "abstract": null, "full_text": "Combined Neural Network and Rule-Based \n\nFramework for Probabilistic Pattern Recognition \n\nand Discovery \n\nHayit K. Greenspan and Rodney Goodman \n\nDepartment of Electrical Engineering \n\nCalifornia Institute of Technology, 116-81 \n\nPasadena, CA 91125 \n\nRama Chellappa \n\nDepartment of Electrical Engineering \n\nInstitute for Advanced Computer Studies and Center for Automation Research \n\nUniversity of Maryland, College Park, MD 20742 \n\nAbstract \n\nA combined neural network and rule-based approach is suggested as a \ngeneral framework for pattern recognition. This approach enables unsu(cid:173)\npervised and supervised learning, respectively, while providing probability \nestimates for the output classes. The probability maps are utilized for \nhigher level analysis such as a feedback for smoothing over the output la(cid:173)\nbel maps and the identification of unknown patterns (pattern \"discovery\"). \nThe suggested approach is presented and demonstrated in the texture -\nanalysis task. A correct classification rate in the 90 percentile is achieved \nfor both unstructured and structured natural texture mosaics. The advan(cid:173)\ntages of the probabilistic approach to pattern analysis are demonstrated. \n\n1 \n\nINTRODUCTION \n\nIn this work we extend a recently suggested framework (Greenspan et al,1991) for \na combined neural network and rule-based approach to pattern recognition. This \napproach enables unsupervised and supervised learning, respectively, as presented \n\n444 \n\n\fA Framework for Probabilistic Pattern Recognition and Discovery \n\n445 \n\nin Fig. 1. In the unsupervised learning phase a neural network clustering scheme is \nused for the quantization of the input features. 
A supervised stage follows in which labeling of the quantized attributes is achieved using a rule-based system. This information-theoretic technique is utilized to find the most informative correlations between the attributes and the pattern class specification, while providing probability estimates for the output classes. Ultimately, a minimal representation for a library of patterns is learned in a training mode, following which the classification of new patterns is achieved. \n\nThe suggested approach is presented and demonstrated in the texture-analysis task. Recent results (Greenspan et al, 1991) have demonstrated a correct classification rate of 95-99% for synthetic (texton) textures and in the 90th percentile for 2-3 class natural texture mosaics. In this paper we utilize the output probability maps for high-level analysis in the pattern recognition process. A feedback based on the confidence measures associated with each class enables a smoothing operation over the output maps to achieve a high degree of classification in more difficult (natural texture) pattern mosaics. In addition, a generalization of the recognition process to identify unknown classes (pattern \"discovery\"), in itself a most challenging task, is demonstrated. \n\n2 FEATURE EXTRACTION STAGE \n\nThe initial stage for a classification system is the feature extraction phase through which the attributes of the input domain are extracted and presented for further processing. The chosen attributes are to form a representation of the input domain which encompasses information for any desired future task. \n\nIn the texture-analysis task there is both biological and computational evidence supporting the use of Gabor filters for the feature-extraction phase (Malik and Perona, 1990; Bovik et al, 1990). 
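As a rough, hypothetical illustration of such a filter bank (the kernel size, Gaussian width, and center frequencies below are placeholders, not the values used in the paper), a complex 2-D Gabor kernel and a 3-scale, 4-orientation bank can be sketched as:

```python
import numpy as np

def gabor_kernel(size=15, frequency=0.25, theta=0.0, sigma=3.0):
    # Complex 2-D Gabor: a sinusoidal grating modulated by a Gaussian envelope.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the grating is oriented at angle theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(2j * np.pi * frequency * x_t)
    return envelope * carrier

# A bank of 3 scales x 4 orientations (0, 90, 45, -45 degrees), as in the text;
# with one non-oriented (isotropic) component per scale this would yield the
# 15-dimensional feature vector mentioned above.
angles = [0.0, np.pi / 2, np.pi / 4, -np.pi / 4]
bank = [gabor_kernel(frequency=0.25 / 2**s, theta=a)
        for s in range(3) for a in angles]
```

Convolving each kernel with the image (here, with a pyramid level of the image) and taking a local measure of the response would give one feature map per filter.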
Gabor functions are complex sinusoidal gratings modulated by 2-D Gaussian functions in the space domain, and shifted Gaussians in the frequency domain. The 2-D Gabor filters form a complete but non-orthogonal basis which can be used for image encoding into multiple spatial frequency and orientation channels. The Gabor filters are appropriate for textural analysis as they have tunable orientation and radial frequency bandwidths, tunable center frequencies, and optimally achieve joint resolution in space and spatial frequency. \n\nIn this work, we use the Log Gabor pyramid, or the Gabor wavelet decomposition, to define an initial finite set of filters. We implement a pyramidal approach in the filtering stage reminiscent of the Laplacian Pyramid (Burt and Adelson, 1983). In our simulations a computationally efficient scheme involves a pyramidal representation of the image which is convolved with fixed-spatial-support oriented Gabor filters. Three scales are used with 4 orientations per scale (0, 90, 45, -45 degrees), together with a non-oriented component, to produce a 15-dimensional feature vector for every local window in the original image, as the output of the feature extraction stage. \n\nThe pyramidal approach allows for a hierarchical, multiscale framework for the image analysis. This is a desirable property as it enables the identification of features at various scales of the image and thus is attractive for scale-invariant pattern recognition. \n\nFigure 1: System Block Diagram (window of input image -> N-dimensional continuous feature vector via the feature-extraction phase -> N-dimensional quantized feature vector via unsupervised learning (Kohonen NN) -> texture classes via supervised learning (rule system)) \n\n3 QUANTIZATION VIA UNSUPERVISED LEARNING \n\nThe unsupervised learning phase can be viewed as a preprocessing stage for achieving yet another, more compact, representation of the filtered input. The goal is to quantize the continuous-valued features which are the result of the initial filtering stage. The need for discretization becomes evident when trying to learn associations between attributes in a statistically-based framework, such as a rule-based system. Moreover, in an extended framework, the network can reduce the dimension of the feature domain. This shift in representation is in accordance with biologically based models. \n\nThe output of the filtering stage consists of N (=15) continuous-valued feature maps, each representing a filtered version of the original input. Thus, each local area of the input image is represented via an N-dimensional feature vector. An array of such N-dimensional vectors, viewed across the input image, is the input to the learning stage. We wish to detect characteristic behavior across the N-dimensional feature space for the family of textures to be learned. By projecting an input set of samples onto the N-dimensional space, we search for clusters to be related to corresponding code-vectors, and later on, recognized as possible texture classes. A neural-network quantization procedure, based on Kohonen's model (Kohonen, 1984), is utilized for this stage. \n\nIn this work each dimension, out of the N-dimensional attribute vector, is individually clustered. All samples are thus projected onto each axis of the space and one-dimensional clusters are found; this scalar quantization case closely resembles the K-means clustering algorithm. 
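A minimal sketch of this per-dimension scalar quantization, using a plain 1-D k-means in place of the Kohonen network (the cluster count, data, and dimensionality below are illustrative placeholders):

```python
import numpy as np

def quantize_1d(samples, k=4, iters=20, seed=0):
    # 1-D k-means: returns the code values and the codeword index per sample.
    rng = np.random.default_rng(seed)
    codes = rng.choice(samples, size=k, replace=False).astype(float)
    for _ in range(iters):
        # Assign each scalar sample to its nearest code value.
        labels = np.argmin(np.abs(samples[:, None] - codes[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                codes[j] = samples[labels == j].mean()
    return codes, labels

# Quantize each of the N feature dimensions independently, then concatenate
# the per-dimension codeword indices into one discrete attribute vector.
rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 15))   # stand-in for N = 15 feature maps
quantized = np.stack([quantize_1d(features[:, d])[1] for d in range(15)],
                     axis=1)
```

Each row of `quantized` is then the discrete attribute vector passed to the supervised stage.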
The output of the preprocessing stage is an N-dimensional quantized vector of attributes which is the result of concatenating the discrete-valued codewords of the individual dimensions. Each dimension can be seen to contribute a probabilistic differentiation onto the different classes via the clusters found. As some of the dimensions are more representative than others, it is the goal of the supervised stage to find the most informative dimensions for the desired task (those with the higher differentiation capability) and to label the combined clustered domain. \n\n4 SUPERVISED LEARNING VIA A RULE-BASED SYSTEM \n\nIn the supervised stage we utilize the existing information in the feature maps for higher level analysis, such as input labeling and classification. In particular we need to learn a classifier which maps the output attributes of the unsupervised stage to the texture class labels. Any classification scheme could be used. However, we utilize a rule-based information-theoretic approach which is an extension of a first-order Bayesian classifier, because of its ability to output probability estimates for the output classes (Goodman et al, 1992). The classifier defines correlations between input features and output classes as probabilistic rules of the form: If Y = y then X = x with prob. p, where Y represents the attribute vector and X is the class variable. A data-driven supervised learning approach utilizes an information-theoretic measure to learn the most informative links or rules between the attributes and the class labels. Such a measure was introduced as the J measure (Smyth and Goodman, 1991), which represents the information content of a rule as the average bits of information that attribute values y give about the class X. 
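Following the definition in Smyth and Goodman (1991), the J measure of a rule If Y=y then X=x can be computed as J(X; Y=y) = p(y) * sum_x p(x|y) log2( p(x|y) / p(x) ); the probabilities below are toy values for illustration:

```python
import math

def j_measure(p_y, p_x_given_y, p_x):
    # J(X; Y=y) = p(y) * sum_x p(x|y) * log2( p(x|y) / p(x) ).
    # The sum is the information the event Y=y carries about X;
    # p(y) weights it by how often the rule antecedent holds.
    return p_y * sum(pxy * math.log2(pxy / px)
                     for pxy, px in zip(p_x_given_y, p_x) if pxy > 0)

# Toy two-class example: a rule whose antecedent holds 30% of the time
# and raises the probability of class x1 from 0.5 to 0.9.
j = j_measure(0.3, [0.9, 0.1], [0.5, 0.5])
```

A rule whose consequent distribution equals the prior carries no information, so its J measure is zero; ranking candidate rules by J selects the most informative ones.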
\nThe most informative set of rules via the J measure is learned in a training stage, \nfollowing which the classifier uses them to provide an estimate of the probability \nof a given class being true. When presented with a new input evidence vector, Y, \na set of rules can be considered to \"fire\" . The classifier estimates the log posterior \nprobability of each class given the rules that fire as: \n\nlogp(xlrules that fire) = logp(x) + L Wj \n\nj \n\nW . = log (P(x IY\u00bb) \n\np(x) \n\n1 \n\nwhere p(x) is the prior probability of the class x, and Wj represents the evidential \nsupport for the class as provided by rule j. Each class estimate can now be com(cid:173)\nputed by accumulating the \"weights of evidence\" incident it from the rules that fire. \nThe largest estimate is chosen as the initial class label decision. The probability \nestimates for the output classes can now be used for feedback purposes and further \nhigher level processing. \n\nThe rule-based classification system can be mapped into a 3 layer feed forward \narchitecture as shown in Fig. 2. The input layer contains a node for each attribute. \n\n\f448 \n\nGreenspan, Goodman, and Chellappa \n\nThe hidden layer contains a node for each rule and the output layer contains a \nnode for each class. Each rule (second layer node j) is connected to a class via the \nmultiplicative weight of evidence Wi. \n\nInputs \n\nRules \n\nClass \nProbability \nEstimates \n\nFigure 2: Rule-Based Network \n\n5 RESULTS \n\nIn previous results (Greenspan et aI, 1991) we have shown the capability of the \nproposed system to recognize successfully both artificial (\"texton\") and natural \ntextures. A classification rate of 95-99% was obtained for 2 and 3 class artificial \nimages. 90-98% was achieved for 2 and 3 class natural texture mosaics. In this work \nwe wish to demonstrate the advantage of utilizing the output probability maps in \nthe pattern recognition process. 
The probability maps are utilized for higher level analysis such as a feedback for smoothing and the identification of unknown patterns (pattern \"discovery\"). An example of a 5-class natural texture classification is presented in Fig. 3. The mosaic is comprised of grass, raffia, herring, wood and wool (center square) textures. The input mosaic is presented (top left), followed by the labeled output map (top right) and the corresponding probability maps for a prelearned library of 6 textures (grass, raffia, wood, calf, herring and wool, left to right, top to bottom, respectively). The input poses a very difficult task which is challenging even to our own visual perception. Based on the probability maps (with white indicating strong probability) the very satisfying result of the labeled output map is achieved. The 5 different regions have been identified and labeled correctly (in different shades of gray), with the boundaries between the regions very strongly evident. A feedback based on the probability maps was used for smoothing over the label map, to achieve the result presented. It is worthwhile noting that the probabilistic framework enables the analysis of both structural textures (such as the wood, raffia and herring) and unstructured textures (such as the grass and wool). \n\nFig. 4 demonstrates the generalization capability of our system to the identification of an unknown class. In this task a presented pattern which is not part of the prelearned library is to be recognized as such and labeled as an unknown area of interest. This task is termed \"pattern discovery\" and its applications are widespread, from identifying unexpected events to the presentation of areas-of-interest in scene exploratory studies. Learning the unknown is a difficult problem in which the probability estimates prove to be valuable. 
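One simple way to realize the discovery decision implied here (a hypothetical sketch, not the exact rule used in the paper) is to take the argmax over the per-class probability maps but mark a location unknown when no known class has appreciable probability there; the threshold value is a placeholder:

```python
import numpy as np

UNKNOWN = -1

def label_with_discovery(prob_maps, threshold=0.1):
    # prob_maps: (n_classes, H, W) probability estimates per known class.
    # Pick the argmax class, but mark a location unknown when the
    # estimates for all known classes are (near) zero there.
    labels = np.argmax(prob_maps, axis=0)
    labels[np.max(prob_maps, axis=0) < threshold] = UNKNOWN
    return labels

# Toy 2x2 map with 2 known classes: the bottom-right location has
# near-zero support for every known class, so it is flagged unknown.
maps = np.array([[[0.9, 0.2], [0.8, 0.01]],
                 [[0.1, 0.7], [0.1, 0.02]]])
out = label_with_discovery(maps)
```

Such unknown regions would then be rendered in black in the output label map, as in Figs. 4 and 5.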
In the presented example a 3-texture library was learned, consisting of wood, raffia and grass textures. The input consists of wood, raffia and sand (top left). The output label map (top right), which is the result of the analysis of the respective probability maps (bottom), exhibits the accurate detection of the known raffia and wood textures, with the sand area labeled in black as an unknown class. This conclusion was based on the corresponding probability estimations, which are zeroed out in this area for all the known classes. We have thus successfully analyzed the scene based on the existing source of knowledge. \n\nOur most recent results pertain to the application of the system to natural scenery analysis. This is a most challenging task as it relates to real-world applications, an example of which are NASA space exploratory goals. Initial simulation results are presented in Fig. 5, which presents a sand-rock scenario. The training examples are presented, followed by two input images and their corresponding output label maps, left to right, respectively. Here, white represents rock, gray represents sand and black regions are classified as unknown. The system copes successfully with this challenge. We can see that a distinction between the regions has been made and, for a possible mission such as rock avoidance (landing purposes, navigation etc.), reliable results were achieved. These initial results are very encouraging and indicate the robustness of the system to cope with difficult real-world cases. \n\nFigure 3: Five-class natural texture classification (input, output label map, and probability maps) \n\nFigure 4: Identification of an unknown pattern (input and output label map) \n\nFigure 5: Natural scenery analysis (training set: sand, rock; two example inputs and their output label maps) \n\n6 SUMMARY \n\nThe proposed learning scheme achieves a high percentage classification rate on both artificial and natural textures. The combined neural network and rule-based framework enables a probabilistic approach to pattern recognition. In this work we have demonstrated the advantage of utilizing the output probability maps in the pattern recognition process. Complicated patterns were analyzed accurately, with an extension to real-imagery applications. The generalization capability of the system to the discovery of unknown patterns was demonstrated. Future work includes research into scale and rotation invariance capabilities of the presented framework. \n\nAcknowledgements \n\nThis work is funded in part by DARPA under the grant AFOSR-90-0199 and in part by the Army Research Office under the contract DAAL03-89-K-0126. Part of this work was done at the Jet Propulsion Laboratory. The advice and software support of the image-analysis group there, especially that of Dr. Charlie Anderson, is greatly appreciated. \n\nReferences \n\nH. Greenspan, R. Goodman and R. Chellappa. (1991) Texture Analysis via Unsupervised and Supervised Learning. Proceedings of the 1991 International Joint Conference on Neural Networks, Vol. 1:639-644. \n\nR. M. Goodman, C. Higgins, J. Miller and P. Smyth. (1992) Rule-Based Networks for Classification and Probability Estimation. To appear in Neural Computation. \n\nP. Smyth and R. M. Goodman. (1991) Rule Induction using Information Theory. In G. 
Piatetsky-Shapiro, W. Frawley (eds.), Knowledge Discovery in Databases, 159-176. AAAI Press. \n\nJ. Malik and P. Perona. (1990) Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America A, Vol. 7[5]:923-932. \n\nA. C. Bovik, M. Clark and W. S. Geisler. (1990) Multichannel Texture Analysis Using Localized Spatial Filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):55-73. \n\nP. J. Burt and E. A. Adelson. (1983) The Laplacian Pyramid as a compact image code. IEEE Trans. Commun., COM-31:532-540. \n\nT. Kohonen. (1984) Self-Organization and Associative Memory. Springer-Verlag. \n", "award": [], "sourceid": 582, "authors": [{"given_name": "Hayit", "family_name": "Greenspan", "institution": null}, {"given_name": "Rodney", "family_name": "Goodman", "institution": null}, {"given_name": "Rama", "family_name": "Chellappa", "institution": null}]}