{"title": "Image Recognition in Context: Application to Microscopic Urinalysis", "book": "Advances in Neural Information Processing Systems", "page_first": 963, "page_last": 969, "abstract": null, "full_text": "Image Recognition in Context: Application to Microscopic Urinalysis\n\nXubo Song*\nDepartment of Electrical and Computer Engineering\nOregon Graduate Institute of Science and Technology\nBeaverton, OR 97006\nxubosong@ece.ogi.edu\n\nJoseph Sill\nDepartment of Computation and Neural Systems\nCalifornia Institute of Technology\nPasadena, CA 91125\njoe@busy.work.caltech.edu\n\nYaser Abu-Mostafa\nDepartment of Electrical Engineering\nCalifornia Institute of Technology\nPasadena, CA 91125\nyaser@work.caltech.edu\n\nHarvey Kasdan\nInternational Remote Imaging Systems, Inc.\nChatsworth, CA 91311\n\nAbstract\n\nWe propose a new and efficient technique for incorporating contextual information into object classification. Most of the current techniques face the problem of exponential computation cost. In this paper, we propose a new general framework that incorporates partial context at a linear cost. This technique is applied to microscopic urinalysis image recognition, resulting in a significant improvement of recognition rate over the context-free approach. This gain would have been impossible using conventional context incorporation techniques.\n\n1  BACKGROUND: RECOGNITION IN CONTEXT\n\nThere are a number of pattern recognition problem domains where the classification of an object should be based on more than simply the appearance of the object itself. In remote sensing image classification, where each pixel is part of ground cover, a pixel is more likely to be a glacier if it is in a mountainous area than if it is surrounded by pixels of residential areas. 
In text analysis, one can expect to find certain letters occurring regularly in particular arrangements with other letters (qu, ee, est, tion, etc.). The information conveyed by the accompanying entities is referred to as contextual information. Human experts apply contextual information in their decision making [2][6]. It makes sense to design techniques and algorithms to make computers aggregate and utilize a more complete set of information in their decision making the way human experts do. In pattern recognition systems, however, the primary (and often only) source of information used to identify an object is the set of measurements, or features, associated with the object itself. Augmenting this information by incorporating context into the classification process can yield significant benefits.\n\n*Author for correspondence\n\nConsider a set of N objects T_i, i = 1, ..., N. With each object we associate a class label c_i that is a member of a label set Ω = {1, ..., D}. Each object T_i is characterized by a set of measurements x_i ∈ R^P, which we call a feature vector. Many techniques [1][2][4][6] incorporate context by conditioning the posterior probability of objects' identities on the joint features of all accompanying objects, i.e., p(c_1, c_2, ..., c_N | x_1, ..., x_N), and then maximizing it with respect to c_1, c_2, ..., c_N. It can be shown that\n\np(c_1, c_2, ..., c_N | x_1, ..., x_N) ∝ p(c_1 | x_1) ... p(c_N | x_N) p(c_1, ..., c_N) / (p(c_1) ... p(c_N))\n\ngiven certain reasonable assumptions.\n\nOnce the context-free posterior probabilities p(c_i | x_i) are known, e.g. through the use of a standard machine learning model such as a neural network, computing p(c_1, ..., c_N | x_1, ..., x_N) for all possible c_1, ...
, c_N would entail (2N + 1)D^N multiplications, and finding the maximum has complexity of D^N, which is intractable for large N and D [2].\n\nAnother problem with this formulation is the estimation of the high dimensional joint distribution p(c_1, ..., c_N), which is ill-posed and data hungry.\n\nOne way of dealing with these problems is to limit context to local regions. With this approach, only the pixels in a close neighborhood, or letters immediately adjacent, are considered [4][6][7]. Such techniques may ignore useful information, and will not apply to situations where context doesn't have such locality, as in the case of microscopic urinalysis image recognition. Another way is to simplify the problem using specific domain knowledge [1], but this is only possible in certain domains.\n\nThese difficulties motivate the efficient incorporation of partial context as a general framework, formulated in section 2. In section 3, we discuss microscopic urinalysis image recognition, and address the importance of using context for this application. Also in section 3, techniques are proposed to identify relevant context. Empirical results are shown in section 4, followed by discussions in section 5.\n\n2  FORMULATION FOR INCORPORATION OF PARTIAL CONTEXT\n\nTo avoid the exponential computational cost of using the identities of all accompanying objects directly as context, we use \"partial context\", denoted by A. It is called \"partial\" because it is derived from the class labels, as opposed to consisting of an explicit labelling of all objects. The physical definition of A depends on the problem at hand. In our application, A represents the presence or absence of certain classes. 
Then the posterior probability of an object T_i having class label c_i conditioned on its feature vector and the relevant context A is\n\np(c_i | x_i, A) = p(x_i | c_i, A) p(c_i; A) / p(x_i; A)\n\nWe assume that the feature distribution of an object depends only on its own class, i.e., p(x_i | c_i, A) = p(x_i | c_i). This assumption is roughly true for most real world problems. Then,\n\np(c_i | x_i, A) = p(x_i | c_i) p(c_i; A) / p(x_i; A)\n               = p(c_i | x_i) [p(c_i | A) / p(c_i)] [p(A) p(x_i) / p(x_i; A)]\n               ∝ p(c_i | x_i) p(c_i | A) / p(c_i)\n               = p(c_i | x_i) ρ(c_i, A)\n\nwhere ρ(c_i, A) = p(c_i | A) / p(c_i) is called the context ratio, through which context plays its role. The context-sensitive posterior probability p(c_i | x_i, A) is obtained through the context-free posterior probability p(c_i | x_i) modified by the context ratio ρ(c_i, A).\n\nThe partial-context maximum likelihood decision rule chooses class label c_i for element i such that\n\nc_i = argmax_{c_i} p(c_i | x_i, A)    (1)\n\nA systematic approach to identify relevant context A is addressed in section 3.3.\n\nThe partial-context approach treats each element in a set individually, but with additional information from the context-bearing factor A. Once p(c_i | x_i) are known for all i = 1, ..., N, and the context A is obtained, to maximize p(c_i | x_i, A) from D possible values that c_i can take on and for all i, the total number of multiplications is 2N, and the complexity for finding the maximum is ND. Both are linear in N. The density estimation part is also trivial since it is very easy to estimate p(c | A).\n\n3  MICROSCOPIC URINALYSIS\n\n3.1  INTRODUCTION\n\nUrine is one of the most complex body fluid specimens: it potentially contains about 60 meaningful types of elements. 
Microscopic urinalysis detects the presence of elements that often provide early diagnostic information concerning dysfunction, infection, or inflammation of the kidneys and urinary tract. Thus this non-invasive technique can be of great value in clinical case management. Traditional manual microscopic analysis relies on human operators who read the samples visually and identify them, and therefore is time-consuming, labor-intensive and difficult to standardize. Automated microscopy of all specimens is more practical than manual microscopy, because it eliminates variation among different technologists. This variation becomes more pronounced when the same technologist examines increasing numbers of specimens. Also, it is less labor-intensive and thus less costly than manual microscopy. It also provides more consistent and accurate results. An automated urinalysis system workstation (The Yellow IRIS™, International Remote Imaging Systems, Inc.) has been introduced in numerous clinical laboratories for automated microscopy. Urine samples are processed and examined at 100x (low power field) and 400x (high power field) magnifications with bright-field illumination. The Yellow IRIS™ automated system collects video images of formed analytes in a stream of uncentrifuged urine passing an optical assembly. Each image has one analyte in it. These images are given to a computer algorithm for automatic identification of analytes.\n\nContext is rich in urinalysis and plays a crucial role in analyte classification. Some combinations of analytes are more likely than others. For instance, the presence of bacteria indicates the presence of white blood cells, since bacteria tend to cause infection and thus trigger the production of more white blood cells. 
If amorphous crystals show up, they tend to show up in bunches and in all sizes. Therefore, if there are amorphous crystal look-alikes in various sizes, it is quite possible that they are amorphous crystals. Squamous epithelial cells can appear either flat or rolled up. If squamous epithelial cells in one form are detected, then it is likely that there are squamous epithelial cells in the other form. Utilizing such context is crucial for classification accuracy.\n\nThe classes we are looking at are bacteria, calcium oxalate crystals, red blood cells, white blood cells, budding yeast, amorphous crystals, uric acid crystals, and artifacts. The task of automated microscopic urinalysis is, given a urine specimen that consists of up to a few hundred images of analytes, to classify each analyte into one of these classes. The automated urinalysis system we developed consists of three steps: image processing and feature extraction, learning and pattern recognition, and context incorporation. Figure 1 shows some example analyte images. Table 1 gives a list of features extracted from analyte images.¹\n\nTable 1: Features extracted from urine analyte images. The table lists 16 numbered features; the recoverable feature descriptions are: the mean of blue distribution; the mean of green distribution; 15th percentile of gray level histogram; 85th percentile of gray level histogram; the standard deviation of gray level intensity; energy of the Laplacian transformation of gray level image.\n\n3.2  CONTEXT-FREE CLASSIFICATION\n\nThe features are fed into a nonlinear feed-forward neural network with 16 inputs, 15 hidden units with sigmoid transfer functions, and 8 sigmoid output units. 
A cross-entropy error function is used in order to give the output a probability interpretation. Denote the input feature vector as x; the network outputs a D-dimensional vector (D = 8 in our case) p = {p(d | x)}, d = 1, ..., D, where p(d | x) is\n\np(d | x) = Prob(an analyte belongs to class d | feature x)\n\nThe decision made at this stage is\n\nd(x) = argmax_d p(d | x)\n\n3.3  IDENTIFICATION OF RELEVANT PARTIAL CONTEXT\n\nNot all classes are relevant in terms of carrying contextual information. We propose three criteria based on which we can systematically investigate the relevance of the class presence. To use these criteria, we need to know the following distributions: the class prior distribution p(c) for c = 1, ..., D; the conditional class distribution p(c | A_d) for c = 1, ..., D and d = 1, ..., D; and the class presence prior distribution p(A_d) for d = 1, ..., D. A_d is a binary random variable indicating the presence of class d: A_d = 1 if class d is present, and A_d = 0 otherwise. All these distributions can be easily estimated from the database.\n\n¹ λ1 and λ2 are respectively the larger and the smaller eigenvalues of the second moment matrix of an image.\n\nThe first criterion is the correlation coefficient between the presence of any two classes. The second is the classical mutual information I(c; A_d) between the presence of a class A_d and the class probability p(c), defined as I(c; A_d) = H(c) - H(c | A_d), where H(c) = -Σ_{i=1}^{D} p(c = i) ln p(c = i) is the entropy of the class priors and H(c | A_d) = P(A_d = 1) H(c | A_d = 1) + P(A_d = 0) H(c | A_d = 0) is the conditional entropy of c conditioned on A_d. 
The third criterion is what we call the expected relative entropy D(c || A_d) between the presence of a class A_d and the labeling probability p(c), which we define as\n\nD(c || A_d) = P(A_d = 1) D(p(c) || p(c | A_d = 1)) + P(A_d = 0) D(p(c) || p(c | A_d = 0))\n\nwhere\n\nD(p(c) || p(c | A_d = 1)) = Σ_{i=1}^{D} p(c = i | A_d = 1) ln( p(c = i | A_d = 1) / p(c = i) )\n\nand\n\nD(p(c) || p(c | A_d = 0)) = Σ_{i=1}^{D} p(c = i | A_d = 0) ln( p(c = i | A_d = 0) / p(c = i) ).\n\nAccording to the first criterion, one type of analyte is considered relevant to another if the absolute value of their correlation coefficient is beyond a certain threshold. This criterion shows that uric acid crystals, budding yeast and calcium oxalate crystals are not relevant to any other types even by a generous threshold of 0.10. Similarly, the bigger the mutual information between the presence of a class and the class distribution, the more relevant this class is. Ranking the analyte types in terms of I(c; A_d) in a descending manner gives rise to the following list: bacteria, amorphous crystals, red blood cells, white blood cells, uric acid crystals, budding yeast and calcium oxalate crystals. Once again, ranking the analyte types in terms of D(c || A_d) in a descending manner gives rise to the following list: bacteria, red blood cells, amorphous crystals, white blood cells, calcium oxalate crystals, budding yeast and uric acid crystals. All three criteria lead to similar conclusions regarding the relevance of class presence: bacteria, red blood cells, amorphous crystals, and white blood cells are relevant, while calcium oxalate crystals, budding yeast and uric acid crystals are not. (Based on prior knowledge, we discard artifacts from the outset as an irrelevant class.)\n\n3.4  ALGORITHM FOR INCORPORATING PARTIAL CONTEXT\n\nOnce the M relevant classes are identified, the following algorithm is used to incorporate partial context.\n\nStep 0 Estimate the priors p(c | A_d) and p(c), for c ∈ {1, 2, ..., D} and d ∈ {1, 2, ..., D}. 
\n\nStep 1 For a given x_i, compute p(c_i | x_i) for c_i = 1, 2, ..., D using whichever base machine learning model is preferred (in our case, a neural network).\n\nStep 2 Let the M relevant classes be R_1, ..., R_M. According to the no-context p(c_i | x_i) and certain criteria for detecting the presence or absence of all the relevant classes, get A_{R_1}, ..., A_{R_M}.\n\nStep 3 Let p(c_i | x_i; A_0) = p(c_i | x_i), where A_0 is the null element. Incorporate context from each relevant class sequentially, i.e., for m = 1 to M, iteratively compute\n\np(c_i | x_i; A_0, ..., A_{R_{m-1}}, A_{R_m}) = p(c_i | x_i; A_0, ..., A_{R_{m-1}}) p(c_i | A_{R_m}) p(A_{R_m}) / p(c_i)\n\nStep 4 Recompute A_{R_1}, ..., A_{R_M} based on the new class labellings. Return to step 3 and repeat until the algorithm converges.²\n\nStep 5 Label the objects according to the final context-containing p(c_i | x_i, A_{R_1}, ..., A_{R_M}), i.e., c_i = argmax_{c_i} p(c_i | x_i, A_{R_1}, ..., A_{R_M}) for i = 1, ..., N.\n\nThis algorithm is invariant with respect to the ordering of the M relevant classes in (A_{R_1}, ..., A_{R_M}). The proof is omitted here.\n\nFigure 1: Examples of analyte images (panels: amorphous crystals, artifacts, calcium oxalate crystals, hyaline casts).\n\n² Hence, the algorithm has an E-M flavor, in that it goes back and forth between finding the most\n\n4  RESULTS\n\nThe algorithm using partial context was tested on a database of 83 urine specimens, containing a total of 20,276 analyte images. Four classes are considered relevant according to the criteria described in section 3.3: bacteria, red blood cells, white blood cells and amorphous crystals. We measure two types of error: analyte-by-analyte error, and specimen diagnostic error. 
The average analyte-by-analyte error is reduced from 44.48% before using context to 36.66% after, resulting in a relative error reduction of 17.6% (Table 2). The diagnosis for a specimen is either normal or abnormal. Tables 3 and 4 compare the diagnostic performance with and without using context, and Table 5 lists the relative changes. We can see that using context significantly increases correct diagnosis for both normal and abnormal specimens, and reduces both false positives and false negatives.\n\n                                    without context    with context\naverage element-by-element error    44.48 %            36.66 %\n\nTable 2: Comparison of using and not using contextual information for analyte-by-analyte error.\n\nprobable class labels given the context and determining the context given the class labels.\n\n                  estimated normal    estimated abnormal\ntruly normal      40.96 %             7.23 %\ntruly abnormal    19.28 %             32.53 %\n\nTable 3: Diagnostic confusion matrix not using context\n\n                  estimated normal    estimated abnormal\ntruly normal      42.17 %             6.02 %\ntruly abnormal    16.87 %             34.94 %\n\nTable 4: Diagnostic confusion matrix using context\n\n                  estimated normal    estimated abnormal\ntruly normal      +2.95 %             -16.73 %\ntruly abnormal    -12.50 %            +7.41 %\n\nTable 5: Relative accuracy improvement (diagonal elements) and error reduction (off-diagonal elements) in the diagnostic confusion matrix by using context.\n\n5  CONCLUSIONS\n\nWe proposed a novel framework that can incorporate context in a simple and efficient manner, avoiding exponential computation and high dimensional density estimation. The application of the partial context technique to microscopic urinalysis image recognition demonstrated the efficacy of the algorithm. 
This algorithm is not domain dependent, and thus can be readily generalized to other pattern recognition areas.\n\nACKNOWLEDGEMENTS\n\nThe authors would like to thank Alexander Nicholson, Malik Magdon-Ismail, and Amir Atiya at the Caltech Learning Systems Group for helpful discussions.\n\nReferences\n\n[1] Song, X.B. & Sill, J. & Abu-Mostafa, Y. & Kasdan, H., (1997) \"Incorporating Contextual Information in White Blood Cell Identification\", In M. Jordan, M.J. Kearns and S.A. Solla (eds.), Advances in Neural Information Processing Systems 7, 1997, pp. 950-956. Cambridge, MA: MIT Press.\n\n[2] Song, Xubo (1999) \"Contextual Pattern Recognition with Application to Biomedical Image Identification\", Ph.D. Thesis, California Institute of Science and Technology.\n\n[3] Boehringer-Mannheim-Corporation, Urinalysis Today, Boehringer-Mannheim-Corporation, 1991.\n\n[4] Kittler, J., \"Relaxation labelling\", Pattern Recognition Theory and Applications, 1987, pp. 99-108, Pierre A. Devijver and Josef Kittler, Editors, Springer-Verlag.\n\n[5] Kittler, J. & Illingworth, J., \"Relaxation Labelling Algorithms - A Review\", Image and Vision Computing, 1985, vol. 3, pp. 206-216.\n\n[6] Toussaint, G., \"The Use of Context in Pattern Recognition\", Pattern Recognition, 1978, vol. 10, pp. 189-204.\n\n[7] Swain, P. & Vardeman, S. & Tilton, J., \"Contextual Classification of Multispectral Image Data\", Pattern Recognition, 1981, vol. 13, no. 6, pp. 429-441.\n", "award": [], "sourceid": 1675, "authors": [{"given_name": "Xubo", "family_name": "Song", "institution": null}, {"given_name": "Joseph", "family_name": "Sill", "institution": null}, {"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}, {"given_name": "Harvey", "family_name": "Kasdan", "institution": null}]}