{"title": "Semiparametric Approach to Multichannel Blind Deconvolution of Nonminimum Phase Systems", "book": "Advances in Neural Information Processing Systems", "page_first": 363, "page_last": 369, "abstract": null, "full_text": "Semiparametric Approach to Multichannel Blind Deconvolution of Nonminimum Phase Systems\n\nL.-Q. Zhang, S. Amari and A. Cichocki\n\nBrain-style Information Systems Research Group, BSI\n\nThe Institute of Physical and Chemical Research\n\nWako-shi, Saitama 351-0198, JAPAN\n\nzha@open.brain.riken.go.jp\n\n{amari,cia}@brain.riken.go.jp\n\nAbstract\n\nIn this paper we discuss the semiparametric statistical model for blind deconvolution. First we introduce a Lie group to the manifold of noncausal FIR filters. Then the blind deconvolution problem is formulated in the framework of a semiparametric model, and a family of estimating functions is derived for blind deconvolution. A natural gradient learning algorithm is developed for training noncausal filters. The stability of the natural gradient algorithm is also analyzed in this framework.\n\n1 Introduction\n\nRecently, blind separation/deconvolution has been recognized as an increasingly important research area due to its rapidly growing applications in various fields, such as telecommunication systems, image enhancement and biomedical signal processing. Refer to the review papers [7] and [13] for details. A semiparametric statistical model treats a family of probability distributions specified by a finite-dimensional parameter of interest and an infinite-dimensional nuisance parameter [12]. Amari and Kumon [10] have proposed an approach to semiparametric statistical models in terms of estimating functions and elucidated their geometric structures and efficiencies by information geometry [1]. 
Blind source separation can be formulated in the framework of semiparametric statistical models. Amari and Cardoso [5] applied information geometry of estimating functions to blind source separation and derived an admissible class of estimating functions which includes efficient estimators. They showed that the manifold of mixtures is m-curvature free, so that we can design algorithms of blind separation without much concern about misspecification of source probability functions.\n\nThe theory of estimating functions has also been applied to the case of instantaneous mixtures, where independent source signals have unknown temporal correlations [3]. It has also been applied to derive the efficiency and superefficiency of demixing learning algorithms [4].\n\nMost of these theories treat only blind source separation of instantaneous mixtures. It is only recently that the natural gradient approach has been proposed for multichannel blind deconvolution [8], [18]. The present paper extends the geometrical theory of estimating functions to the semiparametric model of multichannel blind deconvolution. Owing to limited space, the detailed derivations and proofs are left to a full paper.\n\n2 Blind Deconvolution Problem\n\nIn this paper, as a convolutive mixing model, we consider a multichannel linear time-invariant (LTI) system, with no poles on the unit circle, of the form\n\n$$x(k) = \\sum_{p=-\\infty}^{\\infty} H_p\\, s(k-p), \\tag{1}$$\n\nwhere $s(k)$ is an $n$-dimensional vector of source signals which are spatially mutually independent and temporally independent and identically distributed (i.i.d.), and $x(k)$ is an $n$-dimensional sensor vector at time $k$, $k = 1, 2, \\ldots$. We denote the unknown mixing filter by $H(z) = \\sum_{p=-\\infty}^{\\infty} H_p z^{-p}$. 
The goal of multichannel blind deconvolution is to retrieve the source signals $s(k)$ using only the sensor signals $x(k)$, $k = 1, 2, \\ldots$, and certain knowledge of the source signal distributions and statistics. We carry out blind deconvolution by using another multichannel LTI system of the form\n\n$$y(k) = W(z)\\, x(k), \\tag{2}$$\n\nwhere $W(z) = \\sum_{p=-N}^{N} W_p z^{-p}$, $N$ is the length of the FIR filter $W(z)$, and $y(k) = [y_1(k), \\ldots, y_n(k)]^T$ is an $n$-dimensional vector of outputs, which is used to estimate the source signals.\n\nWhen we apply $W(z)$ to the sensor signal $x(k)$, the global transfer function from $s(k)$ to $y(k)$ is defined by $G(z) = W(z)H(z)$. The goal of the blind deconvolution task is to find $W(z)$ such that $G(z) = P \\Lambda D(z)$, where $P \\in \\mathbb{R}^{n \\times n}$ is a permutation matrix, $D(z) = \\mathrm{diag}\\{z^{-d_1}, \\ldots, z^{-d_n}\\}$, and $\\Lambda \\in \\mathbb{R}^{n \\times n}$ is a nonsingular diagonal scaling matrix.\n\n3 Lie Group on M(N, N)\n\nIn this section, we introduce a Lie group to the manifold of noncausal FIR filters. The Lie group operations play a crucial role in the following discussion. The set of all noncausal FIR filters $W(z)$ of length $N$, subject to the constraint that $\\mathbf{W}$ is nonsingular, is denoted by\n\n$$M(N, N) = \\Big\\{ W(z) \\;\\Big|\\; W(z) = \\sum_{p=-N}^{N} W_p z^{-p},\\ \\det(\\mathbf{W}) \\neq 0 \\Big\\}, \\tag{3}$$\n\nwhere $\\mathbf{W}$ is an $N \\times N$ block matrix,\n\n$$\\mathbf{W} = \\begin{bmatrix} W_0 & W_{-1} & \\cdots & W_{-N+1} \\\\ W_1 & W_0 & \\cdots & W_{-N+2} \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ W_{N-1} & W_{N-2} & \\cdots & W_0 \\end{bmatrix}. \\tag{4}$$\n\n$M(N, N)$ is a manifold of dimension $n^2(2N+1)$. In general, multiplication of two filters in $M(N, N)$ enlarges the filter length, so the result no longer belongs to $M(N, N)$. This makes it difficult to introduce a Riemannian structure to the manifold of noncausal FIR filters. In order to explore possible geometrical structures of $M(N, N)$ which will lead to effective learning algorithms for $W(z)$, we define algebraic operations of filters in the Lie group framework. 
First, we introduce a novel decomposition of noncausal filters in $M(N, N)$ into a product of two one-sided FIR filters [19], which is illustrated in Fig. 1.\n\nFigure 1: Illustration of decomposition of noncausal filters in M(N, N). (Block diagram: the unknown mixing model H(z) maps s(k) to x(k); the demixing model consists of the blocks L and R, with intermediate signal u(k) and output y(k).)\n\nLemma 1 [19] If the matrix $\\mathbf{W}$ is nonsingular, any noncausal filter $W(z)$ in $M(N, N)$ has the decomposition $W(z) = R(z) L(z^{-1})$, where $R(z) = \\sum_{p=0}^{N} R_p z^{-p}$ and $L(z^{-1}) = \\sum_{p=0}^{N} L_p z^{p}$ are one-sided FIR filters.\n\nIn the manifold $M(N, N)$, the Lie operations, multiplication $*$ and inverse $\\dagger$, are defined as follows: for $B(z), C(z) \\in M(N, N)$,\n\n$$B(z) * C(z) = [B(z) C(z)]_N, \\qquad B^{\\dagger}(z) = L^{\\dagger}(z^{-1}) R^{\\dagger}(z), \\tag{5}$$\n\nwhere $[B(z)]_N$ is the truncating operator that discards all terms of order higher than $N$ in the polynomial $B(z)$, and the inverse of a one-sided FIR filter is defined recursively by $R_0^{\\dagger} = R_0^{-1}$, $R_p^{\\dagger} = -\\sum_{q=1}^{p} R_{p-q}^{\\dagger} R_q R_0^{-1}$, $p = 1, \\ldots, N$. Refer to [18] for the detailed derivation. With these operations, both $B(z) * C(z)$ and $B^{\\dagger}(z)$ remain in the manifold $M(N, N)$. It is easy to verify that the manifold $M(N, N)$ with the above operations forms a Lie group. The identity element is $E(z) = I$.\n\n4 Semiparametric Approach to Blind Deconvolution\n\nWe first introduce the basic theory of semiparametric models, and then formulate the blind deconvolution problem in the framework of semiparametric models.\n\n4.1 Semiparametric model\n\nConsider a general statistical model $\\{p(x; \\theta, \\xi)\\}$, where $x$ is a random variable whose probability density function is specified by two parameters, $\\theta$ and $\\xi$, $\\theta$ being the parameter of interest and $\\xi$ being the nuisance parameter. 
When the nuisance parameter is of infinite dimensions or of functional degrees of freedom, the statistical model is called a semiparametric model [12]. The gradient vectors of the log likelihood,\n\n$$u(x; \\theta, \\xi) = \\frac{\\partial \\log p(x; \\theta, \\xi)}{\\partial \\theta}, \\qquad v(x; \\theta, \\xi) = \\frac{\\partial \\log p(x; \\theta, \\xi)}{\\partial \\xi},$$\n\nare called the score function of the parameter of interest, or $\\theta$-score for short, and the nuisance score, or $\\xi$-score for short, respectively.\n\nIn the semiparametric model, it is difficult to estimate both the parameters of interest and the nuisance parameters at the same time, since the nuisance parameter $\\xi$ has infinite degrees of freedom. The semiparametric approach suggests using an estimating function to estimate the parameters of interest, regardless of the nuisance parameters. The estimating function is a vector function $z(x, \\theta)$, independent of the nuisance parameter $\\xi$, satisfying the following conditions\n\n1) $E_{\\theta,\\xi}[z(x, \\theta)] = 0$, (6)\n\n2) $\\det(\\mathcal{K}) \\neq 0$, where $\\mathcal{K} = E_{\\theta,\\xi}\\left[\\frac{\\partial z(x, \\theta)}{\\partial \\theta}\\right]$, (7)\n\n3) $E_{\\theta,\\xi}[z(x, \\theta) z^T(x, \\theta)] < \\infty$, (8)\n\nfor all $\\theta$ and $\\xi$. Generally speaking, it is difficult to find an estimating function. Amari and Kawanabe [9] studied the information geometry of estimating functions and provided a novel approach to finding all the estimating functions. In this paper, we follow this approach to find a family of estimating functions for blind deconvolution.\n\n4.2 Semiparametric Formulation for Blind Deconvolution\n\nNow we turn to formulating the blind deconvolution problem in the framework of semiparametric models. From the statistical point of view, the blind deconvolution problem is to estimate $H(z)$ or $H^{-1}(z)$ from the observed data $\\mathcal{D}_L = \\{x(k), k = 1, 2, \\ldots\\}$. 
The estimate involves two unknowns: one is the mixing filter $H(z)$, which is the parameter of interest, and the other is the probability density function $r(s)$ of the sources, which is the nuisance parameter in the present case. For the blind deconvolution problem, we usually assume that the source signals are zero-mean, $E[s_i] = 0$, for $i = 1, \\ldots, n$. In addition, we generally impose constraints on the recovered signals to remove the indeterminacy,\n\n$$E[k_i(s_i)] = 0, \\quad i = 1, \\ldots, n. \\tag{9}$$\n\nA typical example of the constraint is $k_i(s_i) = s_i^2 - 1$. Since the source signals are spatially mutually independent and temporally i.i.d., the pdf $r(s)$ can be factorized into the product form $r(s) = \\prod_{i=1}^{n} r_i(s_i)$. The purpose of this paper is to find a family of estimating functions for blind deconvolution. Remarkable progress has been made recently in the theory of the semiparametric approach [9], [12]. It has been shown that the efficient score itself is an estimating function for blind separation.\n\n5 Estimating Functions\n\nIn this section, we give an explicit form of the score function matrix of interest and the nuisance tangent space, by using a local nonholonomic reparameterization. We then derive a family of estimating functions from it.\n\n5.1 Score function matrix and its representation\n\nSince the mixing model is a matrix filter, we write an estimating function in the same matrix filter format\n\n$$F(x; H(z)) = \\sum_{p=-N}^{N} F_p(x; H(z))\\, z^{-p}, \\tag{10}$$\n\nwhere $F_p(x; H(z))$ are $n \\times n$ matrices. In order to derive the explicit form of the $H$-score, we reparameterize the filter in a small neighborhood of $H(z)$ by using a new variable matrix filter as $H(z) * (I - X(z))$, where $I$ is the identity element of the manifold $M(N, N)$. The variation $X(z)$ represents a local coordinate system in the neighborhood $\\mathcal{N}_H$ of $H(z)$ on the manifold $M(N, N)$. 
The variation $dH(z)$ of $H(z)$ is represented as $dH(z) = -H(z) * dX(z)$. Letting $W(z) = H^{\\dagger}(z)$, we have\n\n$$dX(z) = dW(z) * W^{\\dagger}(z), \\tag{11}$$\n\nwhich is a nonholonomic differential variable [6], since (11) is not integrable. With this representation of the parameters, we can obtain learning algorithms having the equivariant property [14], since the deviation $dX(z)$ is independent of a specific $H(z)$. The relative or natural gradient of a cost function on the manifold can be automatically derived from this representation [2], [14], [18].\n\nFigure 2: Illustration of orthogonal decomposition of score functions\n\nThe derivative of any cost function $l(H(z))$ with respect to a noncausal filter $X(z) = \\sum_{p=-N}^{N} X_p z^{-p}$ is defined by\n\n$$\\frac{\\partial l(H(z))}{\\partial X(z)} = \\sum_{p=-N}^{N} \\frac{\\partial l(H(z))}{\\partial X_p}\\, z^{-p}. \\tag{12}$$\n\nNow we can easily calculate the score function matrix of the noncausal filter $X(z)$,\n\n$$\\frac{\\partial \\log p(x; H(z), r)}{\\partial X(z)} = \\sum_{p=-N}^{N} \\varphi(y)\\, y^T(k-p)\\, z^{-p}, \\tag{13}$$\n\nwhere $\\varphi(y) = (\\varphi_1(y_1), \\ldots, \\varphi_n(y_n))^T$, $\\varphi_i(y_i) = -\\frac{d \\log r_i(y_i)}{d y_i}$, and $y = H^{\\dagger}(z) x$.\n\n5.2 Efficient scores\n\nThe efficient scores, denoted by $U^E(s; H(z), r)$, can be obtained by projecting the score function onto the space orthogonal to the nuisance tangent space $T^N_{H(z),r}$, as illustrated in Fig. 2. In this section, we give an explicit form of the efficient scores for blind deconvolution.\n\nLemma 2 [5] The nuisance tangent space $T^N_{H(z),r}$ is a linear space spanned by the nuisance score functions, $T^N_{H(z),r} = \\{\\sum_{i=1}^{n} c_i a_i(s_i)\\}$, where $c_i$ are coefficients and $a_i(s_i)$ are arbitrary functions satisfying the following conditions\n\n$$E[a_i(s_i)^2] < \\infty, \\quad E[s_i a_i(s_i)] = 0, \\quad E[k_i(s_i) a_i(s_i)] = 0.$$ 
\n\n(14)\n\nWe rewrite the score function (13) in the form $U(s; H(z), r) = \\sum_{p=-N}^{N} U_p z^{-p}$, where $U_p = (\\varphi_i(s_i(k))\\, s_j(k-p))_{n \\times n}$.\n\nLemma 3 The off-diagonal elements $U_{0,ij}(s; H(z), r)$, $i \\neq j$, and the delay elements $U_{p,ij}(s; H(z), r)$, $p \\neq 0$, of the score functions are orthogonal to the nuisance tangent space $T^N_{H(z),r}$.\n\nLemma 4 The projection of $U_{0,ii}$ onto the space orthogonal to the nuisance tangent space $T^N_{H(z),r}$ is of the form $w(s_i) = c_0 + c_1 s_i + c_2 k_i(s_i)$, where $c_j$ are any constants.\n\nIn summary, we have the following theorem.\n\nTheorem 1 The efficient score, $U^E(s; H(z), r) = \\sum_{p=-N}^{N} U^E_p z^{-p}$, is given by\n\n$$U^E_p = \\varphi(s)\\, s^T(k-p), \\quad \\text{for } p \\neq 0; \\tag{15}$$\n\n$$U^E_{0,ij} = \\begin{cases} \\varphi_i(s_i)\\, s_j, & \\text{for off-diagonal elements}, \\\\ c_0 + c_1 s_i + c_2 k_i(s_i), & \\text{for diagonal elements}. \\end{cases} \\tag{16}$$\n\nFor the instantaneous mixture case, it has been proven [9] that the semiparametric model for blind separation is information m-curvature free. This is also true in the multichannel blind deconvolution case. As a result, the efficient score function is an estimating function for blind deconvolution. Using this result, we easily derive a family of estimating functions for blind deconvolution\n\n$$F(x(k); W(z)) = \\sum_{p=-N}^{N} \\varphi(y(k))\\, y^T(k-p)\\, z^{-p} - I, \\tag{17}$$\n\nwhere $y(k) = W(z) x(k)$, and $\\varphi$ is a given function vector. The estimating function is the efficient score function when $c_0 = c_1 = 0$, $c_2 = 1$ and $k_i(s_i) = \\varphi_i(s_i) s_i - 1$.\n\n6 Natural Gradient Learning and its Stability\n\nOrdinary stochastic gradient methods for parameterized systems suffer from slow convergence due to the statistical correlations of the processed signals. While quasi-Newton and related methods can be used to improve convergence, they also suffer from heavy computation and numerical instability, and converge only locally. 
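The estimating function (17) and the natural gradient update (20) of this section can be sketched numerically. The NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation: a noncausal filter is stored as an array of 2N+1 coefficient matrices, the nonlinearity phi is taken to be tanh (an arbitrary choice for illustration), signals are zero-padded outside the observed window, and the helper names filter_output, truncated_product, estimating_function and natural_gradient_step are hypothetical.

```python
import numpy as np

def filter_output(W, x):
    # y(k) = sum_{p=-N}^{N} W_p x(k-p); W[p + N] holds W_p, x has shape (T, n).
    # Samples outside 0..T-1 are treated as zero (an assumption of this sketch).
    N = (W.shape[0] - 1) // 2
    T = x.shape[0]
    y = np.zeros_like(x, dtype=float)
    for p in range(-N, N + 1):
        for k in range(T):
            if 0 <= k - p < T:
                y[k] += W[p + N] @ x[k - p]
    return y

def truncated_product(B, C):
    # Lie group multiplication (5): coefficients of B(z)C(z) with |p| > N are dropped.
    N = (B.shape[0] - 1) // 2
    out = np.zeros_like(B, dtype=float)
    for p in range(-N, N + 1):
        for q in range(-N, N + 1):
            if -N <= p - q <= N:
                out[p + N] += B[q + N] @ C[p - q + N]
    return out

def estimating_function(y, k, N, phi=np.tanh):
    # F_p = phi(y(k)) y(k-p)^T - delta_{p,0} I for p = -N..N, as in (17).
    T, n = y.shape
    F = np.zeros((2 * N + 1, n, n))
    for p in range(-N, N + 1):
        if 0 <= k - p < T:
            F[p + N] = np.outer(phi(y[k]), y[k - p])
    F[N] -= np.eye(n)
    return F

def natural_gradient_step(W, x, k, eta=0.01, phi=np.tanh):
    # One step of (20): W <- W - eta * F(x(k), W) * W, with * the truncated product.
    N = (W.shape[0] - 1) // 2
    y = filter_output(W, x)
    F = estimating_function(y, k, N, phi)
    return W - eta * truncated_product(F, W)

# Hypothetical usage on synthetic data, starting from the identity element E(z) = I.
rng = np.random.default_rng(0)
n, N, T = 2, 1, 50
x = rng.standard_normal((T, n))
W = np.zeros((2 * N + 1, n, n))
W[N] = np.eye(n)
W_next = natural_gradient_step(W, x, k=T // 2)
```

Recomputing the whole output sequence at every step keeps the sketch short; an online implementation would presumably update only the samples that enter the current estimating function.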
\n\nThe natural gradient approach was developed to overcome the drawbacks of the ordinary gradient algorithm in Riemannian spaces [2, 8, 15]. It has been proven that the natural gradient algorithm is an efficient algorithm in blind separation and blind deconvolution [2].\n\nThe efficient score function (the estimating function) gives an efficient search direction for updating the filter $X(z)$. Therefore, the updating rule for $X(z)$ is described by\n\n$$X_{k+1}(z) = X_k(z) - \\eta F(x(k), W_k(z)), \\tag{18}$$\n\nwhere $\\eta$ is a learning rate. Since the new parameterization $X(z)$ is defined by the nonholonomic transformation $dX(z) = dW(z) * W^{\\dagger}(z)$, the deviation of $W(z)$ is given by\n\n$$\\Delta W(z) = \\Delta X(z) * W(z). \\tag{19}$$\n\nHence, the natural gradient learning algorithm for $W(z)$ is described as\n\n$$W_{k+1}(z) = W_k(z) - \\eta F(x(k), W_k(z)) * W_k(z), \\tag{20}$$\n\nwhere $F(x, W(z))$ is an estimating function of the form (17). The stability of algorithm (20) is equivalent to that of algorithm (18). Consider the averaged version of algorithm (18)\n\n$$\\Delta X(z) = -\\eta E[F(x(k), W_k(z))]. \\tag{21}$$\n\nAnalyzing the variational equation of the above equation and using the mutual independence and i.i.d. properties of the source signals, we derive the stability conditions of learning algorithm (21) in the vicinity of the true solution\n\n$$m_i + 1 > 0, \\quad \\kappa_i > 0, \\quad \\kappa_i \\kappa_j \\sigma_i^2 \\sigma_j^2 > 1, \\tag{22}$$\n\nfor $i, j = 1, \\ldots, n$, where $m_i = E[\\varphi_i'(y_i(k))\\, y_i^2(k)]$, $\\kappa_i = E[\\varphi_i'(y_i)]$, and $\\sigma_i^2 = E[|y_i|^2]$. Therefore, we have the following theorem:\n\nTheorem 2 If the conditions (22) are satisfied, then the natural gradient learning algorithm (20) is locally stable.\n\nReferences\n\n[1] S. Amari. Differential-geometrical methods in statistics, Lecture Notes in Statistics, volume 28. Springer, Berlin, 1985.\n\n[2] S. Amari. Natural gradient works efficiently in learning. 
Neural Computation, 10:251-276, 1998.\n\n[3] S. Amari. ICA of temporally correlated signals - Learning algorithm. In Proceedings of 1st Inter. Workshop on Independent Component Analysis and Signal Separation, pages 37-42, Aussois, France, January 11-15, 1999.\n\n[4] S. Amari. Superefficiency in blind source separation. IEEE Trans. on Signal Processing, 47(4):936-944, April 1999.\n\n[5] S. Amari and J.-F. Cardoso. Blind source separation - semiparametric statistical approach. IEEE Trans. Signal Processing, 45:2692-2700, Nov. 1997.\n\n[6] S. Amari, T. Chen, and A. Cichocki. Nonholonomic orthogonal constraints in blind source separation. Neural Comput., to be published.\n\n[7] S. Amari and A. Cichocki. Adaptive blind signal processing - neural network approaches. Proceedings of the IEEE, 86(10):2026-2048, 1998.\n\n[8] S. Amari, S. Douglas, A. Cichocki, and H. Yang. Multichannel blind deconvolution and equalization using the natural gradient. In Proc. IEEE Workshop on Signal Processing Adv. in Wireless Communications, pages 101-104, Paris, France, April 1997.\n\n[9] S. Amari and M. Kawanabe. Estimating functions in semiparametric statistical models. In I.V. Basawa, V.P. Godambe, and R.L. Taylor, editors, Estimating Functions, volume 32 of Monograph Series, pages 65-81. IMS, 1998.\n\n[10] S. Amari and M. Kumon. Estimation in the presence of infinitely many nuisance parameters in semiparametric statistical models. Ann. Statistics, 16:1044-1068, 1988.\n\n[11] A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129-1159, 1995.\n\n[12] P. Bickel, C. Klaassen, Y. Ritov, and J. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. The Johns Hopkins Univ. Press, Baltimore and London, 1993. 
\n\n[13] J.-F. Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, 86(10):2009-2025, 1998.\n\n[14] J.-F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. Signal Processing, SP-43:3017-3029, Dec. 1996.\n\n[15] A. Cichocki and R. Unbehauen. Robust neural networks with on-line learning for blind identification and blind separation of sources. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 43(11):894-906, 1996.\n\n[16] L. Tong, R.W. Liu, V.C. Soon, and Y.F. Huang. Indeterminacy and identifiability of blind identification. IEEE Trans. Circuits, Syst., 38(5):499-509, May 1991.\n\n[17] H. Yang and S. Amari. Adaptive on-line learning algorithms for blind separation: Maximum entropy and minimal mutual information. Neural Comput., 9:1457-1482, 1997.\n\n[18] L. Zhang, A. Cichocki, and S. Amari. Geometrical structures of FIR manifold and their application to multichannel blind deconvolution. In Proceedings of NNSP'99, pages 303-312, Madison, Wisconsin, August 23-25, 1999.\n\n[19] L. Zhang, A. Cichocki, and S. Amari. Multichannel blind deconvolution of non-minimum phase systems using information backpropagation. In Proceedings of the Fifth International Conference on Neural Information Processing (ICONIP'99), pages 210-216, Perth, Australia, Nov. 16-20, 1999.", "award": [], "sourceid": 1653, "authors": [{"given_name": "Liqing", "family_name": "Zhang", "institution": null}, {"given_name": "Shun-ichi", "family_name": "Amari", "institution": null}, {"given_name": "Andrzej", "family_name": "Cichocki", "institution": null}]}