{"title": "Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty", "book": "Advances in Neural Information Processing Systems", "page_first": 1556, "page_last": 1564, "abstract": "Typical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatially-varying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a non-uniform blind deblurring algorithm with several desirable, yet previously unexplored, attributes. The underlying objective function includes a spatially-adaptive penalty that couples the latent sharp image, non-uniform blur operator, and noise level together. This coupling allows the penalty to automatically adjust its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate the overall estimation process without explicitly incorporating structure-selection heuristics. The algorithm can be implemented using an optimization strategy that is virtually parameter-free and simpler than existing methods. 
Detailed theoretical analysis and empirical tests on real images serve to validate the proposed method.", "full_text": "Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty\n\nHaichao Zhang†‡ and David Wipf§\n\n†School of Computer Science, Northwestern Polytechnical University, Xi’an, China\n‡Department of Electrical and Computer Engineering, Duke University, USA\n§Visual Computing Group, Microsoft Research Asia, Beijing, China\n\nhczhang1@gmail.com, davidwipf@gmail.com\n\nAbstract\n\nTypical blur from camera shake often deviates from the standard uniform convolutional assumption, in part because of problematic rotations which create greater blurring away from some unknown center point. Consequently, successful blind deconvolution for removing shake artifacts requires the estimation of a spatially-varying or non-uniform blur operator. Using ideas from Bayesian inference and convex analysis, this paper derives a simple non-uniform blind deblurring algorithm with a spatially-adaptive image penalty. Through an implicit normalization process, this penalty automatically adjusts its shape based on the estimated degree of local blur and image structure such that regions with large blur or few prominent edges are discounted. Remaining regions with modest blur and revealing edges therefore dominate on average without explicitly incorporating structure-selection heuristics. The algorithm can be implemented using an optimization strategy that is virtually tuning-parameter free and simpler than existing methods, and likely can be applied in other settings such as dictionary learning. Detailed theoretical analysis and empirical comparisons on real images serve as validation.\n\n1 Introduction\n\nImage blur is an undesirable degradation that often accompanies the image formation process and may arise, for example, because of camera shake during acquisition. 
Blind image deblurring strategies aim to recover a sharp image from only a blurry, compromised observation. Extensive efforts have been devoted to the uniform (shift-invariant) blur case, which can be described with the convolutional model y = k ∗ x + n, where x is the unknown sharp image, y is the observed blurry image, k is the unknown blur kernel (or point spread function), and n is a zero-mean Gaussian noise term [6, 21, 17, 5, 28, 14, 1, 27, 29]. Unfortunately, many real-world photographs contain blur effects that vary across the image plane, such as when unknown rotations are introduced by camera shake [17].\n\nMore recently, algorithms have been generalized to explicitly handle some degree of non-uniform blur using the more general observation model y = Hx + n, where each column of the blur operator H contains the spatially-varying effective blur kernel at the corresponding pixel site [25, 7, 8, 9, 11, 4, 22, 12]. Note that the original uniform blur model is recovered as a special case when H is constrained to have a particular structure (e.g., block-Toeplitz with Toeplitz blocks). In general, non-uniform blur may arise in several different contexts. This paper will focus on the blind removal of non-uniform blur caused by general camera shake (as opposed to blur from object motion) using only a single image, with no additional hardware assistance.\n\nWhile existing algorithms for addressing non-uniform camera shake have displayed a measure of success, several important limitations remain. First, some methods require either additional specialized hardware such as high-speed video capture [23] or inertial measurement sensors [13] for estimating motion, or else multiple images of the same scene [4]. 
Second, even the algorithms that operate given only data from a single image typically rely on carefully engineered initializations, heuristics, and trade-off parameters for selecting salient image structure or edges, in part to avoid undesirable degenerate, no-blur solutions [7, 8, 9, 11]. Consequently, enhancements and rigorous analysis may be problematic. To address these shortcomings, we present an alternative blind deblurring algorithm built upon a simple, closed-form cost function that automatically discounts regions of the image that contain little information about the blur operator, without introducing any additional salient-structure selection steps. This transparency leads to a nearly tuning-parameter-free algorithm based upon a sparsity penalty whose shape adapts to the estimated degree of local blur, and provides theoretical arguments regarding how to robustly handle non-uniform degradations.\n\nThe rest of the paper is structured as follows. Section 2 briefly describes relevant existing work on non-uniform blind deblurring operators and implementation techniques. Section 3 then introduces the proposed non-uniform blind deblurring model, while further theoretical justification and analyses are provided in Section 4. Experimental comparisons with state-of-the-art methods are carried out in Section 5, followed by conclusions in Section 6.\n\n2 Non-Uniform Deblurring Operators\n\nPerhaps the most direct way of handling non-uniform blur is to simply partition the image into different regions and then learn a separate, uniform blur kernel for each region, possibly with an additional weighting function for smoothing the boundaries between two adjacent kernels. The resulting algorithm has been adopted extensively [18, 8, 22, 12] and admits an efficient implementation called efficient filter flow (EFF) [10]. 
The downside of this type of model is that the geometric relationships between the blur kernels of different regions, which derive from the physical motion path of the camera, are ignored.\n\nIn contrast, to explicitly account for camera motion, the projective motion path (PMP) model [23] treats a blurry image as the weighted summation of projectively transformed sharp images, leading to the revised observation model\n\n$$y = \\sum_j w_j P_j x + n, \\qquad (1)$$\n\nwhere P_j is the j-th projection or homography operator (a combination of rotations and translations) and w_j is the corresponding combination weight representing the proportion of time spent at that particular camera pose during exposure. The uniform convolutional model can be obtained by restricting the general projection operators {P_j} to be translations. In this regard, (1) represents a more general model that has been used in many recent non-uniform deblurring efforts [23, 25, 7, 11, 4]. PMP also retains the bilinear property of uniform convolution, meaning that\n\n$$y = Hx + n = Dw + n, \\qquad (2)$$\n\nwhere $H = \\sum_j w_j P_j$ and $D = [P_1 x, P_2 x, \\cdots, P_j x, \\cdots]$ is a matrix of transformed sharp images. The disadvantage of PMP is that it typically leads to inefficient algorithms, because evaluating the matrix-vector product Hx = Dw requires generating many expensive intermediate transformed images. However, EFF can be combined with the PMP model by introducing a set of basis images efficiently generated by transforming a grid of delta peak images [9]. 
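As a concrete check of the bilinearity in (2), the minimal Python sketch below (an illustration only, not the PMP implementation of [23]) assumes a hypothetical 1-D setting in which every P_j is a circular translation, the special case in which (1) reduces to uniform convolution. It forms the same blurred signal once as Hx and once as Dw; the helpers shift, apply_H, build_D and apply_D, and all numbers, are made up for the example.

```python
# Hypothetical 1-D toy: each P_j is a circular shift (a pure translation),
# so model (1) reduces to uniform circular convolution in this special case.

def shift(x, s):
    # P_j x: circularly shift the signal x by s samples
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def apply_H(x, w, shifts):
    # H x = sum_j w_j P_j x
    y = [0.0] * len(x)
    for wj, s in zip(w, shifts):
        for i, v in enumerate(shift(x, s)):
            y[i] += wj * v
    return y

def build_D(x, shifts):
    # D = [P_1 x, P_2 x, ...], stored as a list of columns
    return [shift(x, s) for s in shifts]

def apply_D(D, w):
    # D w = sum_j w_j (P_j x)
    n = len(D[0])
    return [sum(wj * col[i] for wj, col in zip(w, D)) for i in range(n)]

x = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]   # sparse toy latent signal
shifts = [0, 1, 2]                              # hypothetical camera poses (translations)
w = [0.5, 0.3, 0.2]                             # pose weights (time fractions), summing to 1

Hx = apply_H(x, w, shifts)                      # linear in x for fixed w
Dw = apply_D(build_D(x, shifts), w)             # linear in w for fixed x
print(Hx)
```

Here Hx sums weighted shifted copies of x, while Dw first stacks the transformed images and then mixes them with the weights; both routes give the same vector.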
The computational cost can be further reduced by using an active set to prune projection operators with small responses [11].\n\n3 A New Non-Uniform Blind Deblurring Model\n\nFollowing previous work [6, 16], we work in the derivative domain of images for ease of modeling and better performance, meaning that $x \\in \\mathbb{R}^m$ and $y \\in \\mathbb{R}^n$ will denote the lexicographically ordered sharp and blurry image derivatives, respectively.¹\n\n¹The derivative filters used in this work are $\\{[-1, 1], [-1, 1]^T\\}$. Other choices are also possible.\n\nThe observation model (1) is equivalent to the likelihood function\n\n$$p(y \\mid x, w) \\propto \\exp\\left(-\\frac{1}{2\\lambda}\\|y - Hx\\|_2^2\\right), \\qquad (3)$$\n\nwhere λ denotes the noise variance. Maximum likelihood estimation of x and w using (3) is clearly ill-posed, so further regularization is required to constrain the solution space. For this purpose we adopt the Gaussian prior $p(x) = \\mathcal{N}(x; 0, \\Gamma)$, where $\\Gamma \\triangleq \\mathrm{diag}[\\gamma]$ with $\\gamma = [\\gamma_1, \\ldots, \\gamma_m]^T$ a vector of m hyperparameter variances, one for each element of $x = [x_1, \\ldots, x_m]^T$. Although γ is unknown, if we first marginalize over the unknown x, we can estimate γ jointly along with the blur parameters w and the unknown noise variance λ. This type II maximum likelihood procedure has been advocated in the context of sparse estimation, where the goal is to learn vectors with mostly zero-valued coefficients [24, 26]. The final sharp image can then be recovered using the estimated kernel and noise level along with standard non-blind deblurring algorithms (e.g., [15]).\n\nMathematically, the proposed estimation scheme requires that we solve\n\n$$\\max_{\\gamma, w, \\lambda \\ge 0} \\int p(y \\mid x, w)\\, p(x)\\, dx \\;\\equiv\\; \\min_{\\gamma, w, \\lambda \\ge 0} \\; y^T \\left(H\\Gamma H^T + \\lambda I\\right)^{-1} y + \\log\\left|H\\Gamma H^T + \\lambda I\\right|, \\qquad (4)$$\n\nwhere a $-\\log$ transformation has been included for convenience. Clearly (4) does not resemble the traditional blind non-uniform deblurring script, where estimation proceeds using the more transparent penalized regression model [4, 7, 9]\n\n$$\\min_{x;\\, w \\ge 0} \\; \\|y - Hx\\|_2^2 + \\alpha \\sum_i g(x_i) + \\beta \\sum_j h(w_j), \\qquad (5)$$\n\nwhere α and β are user-defined trade-off parameters, g is an image penalty which typically favors sparsity, and h is usually assumed to be quadratic. Despite these differing appearances, however, (4) has some advantageous properties with respect to deconvolution problems. In particular, it is devoid of tuning parameters and it possesses more favorable minimization conditions. For example, consider the simplified non-uniform deblurring situation where the true x has a single nonzero element and H is defined such that each column indexed by i is independently parameterized with finite support symmetric around pixel i. Moreover, assume this support matches the true support of the unknown blur operator. Then we have the following:\n\nLemma 1 Given the idealized non-uniform deblurring problem described above, the cost function (4) will be characterized by a unique minimizing solution that correctly locates the nonzero element in x and the corresponding true blur kernel at this location. No possible problem in the form of (5), with $g(x) = |x|^p$, $h(w) = w^q$, and {p, q} arbitrary non-negative scalars, can achieve a similar result (there will always exist either multiple different minimizing solutions or a global minimum that does not produce the correct solution).\n\nThis result, which can be generalized with additional effort, can be shown by expanding on some of the derivations in [26]. 
Although the conditions upon which Lemma 1 is based are obviously extremely idealized, the result is nonetheless emblematic of the potential of the underlying cost function to avoid local minima, and [26] contains complementary results for the case where H is fixed. While optimizing (4) is possible using general techniques such as the EM algorithm, doing so is computationally expensive, in part because of the high-dimensional determinants involved with realistic-sized images. We are therefore considering various specially-tailored optimization schemes for future work. For present purposes, we instead minimize a convenient upper bound that allows us to circumvent such computational issues. Specifically, using Hadamard’s inequality we have\n\n$$\\begin{aligned} \\log\\left|H\\Gamma H^T + \\lambda I\\right| &= n\\log\\lambda + \\log|\\Gamma| + \\log\\left|\\lambda^{-1}H^T H + \\Gamma^{-1}\\right| \\\\ &\\le n\\log\\lambda + \\log|\\Gamma| + \\log\\left|\\lambda^{-1}\\mathrm{diag}\\left(H^T H\\right) + \\Gamma^{-1}\\right| \\\\ &= \\sum_i \\log\\left(\\lambda + \\gamma_i \\|\\bar{w}_i\\|_2^2\\right) + (n - m)\\log\\lambda, \\end{aligned} \\qquad (6)$$\n\nwhere $\\bar{w}_i$ denotes the i-th column of H. Hadamard’s inequality is applied by writing $\\lambda^{-1}H^T H + \\Gamma^{-1} = V^T V$ for some matrix $V = [v_1, \\ldots, v_m]$. We then have $\\log\\left|\\lambda^{-1}H^T H + \\Gamma^{-1}\\right| = 2\\log|V| \\le 2\\log\\left(\\prod_i \\|v_i\\|_2\\right) = \\log\\left|\\mathrm{diag}\\left(\\lambda^{-1}H^T H + \\Gamma^{-1}\\right)\\right|$, leading to the stated result.\n\nAlso, the quantity $\\|\\bar{w}_i\\|_2$ which appears in (6) can be viewed as a measure of the degree of local blur at location i. Given the feasible region w ≥ 0 and, without loss of generality, the constraint $\\sum_j w_j = 1$ for normalization purposes, it can easily be shown that $1/L \\le \\|\\bar{w}_i\\|_2^2 \\le 1$, where L is the maximum number of elements in any local blur kernel $\\bar{w}_i$ or column of H. The upper bound is achieved when the local kernel is a delta solution, meaning only one nonzero element and therefore minimal blur. In contrast, the lower bound on $\\|\\bar{w}_i\\|_2^2$ occurs when every element of $\\bar{w}_i$ has an equal value, constituting the maximal possible blur. This metric, which will influence our analysis in the next section, can be computed using $\\|\\bar{w}_i\\|_2^2 = w^T (B_i^T B_i) w$, where $B_i \\triangleq [P_1 e_i, P_2 e_i, \\cdots, P_j e_i, \\cdots]$ and $e_i$ denotes an all-zero image with a one at site i. In the uniform deblurring case, $B_i^T B_i = I$ ignoring edge effects, and therefore $\\|\\bar{w}_i\\|_2 = \\|w\\|_2$ for all i.\n\nWhile optimizing (4) using the upper bound from (6) can be justified in part using Bayesian-inspired arguments and the lack of trade-off parameters, the augmented cost function unfortunately no longer satisfies Lemma 1. However, it is still well-equipped for estimating sparse image gradients and avoiding degenerate no-blur solutions. 
For example, consider the case of an asymptotically large image with i.i.d. sparse image gradients, with some constant fraction exactly equal to zero and the remaining nonzero elements drawn from any continuous distribution. Now suppose that this image is corrupted with a non-uniform blur operator of the form $H = \\sum_j w_j P_j$, where the number of terms in the summation is finite and H satisfies minimal regularity conditions. Then it can be shown that any global minimum of (4), with or without the bound from (6), will produce the true blur operator. Related intuition applies when noise is present or when the image gradients are not exactly sparse (we defer more detailed analysis to a future publication).\n\nRegardless, the simplified γ-dependent cost function is still far less intuitive than the penalized regression models dependent on x, such as (5), that are typically employed for non-uniform blind deblurring. However, using the framework from [26], it can be shown that the kernel estimate obtained by this process is formally equivalent to the one obtained via\n\n$$\\min_{x;\\, w \\ge 0,\\, \\lambda \\ge 0} \\; \\frac{1}{\\lambda}\\|y - Hx\\|_2^2 + \\sum_i \\psi\\left(|x_i|\\,\\|\\bar{w}_i\\|_2, \\lambda\\right) + (n - m)\\log\\lambda, \\qquad (7)$$\n\nwith\n\n$$\\psi(u, \\lambda) \\triangleq \\frac{2u}{u + \\sqrt{4\\lambda + u^2}} + \\log\\left(2\\lambda + u^2 + u\\sqrt{4\\lambda + u^2}\\right), \\quad u \\ge 0.$$\n\nThe optimization in (7) closely resembles a standard penalized regression (or, equivalently, MAP) problem used for blind deblurring. The primary distinction is the penalty term ψ, which jointly regularizes x, w, and λ, as discussed in Section 4. The supplementary file derives a simple majorization-minimization algorithm for solving (7), along with additional implementation details. 
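For readers who want to experiment with (7), a direct Python transcription of ψ is sketched below; it assumes only the closed form above, and the helper coupled_penalty is a hypothetical name. The grid check numerically supports the property, stated in Section 4.1, that ψ is non-decreasing and concave in u, and confirms that ψ(0, λ) = log(2λ).

```python
import math

def psi(u, lam):
    # psi(u, lambda) from (7), for u >= 0 and noise variance lam > 0
    s = math.sqrt(4.0 * lam + u * u)
    return 2.0 * u / (u + s) + math.log(2.0 * lam + u * u + u * s)

def coupled_penalty(x, wbar_norms, lam):
    # sum_i psi(|x_i| * ||wbar_i||_2, lambda): the coupling term in (7)
    return sum(psi(abs(xi) * nrm, lam) for xi, nrm in zip(x, wbar_norms))

lam = 0.05
grid = [0.02 * k for k in range(501)]          # u from 0 to 10
vals = [psi(u, lam) for u in grid]
d1 = [b - a for a, b in zip(vals, vals[1:])]    # first differences (expected > 0)
d2 = [b - a for a, b in zip(d1, d1[1:])]        # second differences (expected <= 0)
print(min(d1) > 0, max(d2) <= 0)
```

Since ψ is evaluated at u = |x_i|‖w̄_i‖_2, the same gradient magnitude x_i is charged less where the estimated local kernel is more blurred (smaller ‖w̄_i‖_2).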
The underlying procedure is related to the variational Bayesian (VB) models from [1, 16, 20]; however, these models are based on a completely different mean-field approximation and a uniform blur assumption, and they do not learn the noise parameter. Additionally, the analysis provided with these VB models is limited by their less transparent underlying cost functions.\n\n4 Model Properties\n\nThe proposed blind deblurring strategy involves simply minimizing (7); unlike other state-of-the-art approaches, no additional steps for trade-off parameter selection or structure/salient-edge detection are required. This section examines theoretical properties of (7) that ultimately allow such a simple algorithm to succeed. First, we demonstrate a form of intrinsic column normalization that facilitates the balanced sparse estimation of the unknown latent image and implicitly de-emphasizes regions with large blur and few dominant edges. Later we describe an appealing form of noise-dependent shape adaptation that helps in avoiding local minima. While there are multiple, complementary perspectives for interpreting the behavior of this algorithm, more detailed analyses, as well as extensions to other types of underdetermined inverse problems such as dictionary learning, are deferred to a later publication.\n\n4.1 Column-Normalized Sparse Estimation\n\nUsing the simple reparameterization $z_i \\triangleq x_i \\|\\bar{w}_i\\|_2$, it follows that (7) is exactly equivalent to solving\n\n$$\\min_{z;\\, w \\ge 0,\\, \\lambda \\ge 0} \\; \\frac{1}{\\lambda}\\|y - \\tilde{H}z\\|_2^2 + \\sum_i \\psi(|z_i|, \\lambda) + (n - m)\\log\\lambda, \\qquad (8)$$\n\nwhere $z = [z_1, \\ldots, z_m]^T$ and $\\tilde{H}$ is simply the $\\ell_2$-column-normalized version of H. Moreover, it can be shown that this ψ is a concave, non-decreasing function of |z|, and hence represents a canonical sparsity-promoting penalty function with respect to z [26]. 
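The reparameterization can be sketched in a few lines, assuming a small dense H stored as a list of rows (the matrix and the helper names are hypothetical). The sketch forms H̃ by dividing each column by ‖w̄_i‖_2, sets z_i = x_i‖w̄_i‖_2, and checks that the data-fit term is unchanged (Hx = H̃z) while H̃^T H̃ acquires a unit diagonal.

```python
import math

def column_norms(H):
    # ||wbar_i||_2 for every column i of H (H stored as a list of rows)
    return [math.sqrt(sum(row[i] ** 2 for row in H)) for i in range(len(H[0]))]

def normalize_columns(H):
    # Htilde: each column of H divided by its l2 norm
    norms = column_norms(H)
    return [[v / nrm for v, nrm in zip(row, norms)] for row in H]

def matvec(A, v):
    # plain matrix-vector product for a list-of-rows matrix
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical 4x3 operator: a two-tap blurred column, a delta column, a three-tap blurred column
H = [[0.5, 0.0, 0.0],
     [0.5, 1.0, 0.25],
     [0.0, 0.0, 0.25],
     [0.0, 0.0, 0.5]]
x = [2.0, -1.0, 3.0]

norms = column_norms(H)
z = [xi * nrm for xi, nrm in zip(x, norms)]     # z_i = x_i * ||wbar_i||_2
Ht = normalize_columns(H)
gram_diag = [sum(row[i] ** 2 for row in Ht) for i in range(len(Ht[0]))]
print(z, gram_diag)
```

The delta column keeps z_i = x_i, whereas the blurred columns shrink the corresponding z_i.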
Consequently, noise and kernel dependencies notwithstanding, this reparameterization places the proposed cost function in a form exactly consistent with nearly all prototypical sparse regression problems, where $\\ell_2$ column normalization is ubiquitous, at least in part to avoid favoring one column over another during the estimation process (which can potentially bias the solution). To understand the latter point, note that $\\|y - \\tilde{H}z\\|_2^2 \\equiv z^T \\tilde{H}^T \\tilde{H} z - 2y^T \\tilde{H} z$ (equality up to the constant $y^T y$). Among other things, because of the normalization, the quadratic factor $\\tilde{H}^T \\tilde{H}$ now has a unit diagonal, and likewise the inner products $y^T \\tilde{H}$ are scaled by the consistent induced $\\ell_2$ norms, which collectively avoids the premature favoring of any one element of z over another. Moreover, no additional heuristic kernel penalty terms such as in (5) are required, since $\\tilde{H}$ is in some sense self-regularized by the normalization. Additional ancillary benefits of (8) will be described in Section 4.2.\n\nOf course, we can always apply the same reparameterization to existing algorithms in the form of (5). While this will indeed result in normalized columns and a properly balanced data-fit term, the raw norms will now appear in the penalty function g, giving the equivalent objective\n\n$$\\min_{z;\\, w \\ge 0} \\; \\|y - \\tilde{H}z\\|_2^2 + \\alpha \\sum_i g\\left(z_i \\|\\bar{w}_i\\|_2^{-1}\\right) + \\beta \\sum_j h(w_j). \\qquad (9)$$\n\nHowever, the presence of these norms embedded in g may have undesirable consequences. Simply put, problem (9) will favor solutions where the ratio $z_i / \\|\\bar{w}_i\\|_2$ is sparse or nearly so, which can be achieved either by making many $z_i$ zero or by making many $\\|\\bar{w}_i\\|_2$ large. 
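A single-pixel toy makes this imbalance visible. The Python sketch below assumes, purely for illustration, g(v) = |v|^p with p = 0.5 and three hand-picked local kernels; term_9 is a hypothetical helper returning the i-th image term of (9), not part of any published method.

```python
import math

def g(v, p=0.5):
    # hypothetical concave, non-decreasing penalty g(v) = |v|^p
    return abs(v) ** p

def kernel_norm(kernel):
    # ||wbar_i||_2 for a local kernel
    return math.sqrt(sum(t * t for t in kernel))

def term_9(z_i, kernel, p=0.5):
    # the i-th image term of (9): g(z_i / ||wbar_i||_2)
    return g(z_i / kernel_norm(kernel), p)

L = 9
kernels = {
    'uniform': [1.0 / L] * L,       # maximal blur: ||wbar||^2 = 1/L
    'partial': [0.5, 0.25, 0.25],   # moderate blur
    'delta': [1.0],                 # no blur: ||wbar||^2 = 1
}
cost_nonzero = {name: term_9(1.0, k) for name, k in kernels.items()}
cost_zero = {name: term_9(0.0, k) for name, k in kernels.items()}
print(cost_nonzero, cost_zero)
```

For a nonzero z_i the cost is lowest for the delta kernel and highest for the uniform one, while for z_i = 0 every kernel costs nothing, so the kernel norm is left unconstrained.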
If some $z_i$ is estimated to be zero (and many $z_i$ will provably be exactly zero at any local minimum if g(x) is a concave, non-decreasing function of |x|), then the corresponding $\\|\\bar{w}_i\\|_2$ will be unconstrained. In contrast, if a given $z_i$ is nonzero, there will be a stronger push for the associated $\\|\\bar{w}_i\\|_2$ to be large, i.e., more like the delta kernel, which maximizes the $\\ell_2$ norm. Thus, the relative penalization of the kernel norms will depend on the estimated local image gradients, and no-blur delta solutions may be arbitrarily favored in parts of the image plane dominated by edges, the very places where blur estimation information is paramount.\n\nIn reality, the local kernel norms $\\|\\bar{w}_i\\|_2$, which quantify the degree of local blur as mentioned previously, should be completely independent of the sparsity of the image gradients at the same location. This is because the different blurring effects from camera shake are independent of the locations of strong edges in a given scene, since the blur operator is only a function of camera motion (at least to a first-order approximation). One way to account for this independence would be to simply optimize (9) with $\\|\\bar{w}_i\\|_2$ removed from g. While this is possible in principle, enforcing the non-convex, coupled constraints required to maintain normalized columns is extremely difficult. Another option would be to carefully choose β and h to somehow compensate. In contrast, our algorithm handles these complications seamlessly without any additional penalty terms.\n\n4.2 Noise-Dependent, Parameter-Free Homotopy Continuation\n\nColumn normalization can be viewed as a principled first step towards solving challenging sparse estimation problems. However, when non-convex sparse regularizers are used for the image penalty, e.g., $\\ell_p$ norms with p < 1, local minima can be a significant problem. 
The rationale for using such potentially problematic non-convexity is as follows; more details can be found in [17, 27]. When applied to a sharp image, any blur operator will necessarily contribute two opposing effects:\n(i) it reduces a measure of the image sparsity, which normally increases the penalty $\\sum_i |y_i|^p$, and\n(ii) it broadly reduces the overall image variance, which actually reduces $\\sum_i |y_i|^p$.\nAdditionally, the greater the degree of blur, the more effect (ii) will begin to overshadow (i). Note that we can always apply greater and greater blur to any sharp image x such that the variance of the resulting blurry y is arbitrarily small. This then produces an arbitrarily small $\\ell_p$ norm, which implies that $\\sum_i |y_i|^p < \\sum_i |x_i|^p$, meaning that the penalty actually favors the blurry image over the sharp one.\n\nIn a practical sense, though, the amount of blur that can be tolerated before this undesirable preference for y over x occurs is much larger as p approaches zero. This is because the more concave the image penalty becomes (as a function of coefficient magnitudes), the less sensitive it is to image variance and the more sensitive it is to image sparsity. In fact, the scale-invariant special case where p → 0 depends only on sparsity, or the number of elements that are exactly equal to zero.² We may therefore expect such a highly concave, sparsity-promoting penalty to favor the sharp image over the blurry one in a broader range of blur conditions. Even with other families of penalty functions the same basic notion holds: greater concavity means greater sparsity preference and less sensitivity to variance changes that favor no-blur degenerate solutions.\n\nFrom an implementation standpoint, homotopy continuation methods provide one attractive means of dealing with difficult non-convex penalty functions and the associated constellation of local minima [3]. 
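The ℓ_p argument above can be reproduced on a deliberately tiny 1-D example; this is a hypothetical illustration, not an experiment from the paper. The sharp signal is the derivative of a box (a +1 spike and a −1 spike four samples apart) and the blur is a length-L uniform kernel; for each p the sketch reports the smallest L for which Σ_i |y_i|^p falls below Σ_i |x_i|^p. The helper names are made up.

```python
def box_blur(x, L):
    # full 1-D convolution with a length-L uniform kernel (taps 1/L, summing to 1)
    y = [0.0] * (len(x) + L - 1)
    for i, xi in enumerate(x):
        for k in range(L):
            y[i + k] += xi / L
    return y

def lp_penalty(v, p, tol=1e-12):
    # sum_i |v_i|^p, treating numerically tiny entries as exact zeros
    return sum(abs(t) ** p for t in v if abs(t) > tol)

def first_preferred_blur(x, p, max_L=5000):
    # smallest kernel length L for which the blurry signal has the smaller penalty;
    # the relative margin keeps floating-point ties from counting as a preference
    sharp = lp_penalty(x, p)
    for L in range(2, max_L + 1):
        if lp_penalty(box_blur(x, L), p) < sharp * (1.0 - 1e-9):
            return L
    return None

x = [1.0, 0.0, 0.0, 0.0, -1.0]   # derivative of a box: spikes of opposite sign
for p in (1.0, 0.6, 0.3):
    print(p, first_preferred_blur(x, p))
# prints 1.0 5, then 0.6 11, then 0.3 102
```

Once L exceeds the spike spacing, the blurred signal has eight nonzero entries of magnitude 1/L, so its penalty is 8 L^(−p) and the crossover is the first integer above 4^(1/p); the tolerated blur grows rapidly as p decreases, in line with the argument above.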
The basic idea is to use a parameterized family of sparsity-promoting functions g(x; θ), where different values of θ determine the relative degree of concavity, allowing a transition from something convex such as the $\\ell_1$ norm (with θ large) to something concave such as the $\\ell_0$ norm (with θ small). Moreover, to ensure cost function descent (see below), we also require that $g(x; \\theta_2) \\ge g(x; \\theta_1)$ whenever $\\theta_2 \\ge \\theta_1$, noting that this rules out simply setting θ = p and using the family of $\\ell_p$ norms. We then begin optimization with a large θ value; later, as the estimation progresses and we are hopefully near a reasonably good basin of attraction, θ is reduced, introducing greater concavity. This process is repeated until convergence, all the while guaranteeing cost function descent. While potentially effective in practice, homotopy continuation methods require both a trade-off parameter for g(x; θ) and a predefined schedule or heuristic for adjusting θ, both of which could potentially be image dependent.\n\nThe proposed deblurring algorithm automatically implements a form of noise-dependent, parameter-free homotopy continuation with several attractive auxiliary properties [26]. To make this claim precise and facilitate subsequent analysis, we first introduce the definition of relative concavity [19]:\n\nDefinition 1 Let u be a strictly increasing function on [a, b]. The function ν is concave relative to u on the interval [a, b] if and only if\n\n$$\\nu(y) \\le \\nu(x) + \\frac{\\nu'(x)}{u'(x)}\\left[u(y) - u(x)\\right] \\quad \\text{holds } \\forall x, y \\in [a, b].$$\n\nWe will use ν ≺ u to denote that ν is concave relative to u on [0, ∞). This can be understood as a natural generalization of the traditional notion of concavity, in that a concave function is equivalently concave relative to a linear function per Definition 1. 
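Definition 1 can be probed by brute force on a finite grid. The sketch below is a numerical illustration only (a grid check is evidence, not a proof), and is_concave_relative is a hypothetical helper that receives each function together with its analytic derivative. It confirms that ordinary concavity is concavity relative to a linear function, and shows a less trivial pair on [0.1, 5]: log is concave relative to √·, but not the other way round.

```python
import math

def is_concave_relative(nu, dnu, u, du, grid, tol=1e-12):
    # Brute-force check of Definition 1 on a finite grid:
    # nu(y) <= nu(x) + nu'(x) / u'(x) * (u(y) - u(x)) for every pair (x, y)
    for xv in grid:
        slope = dnu(xv) / du(xv)
        for yv in grid:
            if nu(yv) > nu(xv) + slope * (u(yv) - u(xv)) + tol:
                return False
    return True

grid = [0.1 + 0.05 * k for k in range(99)]       # points in [0.1, 5.0]

identity = (lambda t: t, lambda t: 1.0)
sqrt_fn = (math.sqrt, lambda t: 0.5 / math.sqrt(t))
log_fn = (math.log, lambda t: 1.0 / t)

sqrt_vs_linear = is_concave_relative(*sqrt_fn, *identity, grid)   # ordinary concavity
log_vs_sqrt = is_concave_relative(*log_fn, *sqrt_fn, grid)         # log relative to sqrt
sqrt_vs_log = is_concave_relative(*sqrt_fn, *log_fn, grid)         # reverse ordering
print(sqrt_vs_linear, log_vs_sqrt, sqrt_vs_log)
```

The small tolerance only absorbs floating-point round-off at pairs where the inequality is tight (for example x = y).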
In general, if ν ≺ u, then when ν and u are set to have the same functional value and the same slope at any given point (i.e., by an affine transformation of u), ν lies completely under u. In the context of homotopy continuation, an ideal candidate penalty would be one for which $g(x; \\theta_1) \\prec g(x; \\theta_2)$ whenever $\\theta_1 \\le \\theta_2$. This would ensure that greater sparsity-inducing concavity is introduced as θ is reduced. We now demonstrate that ψ(|z|, λ) is such a function, with λ occupying the role of θ. This dependency on the noise parameter is unlike other continuation methods and ultimately leads to several attractive attributes.\n\nTheorem 1 If $\\lambda_1 < \\lambda_2$, then $\\psi(u, \\lambda_1) \\prec \\psi(u, \\lambda_2)$ for u ≥ 0. Additionally, in the limit as λ → 0, $\\sum_i \\psi(|z_i|, \\lambda)$ converges to the $\\ell_0$ norm (up to an inconsequential scaling and translation). Conversely, as λ becomes large, $\\sum_i \\psi(|z_i|, \\lambda)$ converges to $2\\|z\\|_1/\\sqrt{\\lambda}$ (up to an additive constant independent of z).\n\nThe proof has been deferred to the supplementary file. The relevance of this result can be understood as follows. First, at the beginning of the optimization process λ will be large, both because of initialization and because we have not yet found a relatively sparse z and associated w such that y can be well approximated; hence the estimated λ should not be small. Based on Theorem 1, in this regime (8) approaches\n\n$$\\min_z \\; \\|y - \\tilde{H}z\\|_2^2 + 2\\sqrt{\\lambda}\\,\\|z\\|_1, \\qquad (10)$$\n\nassuming w and λ are fixed. Note incidentally that this square-root dependency on λ, which arises naturally from our model, is frequently advocated when performing regular $\\ell_1$-norm penalized sparse regression given that the true noise variance is λ [2]. 
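Both parts of Theorem 1 can be illustrated numerically; this complements, and does not replace, the proof in the supplementary file. The sketch assumes the closed form of ψ from (7), whose derivative in u simplifies to 4/(u + √(4λ + u²)) (a short calculation not given in the text), and it uses the standard fact that, for differentiable and strictly increasing functions, ν ≺ u holds exactly when ν′/u′ is non-increasing. Helper names are hypothetical.

```python
import math

def psi(u, lam):
    # psi(u, lambda) from (7), for u >= 0 and lam > 0
    s = math.sqrt(4.0 * lam + u * u)
    return 2.0 * u / (u + s) + math.log(2.0 * lam + u * u + u * s)

def dpsi(u, lam):
    # derivative of psi in u, simplified to 4 / (u + sqrt(4 lam + u^2))
    return 4.0 / (u + math.sqrt(4.0 * lam + u * u))

def slope_ratios(lam1, lam2, grid):
    # psi(., lam1) is concave relative to psi(., lam2) iff this ratio is non-increasing
    return [dpsi(t, lam1) / dpsi(t, lam2) for t in grid]

def shape(lam, small=0.01, large=1.0):
    # cost of a small coefficient relative to a large one, both measured from psi(0, lam)
    base = psi(0.0, lam)
    return (psi(small, lam) - base) / (psi(large, lam) - base)

grid = [0.05 * k for k in range(401)]            # u from 0 to 20
ratios = slope_ratios(0.01, 1.0, grid)           # lambda_1 = 0.01 < lambda_2 = 1
print(shape(1e8), shape(1e-300))                 # l1-like versus l0-like shape
```

With λ = 10^8 a coefficient of size 0.01 costs about 1% of a coefficient of size 1, as under the ℓ1 norm, while with λ = 10^−300 it costs almost as much, as under the ℓ0 norm; the slope ratio falls from 10 at u = 0 toward 1, so the reverse ordering does not hold.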
Additionally, because λ must be relatively large to arrive at this $\\ell_1$ approximation, the estimation need only focus on reproducing the largest elements in z, since the sparse penalty will dominate the data-fit term. Furthermore, these larger elements are on average more likely to be in regions of relatively lower blurring, i.e., high $\\|\\bar{w}_i\\|_2$, by virtue of the reparameterization $z_i = x_i \\|\\bar{w}_i\\|_2$. Consequently, the less concave initial estimation can proceed successfully by de-emphasizing regions with high blur (low $\\|\\bar{w}_i\\|_2$) and focusing on coarsely approximating regions with relatively less blur.\n\n²Note that even if the true sharp image is not exactly sparse, as long as it can be reasonably well approximated by some exactly sparse image in an $\\ell_2$-norm sense, the analysis here still holds [27].\n\n[Figure 1 (Elephant image); panels: Blurry | Spatially Non-Adaptive | Spatially Adaptive | Blur-map]\n\nFigure 1: Effectiveness of spatially-adaptive sparsity. From left to right: the blurry image; the deblurred image and estimated local kernels without spatially-adaptive column normalization; the analogous results with this normalization and its spatially-varying impact on image estimation; and the associated map of $\\|\\bar{w}_i\\|_2^{-1}$, which reflects the degree of estimated local blurring.\n\nLater, as the estimation proceeds and w and z are refined, λ will be reduced, which in turn necessarily increases the relative concavity of the penalty ψ per Theorem 1. However, the added concavity will now be welcome for resolving increasingly fine details uncovered by a lower noise variance and the concomitant boosted importance of the data-fidelity term, especially since many of these uncovered details may reside near increasingly blurry regions of the image and we need to avoid unwanted no-blur solutions. 
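The coarse-to-fine behavior described above can be caricatured in a few lines. The sketch below is an idealization chosen for illustration: it assumes H̃ = I and noise-free data, in which case (10) decouples and each coefficient is the datum z_i = x_i‖w̄_i‖_2 soft-thresholded at √λ. Four pixels share the same gradient magnitude but differ in local blur; the names and numbers are hypothetical, and this is not the majorization-minimization algorithm from the supplementary file.

```python
import math

def soft_threshold(v, t):
    # scalar minimizer of (v - z)^2 + 2 t |z|
    return math.copysign(max(abs(v) - t, 0.0), v)

def l1_regime_support(x, wbar_norms, lam):
    # Toy version of (10) with Htilde = I (idealized, noise-free): the datum for pixel i
    # is z_i = x_i * ||wbar_i||_2, and each coordinate is soft-thresholded at sqrt(lam).
    # Returns the indices of the pixels that survive in the estimate.
    est = [soft_threshold(xi * nrm, math.sqrt(lam)) for xi, nrm in zip(x, wbar_norms)]
    return [i for i, zi in enumerate(est) if zi != 0.0]

x = [1.0, 1.0, 1.0, 1.0]            # identical gradient magnitudes at four pixels
norms = [1.0, 0.7, 0.5, 0.35]       # from a delta kernel (no blur) to heavy blur
for lam in (0.36, 0.16, 0.09):      # lambda shrinking as the estimate improves
    print(lam, l1_regime_support(x, norms, lam))
# prints 0.36 [0, 1], then 0.16 [0, 1, 2], then 0.09 [0, 1, 2, 3]
```

Large λ retains only the least-blurred pixels; as λ shrinks, progressively blurrier pixels enter the estimate.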
Eventually the penalty can even approach the $\\ell_0$ norm (although images are generally not exactly sparse, and other noise factors and unmodeled artifacts are usually present, so λ will never go all the way to zero). Importantly, all of this implicit, spatially-adaptive penalization occurs without the need for trade-off parameters or additional structure-selection measures, i.e., carefully engineered heuristics designed to locate prominent edges so that good global solutions can be found without strongly concave image penalties [21, 5, 28, 8, 9]. Figure 1 displays results of this procedure both with and without the spatially-varying column normalizations and the implicit adaptive penalization that help compensate for locally varying image blur.\n\n5 Experimental Results\n\nThis section compares the proposed method with several state-of-the-art algorithms for non-uniform blind deblurring using real-world images from previously published papers (source code is not available for most algorithms, which precludes more widespread evaluations). The supplementary file contains a number of additional comparisons, including assessments with a benchmark uniform blind deblurring dataset where ground truth is available. Overall, our algorithm consistently performs comparably or better on all of these images. For space considerations, experimental specifics of our implementation (e.g., regarding the non-blind deblurring step, projection operators, etc.) are also contained in the supplementary file.\n\nComparison with Harmeling et al. [8] and Hirsch et al. [9]: Results are based on three test images provided in [8]. Figure 2 displays deblurring comparisons based on the Butchershop and Vintage-car images. In both cases, the proposed algorithm reveals more fine details than the other methods, despite its simplicity and lack of salient-structure selection heuristics or trade-off parameters. 
Note that with these images, ground truth blur kernels were independently estimated using a special capturing process [8]. As shown in the supplementary file, the estimated blur kernel patterns obtained from our algorithm better resemble the ground truth than those of the other methods, a comparison that is unaffected by any differences in the non-blind deblurring step.\nComparison with Whyte et al. [25]: Results on the Pantheon test image from [25] are shown in Figure 3 (top row), where we observe that the deblurred image from Whyte et al. has noticeable ringing artifacts. In contrast, our result is considerably cleaner.\nComparison with Gupta et al. [7]: We next experiment using the test image Building from [7], which contains large rotational blurring that can be challenging for blind deblurring algorithms. Figure 3 (middle row) reveals that our result contains less ringing and more fine details than that of Gupta et al.\n\n[Figure 2 panels: Butchershop and Vintage-car images, each shown as Blurry | Harmeling | Hirsch | Our]\n\nFigure 2: Non-uniform deblurring results. Comparison with Harmeling [8] and Hirsch [9] on real-world images (better viewed electronically with zooming).\n\n[Figure 3 panels: Pantheon (Blurry | Whyte | Our), Building (Blurry | Gupta | Our), Sculpture (Blurry | Joshi | Our)]\n\nFigure 3: Non-uniform deblurring results. Comparison with Whyte [25], Gupta [7], and Joshi [13] on real-world images (better viewed electronically with zooming).\n\nComparison with Joshi et al. [13]: Joshi et al. present a deblurring algorithm that relies upon additional hardware for estimating camera motion [13]. However, even without this additional information, our algorithm produces a better sharp estimate of the Sculpture image from [13], with fewer ringing artifacts and higher resolution details.
See Figure 3 (bottom row).\n\n6 Conclusion\nThis paper presents a strikingly simple yet effective method for non-uniform camera shake removal based upon a principled, transparent cost function that is open to analysis and further extensions and refinements. For example, it can be combined with the model from [29] to perform joint multi-image alignment, denoising, and deblurring. Both theoretical and empirical evidence are provided demonstrating the efficacy of the blur-dependent, spatially-adaptive sparse regularization that emerges from our model. The framework also suggests exploring other related cost functions that, while deviating from the original probabilistic script, nonetheless share similar properties. One such simple example is a penalty of the form $\\sum_i \\log(\\sqrt{\\lambda} + |x_i|\\,\\|\\bar{w}_i\\|_2)$; many others are possible.\n\nAcknowledgements\nThis work was supported in part by the National Natural Science Foundation of China (61231016).\n\nReferences\n\n[1] S. D. Babacan, R. Molina, M. N. Do, and A. K. Katsaggelos. Bayesian blind deconvolution with general sparse image priors. In ECCV, 2012.\n[2] E. Cand\u00e8s and Y. Plan. Near-ideal model selection by $\\ell_1$ minimization. The Annals of Statistics, (5A):2145\u20132177.\n[3] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In ICASSP, 2008.\n[4] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based non-uniform motion deblurring. Comput. Graph. Forum, 31(7-2):2183\u20132192, 2012.\n[5] S. Cho and S. Lee. Fast motion deblurring. In SIGGRAPH ASIA, 2009.\n[6] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. In SIGGRAPH, 2006.\n[7] A. Gupta, N. Joshi, C. L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In ECCV, 2010.\n[8] S. Harmeling, M. Hirsch, and B. Sch\u00f6lkopf.
Space-variant single-image blind deconvolution for removing camera shake. In NIPS, 2010.\n[9] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Sch\u00f6lkopf. Fast removal of non-uniform camera shake. In ICCV, 2011.\n[10] M. Hirsch, S. Sra, B. Sch\u00f6lkopf, and S. Harmeling. Efficient filter flow for space-variant multiframe blind deconvolution. In CVPR, 2010.\n[11] Z. Hu and M.-H. Yang. Fast non-uniform deblurring using constrained camera pose subspace. In BMVC, 2012.\n[12] H. Ji and K. Wang. A two-stage approach to blind spatially-varying motion deblurring. In CVPR, 2012.\n[13] N. Joshi, S. B. Kang, C. L. Zitnick, and R. Szeliski. Image deblurring using inertial measurement sensors. In SIGGRAPH, 2010.\n[14] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In CVPR, 2011.\n[15] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Deconvolution using natural image priors. Technical report, MIT, 2007.\n[16] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Efficient marginal likelihood optimization in blind deconvolution. In CVPR, 2011.\n[17] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding blind deconvolution algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 33(12):2354\u20132367, 2011.\n[18] J. G. Nagy and D. P. O\u2019Leary. Restoring images degraded by spatially variant blur. SIAM J. Sci. Comput., 19(4):1063\u20131082, 1998.\n[19] J. A. Palmer. Relative convexity. Technical report, UCSD, 2003.\n[20] J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, and B. D. Rao. Variational EM algorithms for non-Gaussian latent variable models. In NIPS, 2006.\n[21] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. In SIGGRAPH, 2008.\n[22] M. Sorel and F. Sroubek. Image Restoration: Fundamentals and Advances. CRC Press, 2012.\n[23] Y.-W. Tai, P. Tan, and M. S. Brown.
Richardson-Lucy deblurring for scenes under a projective motion path. IEEE Trans. Pattern Anal. Mach. Intell., 33(8):1603\u20131618, 2011.\n[24] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211\u2013244, 2001.\n[25] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. In CVPR, 2010.\n[26] D. P. Wipf, B. D. Rao, and S. S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Information Theory, 57(9):6236\u20136255, 2011.\n[27] D. P. Wipf and H. Zhang. Revisiting Bayesian blind deconvolution. Submitted to Journal of Machine Learning Research, 2013.\n[28] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In ECCV, 2010.\n[29] H. Zhang, D. P. Wipf, and Y. Zhang. Multi-image blind deblurring using a coupled adaptive sparse prior. In CVPR, 2013.", "award": [], "sourceid": 779, "authors": [{"given_name": "Haichao", "family_name": "Zhang", "institution": "Duke University"}, {"given_name": "David", "family_name": "Wipf", "institution": "Microsoft Research"}]}