{"title": "k-Prototype Learning for 3D Rigid Structures", "book": "Advances in Neural Information Processing Systems", "page_first": 2589, "page_last": 2597, "abstract": "In this paper, we study the following new variant of prototype learning, called {\\em $k$-prototype learning problem for 3D rigid structures}: Given a set of 3D rigid structures, find a set of $k$ rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights to rigid structures alignment, clustering, and prototype reconstruction, and is practically efficient with quality guarantee. We validate our approach using two type of data sets, random data and biological data of chromosome territories. 
Experiments suggest that our approach can effectively learn prototypes in both types of data.", "full_text": "k-Prototype Learning for 3D Rigid Structures*

Hu Ding
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260
huding@buffalo.edu

Ronald Berezney
Department of Biological Sciences
State University of New York at Buffalo
Buffalo, NY 14260
berezney@buffalo.edu

Jinhui Xu
Department of Computer Science and Engineering
State University of New York at Buffalo
Buffalo, NY 14260
jinhui@buffalo.edu

Abstract

In this paper, we study the following new variant of prototype learning, called k-prototype learning problem for 3D rigid structures: Given a set of 3D rigid structures, find a set of k rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights into rigid structure alignment, clustering, and prototype reconstruction, and is practically efficient with a quality guarantee. We validate our approach using two types of data sets, random data and biological data of chromosome territories. Experiments suggest that our approach can effectively learn prototypes in both types of data.

1 Introduction

Learning a prototype from a set of given or observed objects is a core problem in machine learning, and has numerous applications in computer vision, pattern recognition, data mining, bioinformatics, etc.
A commonly used approach for this problem is to formulate it as an optimization problem and determine an object (called a pattern or prototype) so as to maximize the total similarity (or minimize the total difference) with the input objects. Such computed prototypes are often used to classify or index large-size structural data so that queries can be efficiently answered by considering only those prototypes. Other important applications of prototypes include reconstructing objects from partially observed snapshots and identifying common (or hidden) patterns in a set of data items.

In this paper, we study a new prototype learning problem called k-prototype learning for 3D rigid structures, where a 3D rigid structure is a set of points in R^3 whose pairwise distances remain invariant under rigid transformation. Since our problem needs to determine k prototypes, it can thus be viewed as two tightly coupled problems: clustering rigid structures and prototype reconstruction for each cluster.

Our problem is motivated by an important application in biology: determining the spatial organization pattern of chromosome territories from a population of cells. Recent research in biology [3] has suggested that the configuration of chromosome territories could significantly influence the cell molecular processes, and is closely related to cancer-promoting chromosome translocations. Thus, finding the spatial organization pattern of chromosome territories is a key step to understanding the cell molecular processes [6,7,10,25]. Since the set of observed chromosome territories in each cell can be represented as a 3D rigid structure, the problem can thus be formulated as a k-prototype learning problem for a set of 3D rigid structures.

*This research was supported in part by NSF under grant IIS-1115220.

Related work: Prototype learning has a long and rich history.
Most of the research has focused on finding prototypes in the graph domain. Jiang et al. [18] introduced the median graph concept, which can be viewed as the prototype of a set of input graphs, and presented a genetic approach for computing it. Later, Ferrer et al. [14] proposed another efficient method for the median graph. Their idea is to first embed the graphs into some metric space, and then obtain the median using a recursive procedure. In the geometric domain, quite a number of results have concentrated on finding prototypes from a set of 2D shapes [11,20,21,22]. A commonly used strategy in these methods is to first represent each shape as a graph abstraction and then compute the median of the graph abstractions.

Our prototype learning problem is clearly related to the challenging 3D rigid structure clustering and alignment problem [1,2,4,5,13,17]. Due to its complex nature, most of the existing approaches are heuristic algorithms and thus cannot guarantee the quality of the solution. There are also some theoretical results [13] on this problem, but none of them is practical due to their high complexities.

Our contributions and main ideas:(1) Our main objective is to obtain a practical solution with a guarantee on the quality of its output. For this purpose, we first give a formal definition of the problem and then consider two cases, 1-prototype learning and k-prototype learning.

For 1-prototype learning, we first present a practical algorithm for the alignment problem. Our result is based on a multi-level net technique which finds the proper Euler angles for the rigid transformation. With this alignment algorithm, we can then reduce the prototype learning problem to a new problem called chromatic clustering (see Figure 1(b) and 1(c)), and present two approximate solutions for it.
Finally, a local improvement algorithm is introduced to iteratively improve the quality of the obtained prototype.

For k-prototype learning, a key challenge is how to avoid the high complexity associated with clustering 3D rigid structures. Our idea is to map each rigid structure to a point in some metric space and build a correlation graph to capture their pairwise similarities. We show that the correlation graph is metric, which means that we can reduce the rigid structure clustering problem to a metric k-median clustering problem on the correlation graph. Once the clustering is obtained, we can use the 1-prototype learning algorithm on each cluster to generate the desired prototype. We also provide techniques to deal with several practical issues, such as the unequal sizes of rigid structures and the weaker metric property caused by imperfect alignment computation for the correlation graph.

We validate our algorithms using two types of datasets. The first consists of randomly generated datasets and the second is a real biological dataset of chromosome territories. Experiments suggest that our approach can effectively reduce the cost in prototype learning.

2 Preliminaries

In this section, we introduce several definitions which will be used throughout this paper.

Definition 1 (m-Rigid Structure). Let P = {p1, ..., pm} be a set of m points in 3D space. P is an m-rigid structure if the distance between any pair of points pi and pj in P remains the same under any rigid transformation, including translation, rotation, reflection and their combinations, on P. For any rigid transformation T, the image of P under T is denoted as T(P).

Definition 2 (Bipartite Matching).
Let S1 and S2 be two point-sets in 3D space with |S1| = |S2|, and let G = (U, V, E) be their induced complete bipartite graph, where each vertex in U (or V) corresponds to a unique point in S1 (or S2), and each edge in E is associated with a weight equal to the Euclidean distance of the corresponding two points. The bipartite matching of S1 and S2 is the one-to-one match from S1 to S2 with the minimum total matching weight (denoted as Cost(S1, S2)) in G (see Figure 1(a)).

(1) Due to the space limit, we put some details and proofs in the full version of our paper.

Note that the bipartite matching can be computed using existing algorithms, such as the Hungarian algorithm [24].

Fig. 1: (a) An example of bipartite matching (red edges); (b) 4 point-sets, each in a different color; (c) chromatic clustering of the point-sets in (b). The three clusters form a chromatic partition.

Definition 3 (Alignment). Let P and Q be two m-rigid structures in 3D space with points {p1, ..., pm} and {q1, ..., qm}, respectively. Their alignment is to find a rigid transformation T for P so as to minimize the cost of the bipartite matching between T(P) and Q. The minimum (alignment) cost, $\min_T Cost(T(P), Q)$, is denoted by A(P, Q).

Definition 4 (k-Prototype Learning). Let P1, ..., Pn be n different m-rigid structures in 3D, and let k be a positive integer. k-prototype learning is to determine k m-rigid structures, Q1, ..., Qk, so as to minimize the following objective function

$\sum_{i=1}^{n} \min_{1 \le j \le k} A(P_i, Q_j).$    (1)

From Definition 4, we know that the k-prototype learning problem can be viewed as first clustering the rigid structures into k clusters and then building a prototype for each cluster so as to minimize the total alignment cost.

3 1-Prototype learning

In this section, we consider the 1-prototype learning problem.
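The bipartite matching cost of Definition 2 can be computed exactly; a minimal stdlib-only sketch is below. It brute-forces all permutations for clarity, so it is only suitable for small m; the Hungarian algorithm [24] (e.g., `scipy.optimize.linear_sum_assignment`) computes the same value in polynomial time. The function name is ours, for illustration only.

```python
import itertools
import math

def bipartite_matching_cost(S1, S2):
    """Cost(S1, S2) of Definition 2: minimum total Euclidean edge weight
    over all one-to-one matches between two equal-size 3D point sets.
    Brute force over all m! permutations for clarity; the Hungarian
    algorithm [24] computes the same value in O(m^3) time."""
    assert len(S1) == len(S2)
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(
        sum(dist(p, S2[j]) for p, j in zip(S1, perm))
        for perm in itertools.permutations(range(len(S2)))
    )

S1 = [(0, 0, 0), (1, 0, 0)]
S2 = [(1, 0, 0), (0, 0, 1)]
print(bipartite_matching_cost(S1, S2))  # 1.0: (1,0,0)->(1,0,0) costs 0, (0,0,0)->(0,0,1) costs 1
```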
We first give an overview of the main steps of our algorithm and then present the details in each subsection. Our algorithm is an iterative procedure. In each iteration, it constructs a new prototype using the one from the previous iteration, and reduces the objective value. A final prototype is obtained once the objective value becomes stable.

Algorithm: 1-prototype learning

1. Randomly select a rigid structure from the input {P1, ..., Pn} as the initial prototype Q.
2. Repeatedly perform the following steps until the objective value becomes stable.
   (a) For each Pi, find the rigid transformation (approximately) realizing A(Pi, Q).
   (b) Based on the new configuration (i.e., after the corresponding rigid transformation) of each Pi, construct an updated prototype Q which minimizes the objective value.

Since both steps 2(a) and 2(b) reduce the cost, the objective value always decreases. In the next two subsections, we discuss our ideas for Step 2(a) (alignment) and Step 2(b) (prototype reconstruction), respectively. Note that the above algorithm is different from generalized Procrustes analysis (GPA) [15], since the points of each Pi are not required to be pre-labeled in our algorithm, while for GPA every input point must have an individual index. This is also the main difficulty of this prototype learning problem.

3.1 Alignment

To determine the alignment of two rigid structures, one way is to use our recent theoretical algorithm for point-set matching [13]. For any pair of point-sets P and Q in R^d space with m points each, our algorithm outputs, in $O(\frac{1}{\epsilon^{d^2}} m^{2d+2} \log^{2d}(m))$ time, a rigid transformation T for P so that the bipartite matching cost between T(P) and Q is a $(1+\epsilon)$-approximation of the optimal alignment cost between P and Q, where $\epsilon > 0$ is a small constant. Applying this algorithm to our 3D rigid structures, the running time becomes $O(\frac{1}{\epsilon^{9}} m^{8} \log^{6}(m))$. The algorithm is based on the following key idea. First, we show that there exist 3 "critical" points, called a base, in each of P and Q, which control the matching cost. Although the base cannot be explicitly identified, it is possible to obtain it implicitly by considering all 3-tuples of the points in P and Q. The algorithm then builds an $\epsilon$-net around each base point to determine an approximate rigid transformation. Clearly, this theoretical algorithm is efficient only when m is small. For large m, we use the following relaxation.

First, we change the edge weight in Definition 2 from the Euclidean distance to the squared Euclidean distance. The following lemma shows a nice property of this change.

Lemma 1. Let P = {p1, ..., pm} and Q = {q1, ..., qm} be two m-rigid structures in 3D space, and let T be the rigid transformation realizing the minimum bipartite matching cost (where the edge weight is replaced by the squared Euclidean distance of the corresponding points in Definition 2). Then, the mean points of T(P) and Q coincide with each other.

Lemma 1 tells us that to align two rigid structures, we can first translate them to share one common mean point, and then consider only the rotation in 3D space. (Note that we can ignore reflection in the rigid transformation, as it can be captured by computing the alignment twice, once for the original rigid structure and once for its mirror image.)
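Lemma 1 lets us factor the translation out of the alignment search: translate both structures so their mean points coincide at the origin, and only a rotation (plus an optional reflection via the mirror image) remains. A minimal sketch of this centering step (the helper names are ours, not from the paper):

```python
def centroid(P):
    """Mean point of a 3D point set."""
    n = len(P)
    return tuple(sum(coord) / n for coord in zip(*P))

def center(P):
    """Translate P so its mean point is at the origin.  By Lemma 1 (with
    squared-Euclidean edge weights), aligning two structures reduces to
    centering both and then searching over rotations only; reflection is
    handled by also aligning the mirror image."""
    ox, oy, oz = centroid(P)
    return [(x - ox, y - oy, z - oz) for x, y, z in P]

P = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(center(P))  # [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```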
Using Euler angles and the 3D rotation matrix, we easily have the following fact.

Fact 1. Given any rotation matrix A in 3D, there are 3 angles $\phi, \theta, \psi \in (-\pi, \pi]$ and three matrices A1, A2 and A3 such that A = A1 * A2 * A3, where

$A_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi & \cos\phi \end{bmatrix}, \quad A_2 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad A_3 = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}.$

From Fact 1, we know that the main issue in aligning two rigid structures P and Q is to find three proper angles $\phi, \theta, \psi$ that minimize the cost. Clearly, this is a non-convex optimization problem, so we cannot use existing convex optimization methods to obtain an efficient solution. One way to solve this problem is to build a dense enough $\epsilon$-net (or grid) in the domain $[-\pi, \pi]^3$ of $\phi, \theta, \psi$, and evaluate each grid point to find the best possible solution. Clearly, this is rather inefficient when the number of grid points is huge. To obtain a practically efficient solution, our strategy is to generalize the idea of building a dense net to recursively building a sparse net, which we call a multi-level net. At each level, we partition the current search domain into a set of smaller regions, which can be viewed as a sparse net, and evaluate a representative point in each smaller region to determine its likelihood of containing the optimal point. The recursion continues only at the N most likely smaller regions (for some well-selected parameter N ≥ 1 in practice). In this way, we save a great deal of time by not searching for the optimal point in the unlikely regions. Below are the main steps of our approach.

1.
Let S be the current search space, initialized as $[-\pi, \pi]^3$, and let t, N be two input parameters. Recursively perform the following steps until the best objective value in two consecutive recursive steps remains roughly the same.
   (a) Uniformly partition S into t disjoint sub-regions S = S1 ∪ ... ∪ St.
   (b) Randomly select a representative point si ∈ Si, and compute the alignment cost under the rotation matrix corresponding to si via the Hungarian algorithm.
   (c) Choose the top N points with the minimum objective values from {s1, ..., st}. Let {s_{t_1}, ..., s_{t_N}} be the chosen points.
   (d) Update $S = \bigcup_{i=1}^{N} S_{t_i}$.
2. Output the rotation which yields the minimum objective value.

Why not use other alignment algorithms? There are several existing alignment algorithms for 3D rigid structures, and each suffers from its own limitations. For example, the Iterative Closest Point (ICP) algorithm [4] is one of the most popular alignment algorithms. However, it does not generate a one-to-one match between the rigid structures; instead, every point in one rigid structure is matched to its nearest neighbor in the other rigid structure. This means that a point could match multiple points in the other rigid structure. Obviously, this type of matching cannot meet our requirement, especially in the biological application, where each chromosome territory is expected to match only one chromosome. A similar problem also occurs in some other alignment algorithms [1,5,17]. Arun et al. [2] presented an algebraic approach to find the best alignment between two 3D point-sets. Although their solution is a one-to-one match, it requires that the correspondence between the two point-sets be known in advance, which is certainly not the case in our model.
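The multi-level net search above can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: it uses box centers as representatives instead of random points, runs a fixed number of levels instead of testing for stability, and takes an arbitrary cost function in place of the Hungarian matching cost under each candidate rotation; `rotation` realizes the Euler-angle factorization of Fact 1.

```python
import itertools
import math

def rotation(phi, theta, psi):
    """Rotation matrix A = A1 * A2 * A3 built from Euler angles (Fact 1)."""
    c, s = math.cos, math.sin
    A1 = [[1, 0, 0], [0, c(phi), -s(phi)], [0, s(phi), c(phi)]]
    A2 = [[c(theta), 0, s(theta)], [0, 1, 0], [-s(theta), 0, c(theta)]]
    A3 = [[c(psi), -s(psi), 0], [s(psi), c(psi), 0], [0, 0, 1]]
    def mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(mul(A1, A2), A3)

def multilevel_net(cost, levels=4, t_per_axis=4, N=3):
    """Search the domain [-pi, pi]^3 of (phi, theta, psi) for a minimizer
    of `cost`.  Each level splits every surviving box into t_per_axis^3
    sub-boxes, scores each at its center point, and keeps only the N
    most promising boxes for the next level."""
    boxes = [((-math.pi, math.pi),) * 3]
    best_val, best_angles = float('inf'), None
    for _ in range(levels):
        scored = []
        for box in boxes:
            # split each axis interval of the box into t_per_axis pieces
            axes = [[(lo + i * (hi - lo) / t_per_axis,
                      lo + (i + 1) * (hi - lo) / t_per_axis)
                     for i in range(t_per_axis)] for lo, hi in box]
            for cell in itertools.product(*axes):
                angles = tuple((lo + hi) / 2 for lo, hi in cell)
                scored.append((cost(angles), angles, cell))
        scored.sort(key=lambda x: x[0])
        if scored[0][0] < best_val:
            best_val, best_angles = scored[0][0], scored[0][1]
        boxes = [cell for _, _, cell in scored[:N]]   # recurse on top N boxes
    return best_val, best_angles

# Toy cost standing in for the bipartite matching cost of a rotated structure:
# squared distance to hidden target angles.
target = (0.5, -1.0, 2.0)
cost = lambda a: sum((x - y) ** 2 for x, y in zip(a, target))
val, angles = multilevel_net(cost)
```

After 4 levels with 4 cells per axis, each surviving box has width 2π/256 per axis, so the recovered angles land within about 0.01 of the target in this toy setting.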
Branch-and-bound (BB) approaches [16] grow a search tree in the parameter space, and for each node require estimating upper and lower bounds of the objective value in the corresponding sub-region. For our alignment problem, it is challenging to obtain such accurate estimates.

3.2 Prototype reconstruction

In this section, we discuss how to build a prototype from a set of 3D rigid structures. We first fix the position of each Pi, and then construct a new prototype Q to minimize the objective function in Definition 4. Our main idea is to reduce the prototype reconstruction problem to a new type of clustering problem called Chromatic Clustering, which was first introduced by Ding and Xu [12]. We start with two definitions.

Definition 5 (Chromatic Partition). Let G = {G1, ..., Gn} be a set of n point-sets with each Gi consisting of m points in the space. A chromatic partition of G is a partition of the n × m points into m sets, U1, ..., Um, such that each Uj contains exactly one point from each Gi.

Definition 6 (Chromatic Clustering). Let G = {G1, ..., Gn} be a set of n point-sets with each Gi consisting of m points in the space. The chromatic clustering of G is to find m median points {q1, ..., qm} in the space and a chromatic partition U1, ..., Um of G such that $\sum_{j=1}^{m} \sum_{p \in U_j} ||p - q_j||$ is minimized, where ||·|| denotes the Euclidean distance.

From Definition 6, we know that chromatic clustering is quite similar to k-median clustering in Euclidean space; the only difference is the chromatic requirement, i.e., the obtained clusters should form a chromatic partition (see Figure 1(b) and 1(c)).

Reduction to chromatic clustering.
Since the position of each Pi is fixed (note that, with a slight abuse of notation, we still use Pi to denote its image T(Pi) under the rigid transformation T obtained in Section 3.1), we can view each Pi as a point-set Gi, and the new prototype Q as the m median points {q1, ..., qm} in Definition 6. Further, if a point p ∈ Pi is matched to qj, then it is part of Uj. Since we compute a one-to-one match, Uj contains exactly one point from each Pi, which implies that {U1, ..., Um} is a chromatic partition of G. Let $p_j^i$ be the point in $P_i \cap U_j$. Then the objective function in Definition 4 becomes

$\sum_{i=1}^{n} \sum_{j=1}^{m} ||p_j^i - q_j|| = \sum_{j=1}^{m} \sum_{i=1}^{n} ||p_j^i - q_j|| = \sum_{j=1}^{m} \sum_{p \in U_j} ||p - q_j||,$    (2)

which is exactly the objective function in Definition 6. Thus, we have the following theorem.

Theorem 1. Step 2(b) in the algorithm of 1-prototype learning is equivalent to solving a chromatic clustering problem.

Next, we give two constant-approximation algorithms for the chromatic clustering problem; one is randomized, and the other is deterministic.

Theorem 2. Let G = {G1, ..., Gn} be an instance of chromatic clustering with each Gi consisting of m points in the space.
1. If Gl is randomly selected from G as the m median points, then with probability at least 1/2, Gl yields a 3-approximation for chromatic clustering on G.
2. If enumerating all point-sets in G as the m median points, there exists one Gi0 which yields a 2-approximation for chromatic clustering on G.

Proof. We consider the randomized algorithm first. Let {q1, ..., qm} be the m median points in the optimal solution, and U1, ..., Um be the corresponding chromatic partition.
Let $p_j^i = G_i \cap U_j$. Since the objective value is the sum of the total costs from all point-sets {G1, ..., Gn}, by Markov's inequality, the contribution from Gl is no more than 2 times the average cost with probability at least 1/2, i.e.,

$\sum_{j=1}^{m} ||p_j^l - q_j|| \le 2 \cdot \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} ||p_j^i - q_j||.$    (3)

From (3) and the triangle inequality, if we replace each qj by $p_j^l$, the objective value becomes

$\sum_{i=1}^{n} \sum_{j=1}^{m} ||p_j^i - p_j^l|| \le \sum_{i=1}^{n} \sum_{j=1}^{m} (||p_j^i - q_j|| + ||q_j - p_j^l||)$    (4)
$= \sum_{i=1}^{n} \sum_{j=1}^{m} ||p_j^i - q_j|| + n \times \sum_{j=1}^{m} ||q_j - p_j^l|| \le 3 \sum_{i=1}^{n} \sum_{j=1}^{m} ||p_j^i - q_j||,$    (5)

where (4) follows from the triangle inequality, and (5) follows from (3). Thus, the first part of the theorem is true. The analysis for the deterministic algorithm is similar. The only difference is that there must exist one point-set Gi0 whose contribution to the total cost is no more than the average cost. Thus the constant on the right-hand side of (3) becomes 1 rather than 2, and consequently the final approximation ratio in (5) becomes 2. Note that the desired Gi0 can be found by enumerating all point-sets and selecting the one with the smallest objective value. □

Remark 1. Comparing the two approximation algorithms, we see a tradeoff between the approximation ratio and the running time. The randomized algorithm has a larger approximation ratio, but a linear dependence on n in its running time. The deterministic algorithm has a smaller approximation ratio, but a quadratic dependence on n.

Local improvement. After finding a constant approximation, it is necessary to conduct some local improvement. An easy-to-implement method is as follows.
Let $\tilde{Q} = \{\tilde{q}_1, ..., \tilde{q}_m\}$ be the initial constant-approximation solution. Compute the bipartite matching between $\tilde{Q}$ and each Gi. This yields a chromatic partition $\{\tilde{U}_1, ..., \tilde{U}_m\}$ of G, where each $\tilde{U}_j$ consists of all the points matched to $\tilde{q}_j$. By Definition 6, we know that qj should be the geometric median point of Uj in order to make the objective value as low as possible. Thus, we can use the well-known Weiszfeld's algorithm [23] to compute the geometric median point of each $\tilde{U}_j$, and update $\tilde{q}_j$ to be the corresponding geometric median point. We iteratively perform the following two steps, (1) computing the chromatic partition and (2) generating the geometric median points, until the objective value becomes stable.

4 k-Prototype learning

In this section, we generalize the ideas for 1-prototype learning to k-prototype learning for k > 1. As mentioned in Section 1, our idea is to build a correlation graph. We first introduce the following lemma.

Lemma 2. The alignment cost in Definition 3 satisfies the triangle inequality.

Correlation graph. We denote the correlation graph on the given m-rigid structures {P1, ..., Pn} as Γ, which contains n vertices {v1, ..., vn}. Each vi represents the rigid structure Pi, and the edge connecting vi and vj has weight equal to A(Pi, Pj). From Lemma 2, we know that Γ is a metric graph. Thus, we have the following key theorem.

Theorem 3. Any λ-approximation solution for metric k-median clustering on Γ yields a 2λ-approximation solution for the k-prototype learning problem on {P1, ..., Pn}, where λ ≥ 1.

Proof. Let {Q1, ..., Qk} be the k rigid structures yielded by an optimal solution of k-prototype learning, and let {C1, ..., Ck} be the corresponding k optimal clusters.
For each 1 ≤ j ≤ k, the cost of Cj is $\sum_{P_i \in C_j} A(P_i, Q_j)$. There exists one rigid structure $P_{i_j} \in C_j$ such that

$A(P_{i_j}, Q_j) \le \frac{1}{|C_j|} \sum_{P_i \in C_j} A(P_i, Q_j).$    (6)

If we replace Qj by $P_{i_j}$, the cost of Cj becomes

$\sum_{P_i \in C_j} A(P_i, P_{i_j}) \le \sum_{P_i \in C_j} (A(P_i, Q_j) + A(Q_j, P_{i_j})) \le 2 \sum_{P_i \in C_j} A(P_i, Q_j),$    (7)

where the first inequality follows from the triangle inequality (by Lemma 2), and the second inequality follows from (6). Then, (7) directly implies that

$\sum_{j=1}^{k} \sum_{P_i \in C_j} A(P_i, P_{i_j}) \le 2 \sum_{j=1}^{k} \sum_{P_i \in C_j} A(P_i, Q_j).$    (8)

(8) is similar to the deterministic solution in Theorem 2; the only difference is that the point-sets here need to be aligned through rigid transformations, while in Theorem 2 the point-sets are fixed. Now, consider the correlation graph Γ. If we select $\{v_{i_1}, ..., v_{i_k}\}$ as the k medians, the objective value of the k-median clustering is the same as the left-hand side of (8). Let $\{v_{i'_1}, ..., v_{i'_k}\}$ be the k median vertices of the λ-approximation solution on Γ. Then, we have

$\sum_{i=1}^{n} \min_{1 \le j \le k} A(P_i, P_{i'_j}) \le \lambda \sum_{i=1}^{n} \min_{1 \le j \le k} A(P_i, P_{i_j}) \le 2\lambda \sum_{j=1}^{k} \sum_{P_i \in C_j} A(P_i, Q_j),$    (9)

where the second inequality follows from (8). Thus the theorem is true. □

Based on Theorem 3, we have the following algorithm for k-prototype learning.

Algorithm: k-prototype learning

1. Build the correlation graph Γ, and run the algorithm proposed in [9] to obtain a 6 2/3-approximation for the metric k-median clustering on Γ, and consequently a 13 1/3-approximation for k-prototype learning.
2.
For each obtained cluster, run the 1-prototype learning algorithm presented in Section 3.

Remark 2. Note that there are several algorithms for metric k-median clustering with better approximation ratios (than 6 2/3), such as the ones in [19]. But they are all theoretical algorithms and are difficult to apply in practice. We choose the LP-rounding based algorithm by Charikar et al. [9] partially because it is simple to implement for practical purposes.

The exact correlation graph is not available. From the methods presented in Section 3.1, we know that only approximate alignments can be obtained. This means that the exact correlation graph Γ is not available. As a consequence, the approximate correlation graph may not be metric (due to possible violations of the triangle inequality). This seems to cause the above algorithm to yield a solution with no quality guarantee. Fortunately, as pointed out in [9], the LP-rounding method still yields a provably good approximation solution as long as a weaker version of the triangle inequality is satisfied (i.e., for any three vertices va, vb and vc in Γ, their edge weights satisfy the inequality $w(v_a v_b) \le \delta(w(v_a v_c) + w(v_b v_c))$ for some constant δ > 1, where $w(v_a v_b)$ is the weight of the edge connecting va and vb).

Theorem 4. For a given set of rigid structures, if a $(1+\epsilon)$-approximation of the alignment between any pair of rigid structures can be computed, then the algorithm for metric k-median clustering in [9] yields a $2(\frac{23}{3}(1+\epsilon) - 1)(1+\epsilon)$-approximation for the k-prototype learning problem.

What if the rigid structures have unequal sizes? In some scenarios, the rigid structures may not have the same number of points, and consequently the one-to-one match between rigid structures in Definition 2 is not available. To resolve this issue, we can use a weight normalization strategy and adopt the Earth Mover's Distance (EMD) [8]. Generally speaking, for any rigid structure Pi containing m′ points for some m′ ≠ m, we assign each point a weight equal to m/m′, and compute the alignment cost based on EMD rather than the bipartite matching cost. With this modification, both the 1- and k-prototype learning algorithms still work.

5 Experiments

To evaluate the performance of our proposed approach, we implement our algorithms on a Linux workstation (with 2.4GHz CPU and 4GB memory). We consider two types of data, sets of randomly generated 3D rigid structures and a real biological data set which is used to determine the organization pattern (among a population of cells) of chromosome territories inside the cell nucleus.

Random data. For random data, we test a number of data sets of different sizes. For each data set, we first randomly generate k different rigid structures, {Q1, ..., Qk}. Then around each point of Qj, j = 1, ..., k, we generate a set of points following a Gaussian distribution with variance δ. We randomly select one point from each of the m Gaussian distributions (around the m points of Qj) to form an m-rigid structure, and transform it by a random rigid transformation. Thus, we build a cluster (denoted by Cj) of m-rigid structures around each Qj, and Qj can be viewed as its prototype (i.e., the ground truth). $\bigcup_{j=1}^{k} C_j$ forms an instance of the k-prototype learning problem.

We run the algorithm of k-prototype learning in Section 4, and denote the resulting k rigid structures by {Q′1, ..., Q′k}. To evaluate the performance, we compute the following two values.
Firstly, we compute the bipartite matching cost, t1, between {Q1, ..., Qk} and {Q′1, ..., Q′k}, i.e., we build the bipartite graph between {Q1, ..., Qk} and {Q′1, ..., Q′k}, and for each pair Qi and Q′j connect an edge with weight equal to the alignment cost A(Qi, Q′j). Secondly, we compute the average alignment cost (denoted by cj) between the rigid structures in Cj and Qj for 1 ≤ j ≤ k, and compute the sum $t_2 = \sum_{j=1}^{k} c_j$. Finally, we use the ratio t1/t2 to show the performance. The ratio indicates how much cost (i.e., t1) has been reduced by our prototype learning algorithm, compared to the cost (i.e., t2) of the input rigid structures. We choose k = 1, 2, 3, 4, 5; for each k, we vary m from 10 to 20, and the size of each Cj from 100 to 300. Also, for each Cj, we vary the Gaussian variance from 10% to 30% of the average spread norm of Qj, where, if Qj contains m points {q1, ..., qm} and $o = \frac{1}{m}\sum_{l=1}^{m} q_l$, the average spread norm is defined as $\frac{1}{m}\sum_{l=1}^{m} ||q_l - o||$. For each k, we generate 10 datasets, and plot the average experimental results in Figure 2(a). The experiment suggests that our generated prototypes are much closer (by at least 40% for each k) to the ground truth than the input rigid structures.

Fig. 2: (a) Experimental results for random data; (b) a 2D slice of the 3D microscopic image of 8 pairs of chromosome territories; (c) average alignment cost for the biological data set.

Biological data. For real data, we use a biological data set consisting of 91 microscopic nucleus images of WI-38 lung fibroblast cells. Each image includes 8 pairs of chromosome territories (see Fig. 2(b)).
The objective of this experiment is to determine whether there exists any spatial pattern among the population of cells governing the organization of the chromosomes inside the 3D cell nucleus, so as to provide new evidence for a longstanding conjecture in cell biology which says that each chromosome territory has a preferred position inside the cell nucleus. For this purpose, we calculate the gravity center of each chromosome territory and use it as the representative of the chromosome. In this way, each cell is converted into a rigid structure of 16 points. Since there is no ground truth for the biological data, we directly use the average alignment cost between our generated solutions and the input rigid structures to evaluate the performance. We run our algorithms for k = 1, 2, 3, 4, and plot the cost in Fig. 2(c). Our preliminary experiments indicate that there is a significant reduction in the average cost from k = 1 to k = 2, and that the cost does not change much for k = 2, 3, 4.

We also analyze how chromosomes change their clusters when k increases from 2 to 4. We denote the clusters for k = 2 by {C^2_1, C^2_2}, and the clusters for k = 4 by {C^4_1, C^4_2, C^4_3, C^4_4}. For each 1 ≤ j ≤ 4, we use |C^4_j ∩ C^2_1| / |C^2_1| and |C^4_j ∩ C^2_2| / |C^2_2| to represent the preservation of C^4_j from C^2_1 and C^2_2, respectively. Table 1 below shows the preservation (denoted by Pre) with respect to C^2_1 and C^2_2. It shows that C^4_4 preserved C^2_2 well, while the union of {C^4_1, C^4_2, C^4_3} preserved C^2_1 well. This seems to suggest that all the cells are aggregated around two clusters.

Table 1: The preservations

Pre     C^4_1    C^4_2    C^4_3    C^4_4
C^2_1   26.53%   18.37%   46.94%   8.16%
C^2_2   0%       0%       5.56%    94.44%

6 Conclusion

In this paper, we study a new prototype learning problem, called k-prototype learning, for 3D rigid structures, and present a practical optimization model for it. As the base case, we consider the 1-prototype learning problem and reduce it to the chromatic clustering problem. We then extend the 1-prototype learning algorithm to k-prototype learning to achieve a quality-guaranteed approximate solution. Finally, we evaluate our algorithms on both random and biological data sets. Experiments suggest that our algorithms can effectively learn prototypes from both types of data.