Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)

*Menachem Fromer, Amir Globerson*

We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particularly simple form and differs from the marginal polytope in a single inequality constraint. We use this characterization to provide an approximation scheme for non-tree graphs, by using the set of spanning trees over such graphs. The method we present puts the M -best inference problem in the context of LP relaxations, which have recently received considerable attention and have proven useful in solving difficult inference problems. We show empirically that our method often finds the provably exact M best configurations for problems of high tree-width. A common task in probabilistic modeling is finding the assignment with maximum probability given a model. This is often referred to as the MAP (maximum a-posteriori) problem. Of particular interest is the case of MAP in graphical models, i.e., models where the probability factors into a product over small subsets of variables. For general models, this is an NP-hard problem [11], and thus approximation algorithms are required. Of those, the class of LP based relaxations has recently received considerable attention [3, 5, 18]. In fact, it has been shown that some problems (e.g., fixed backbone protein design) can be solved exactly via sequences of increasingly tighter LP relaxations [13]. In many applications, one is interested not only in the MAP assignment but also in the M maximum probability assignments [19]. For example, in a protein design problem, we might be interested in the M amino acid sequences that are most stable on a given backbone structure [2]. In cases where the MAP problem is tractable, one can devise tractable algorithms for the M best problem [8, 19]. Specifically, for low tree-width graphs, this can be done via a variant of max-product [19]. However, when finding MAPs is not tractable, it is much less clear how to approximate the M best case. One possible approach is to use loopy max-product to obtain approximate max-marginals and use those to approximate the M best solutions [19]. However, this is largely a heuristic and does not provide any guarantees in terms of optimality certificates or bounds on the optimal values. LP approximations to MAP do enjoy such guarantees. Specifically, they provide upper bounds on the MAP value and optimality certificates. Furthermore, they often work for graphs with large tree-width [13]. The goal of the current work is to leverage the power of LP relaxations to the M best case. We begin by focusing on the problem of finding the second best solution. We show how it can be formulated as an LP over a polytope we call the "assignment-excluding marginal polytope". In the general case, this polytope may require an exponential number of inequalities, but we prove that when the graph is a tree it has a very compact representation. We proceed to use this result to obtain approximations to the second best problem, and show how these can be tightened in various ways. Next, we show how M best assignments can be found by relying on algorithms for 1

second best assignments, and thus our results for the second best case can be used to devise an approximation algorithm for the M best problem. We conclude by applying our method to several models, showing that it often finds the exact M best assignments.

1

The M-best MAP problem and its LP formulation i (xi ) iV

Consider a function on n variables defined as: f (x1 , . . . , xn ; ) = ij (xi , xj ) + ijE

(1)

where V and E are the vertices and nodes of a graph G with n nodes. We shall be interested in the M assignments with largest f (x; ) value.1 Denote these by x(1) , . . . , x(M) , so that x(1) is the assignment that maximizes f (x; ), x(2) is the 2nd best assignment, etc. The MAP problem (i.e., finding x(1) ) can be formulated as an LP as follows [15]. Let be a vector of distributions that includes {ij (xi , xj )}ijE over edge variables and {i (xi )}iV over nodes. The set of that arise from some joint distribution is known as the marginal polytope [15] and is denoted by M(G). Formally: M(G) = { | p(x) s.t. p(xi , xj ) = ij (xi , xj ) , p(xi ) = i (xi )} . where is the set of distributions on x. The MAP problem can then be shown to be equivalent to the following LP:2 max f (x; ) = max , (2) x M(G)

It can be shown that this LP always has a maximizing that is a vertex of M(G) and is integral. Furthermore, this corresponds to the MAP assignment x(1) . Although the number of variables in this LP is only O(|E| + |V |), the difficulty comes from an exponential number of linear inequalities generally required to describe the marginal polytope M(G). We shall find it useful to define a mapping between assignments x and integral vertices of the polytope. Given an integral vertex v M(G), define x(v) to be the assignment that maximizes vi (xi ). And, given an assignment z define v(z) to be the integral vertex in M(G) corresponding to the assignment z. Thus the LP in Eq. 2 will be maximized by v(x(1) ). One simple outer bound of the marginal polytope is the local polytope ML (G), which only enforces pairwise constraints between variables: i (xi ) = 1 (3) ij (xi , xj ) = j (xj ), ij (xi , xj ) = i (xi ), ML (G) = 0 x x x j i i

The LP relaxation is then to maximize where ML (G). For tree structured graphs, ML (G) = M(G) [15] and thus the LP relaxation yields the exact MAP x(1) .

2

An LP Formulation for the 2nd -best MAP

Assume we found the MAP assignment x(1) and are now interested in finding x(2) . Is there a simple LP whose solution yields x(2) ? We begin by focusing on the case where G is a tree so that the local LP relaxation is exact. We first treat the case of a connected tree. To construct an LP whose solution is x(2) , a natural approach is to use the LP for x(1) (i.e., the LP in Eq. 2) but somehow eliminate the solution x(1) using additional constraints. This, however, is somewhat trickier than it sounds. The key difficulty is that the new constraints should not generate fractional vertices, so that the resulting LP is still exact. We begin by defining the polytope over which we need to optimize in order to obtain x(2) . 1 2

This is equivalent to finding P maximum probability assignments for a model p(x) ef (x;) . the P P P We use the notation = ijE xi ,xj ij (xi , xj )ij (xi , xj ) + i xi i (xi )i (xi )

2

Definition 1. The assignment-excluding marginal polytope is defined as: ^ M(G, z) = { | p(x) s.t. p(z) = 0, p(xi , xj ) = ij (xi , xj ), p(xi ) = i (xi )} . ^ M(G, z) is simply the convex hull of all (integral) vectors v(x) for x = z.

(4)

^ The following result shows that optimizing over M(G, x(1) ) will yield the second best solu^ tion x(2) , so that we refer to M(G, x(1) ) as the second-best marginal polytope. Lemma 1. The 2nd best solution is obtained via the following LP: maxx=x(1) f (x; ) = maxM(G,x(1) ) . Furthermore, the that maximizes the LP on ^ the right is integral and corresponds to the second-best MAP assignment x(2) . The proof is similar to that of Eq. 2: instead of optimizing over x, we optimize over distributions p(x), while enforcing that p(x(1) ) = 0 so that x(1) is excluded from the maximization. The key question which we now address is how to obtain a simple characterization of ^ ^ M(G, z). Intuitively, it would seems that M(G, z) should be "similar" to M(G), such that it can be described as M(G) plus some constraints that "block" the assignment z. To illustrate the difficulty in finding such "blocking" constraints, consider the following constraint, originally suggested by Santos [10]: i i (zi ) n - 1. This inequality is not satisfied by = v(z) since v(z) attains the value n for the LHS of the above. Furthermore, for any x = z and = v(x), the LHS would be n - 1 or less. Thus, this inequality separates ^ v(z) from all other integral vertices. One might conclude that we can define M(G, z) by adding this inequality to M(G). The difficulty is that the resulting polytope has fractional vertices,3 and maximizing over it won't generally yield an integral solution. It turns out that there is a different inequality that does yield an exact characterization of ^ M(G, z) when G is a tree. We now define this inequality and state our main theorem. Definition 2. Consider the functional I(, z) (which is linear in ): I(, z) = i

(1 - di )i (zi ) + ijE

Do not remove: This comment is monitored to verify that the site is working properly