{"title": "Transportability from Multiple Environments with Limited Experiments", "book": "Advances in Neural Information Processing Systems", "page_first": 136, "page_last": 144, "abstract": "This paper considers the problem of transferring experimental findings learned from multiple heterogeneous domains to a target environment, in which only limited experiments can be performed. We reduce questions of transportability from multiple domains and with limited scope to symbolic derivations in the do-calculus, thus extending the treatment of transportability from full experiments introduced in Pearl and Bareinboim (2011). We further provide different graphical and algorithmic conditions for computing the transport formula for this setting, that is, a way of fusing the observational and experimental information scattered throughout different domains to synthesize a consistent estimate of the desired effects.", "full_text": "Transportability from Multiple Environments\n\nwith Limited Experiments\n\nElias Bareinboim\u2217\n\nUCLA\n\nSanghack Lee\u2217\n\nPenn State University\n\nVasant Honavar\n\nPenn State University\n\nJudea Pearl\n\nUCLA\n\nAbstract\n\nThis paper considers the problem of transferring experimental \ufb01ndings learned\nfrom multiple heterogeneous domains to a target domain, in which only limited\nexperiments can be performed. We reduce questions of transportability from mul-\ntiple domains and with limited scope to symbolic derivations in the causal calcu-\nlus, thus extending the original setting of transportability introduced in [1], which\nassumes only one domain with full experimental information available. We further\nprovide different graphical and algorithmic conditions for computing the transport\nformula in this setting, that is, a way of fusing the observational and experimen-\ntal information scattered throughout different domains to synthesize a consistent\nestimate of the desired effects in the target domain. 
We also consider the issue of minimizing the variance of the produced estimand in order to increase power.

1 Motivation

Transporting and synthesizing experimental knowledge from heterogeneous settings are central to scientific discovery. Conclusions that are obtained in a laboratory setting are transported and applied elsewhere, in an environment that differs in many aspects from that of the laboratory. In data-driven sciences, experiments are conducted on disparate domains, but the intention is almost invariably to fuse the acquired knowledge and translate it into some meaningful claim about a target domain, which is usually different from any of the individual study domains.
However, the conditions under which this extrapolation can be legitimized have not been formally articulated until very recently. Although the problem has been discussed in many areas of statistics, economics, and the health sciences, under rubrics such as "external validity" [2, 3], "meta-analysis" [4], "quasi-experiments" [5], and "heterogeneity" [6], these discussions are limited to verbal narratives in the form of heuristic guidelines for experimental researchers – no formal treatment has been attempted of the practical challenge posed in this paper, namely, generalizing causal knowledge across multiple heterogeneous domains with disparate experimental data.
The fields of artificial intelligence and statistics provide the theoretical underpinnings necessary for tackling transportability. First, the distinction between statistical and causal knowledge has received syntactic representation through causal diagrams [7, 8, 9], which became a popular tool for causal inference in data-driven fields. 
Second, the inferential machinery provided by the causal calculus (do-calculus) [7, 9, 10] is particularly suitable for handling knowledge transfer across domains.
Armed with these techniques, [1] introduced a formal language for encoding differences and commonalities between domains, accompanied by necessary or sufficient conditions under which transportability of empirical findings is feasible between two domains, a source and a target; these conditions were then extended to a complete characterization of transportability in one domain with unrestricted experimental data [11]. Subsequently, these results were generalized to the settings in which only limited experiments are available in the source domain [12, 13], and further to those in which multiple source domains with unrestricted experimental information are available [14, 15]. This paper broadens these discussions by introducing a more general setting in which multiple heterogeneous sources with limited and distinct experiments are available, a task that we call here "mz-transportability".1

* These authors contributed equally to this paper. The authors' addresses are respectively eb@cs.ucla.edu, sxl439@ist.psu.edu, vhonavar@ist.psu.edu, judea@cs.ucla.edu.

More formally, the mz-transportability problem concerns the transfer of causal knowledge from a heterogeneous collection of source domains Π = {π1, ..., πn} to a target domain π*. In each domain πi ∈ Π, experiments over a set of variables Zi can be performed and causal knowledge gathered. In π*, potentially different from πi, only passive observations can be collected (this constraint is weakened later on). The problem is to infer a causal relationship R in π* using knowledge obtained in Π. 
Clearly, if nothing is known about the relationship between Π and π*, the problem is trivial; no transfer can be justified. Yet the fact that all scientific experiments are conducted with the intent of being used elsewhere (e.g., outside the lab) implies that scientific progress relies on the assumption that certain domains share common characteristics and that, owing to these commonalities, causal claims would be valid in new settings even where experiments cannot be conducted.
The problem stated in this paper generalizes the one-dimensional version of transportability with limited scope and the multi-dimensional version with unlimited scope. Remarkably, while the effects of interest might not be individually transportable to the target domain from the experiments in any of the available sources, combining different pieces from the various sources may enable the estimation of the desired effects (to be shown later on). The goal of this paper is to formally understand under which conditions the target quantity is (non-parametrically) estimable from the available data.

2 Previous work and our contributions

Consider Fig. 1(a), in which the node S represents factors that produce differences between source and target populations. Assume that we conduct a randomized trial in Los Angeles (LA) and estimate the causal effect of treatment X on outcome Y for every age group Z = z, denoted by P(y|do(x), z). We now wish to generalize the results to the population of the United States (U.S.), but we find the distribution P(x, y, z) in LA to be different from the one in the U.S. (call the latter P*(x, y, z)). In particular, the average age in the U.S. is significantly higher than that in LA. How are we to estimate the causal effect of X on Y in the U.S., denoted R = P*(y|do(x))?2,3
The selection diagram for this example (Fig. 
1(a)) conveys the assumption that the only difference between the two populations lies in the factors determining age distributions, shown as S → Z, while age-specific effects P*(y|do(x), Z = z) are invariant across populations. Difference-generating factors are represented by a special set of variables called selection variables S (or simply S-variables), which are graphically depicted as square nodes (■). From this assumption, the overall causal effect in the U.S. can be derived as follows:

R = Σ_z P*(y|do(x), z) P*(z)
  = Σ_z P(y|do(x), z) P*(z)    (1)

The last line is the transport formula for R. It combines experimental results obtained in LA, P(y|do(x), z), with observational aspects of the U.S. population, P*(z), to obtain an experimental claim P*(y|do(x)) about the U.S. In this trivial example, the transport formula amounts to a simple re-calibration (or re-weighting) of the age-specific effects to account for the new age distribution. In general, however, a more involved mixture of experimental and observational findings would be necessary to obtain a bias-free estimate of the target relation R. Fig. 1(b) depicts the smallest example in which transportability is not feasible even when experiments over X in π are available.
In real-world applications, it may happen that certain controlled experiments cannot be conducted in the source environment (for financial, ethical, or technical reasons), so only a limited amount

1 The machine learning literature has been concerned with discrepancies among domains almost exclusively in the context of predictive or classification tasks, as opposed to learning causal or counterfactual measures [16, 17]. 
Interestingly enough, recent work on anticausal learning moves towards more general modalities of learning and also leverages knowledge about the underlying data-generating structure [18, 19].
2 We will use Px(y|z) interchangeably with P(y|do(x), z).
3 We use the structural interpretation of causal diagrams as described in [9, pp. 205].

Figure 1: The selection variables S are depicted as square nodes (■). (a) Selection diagram illustrating when transportability between two domains is trivially solved through simple recalibration. (b) The smallest possible selection diagram in which a causal relation is not transportable. (c) Selection diagram illustrating transportability when only experiments over {Z1} are available in the source.

of experimental information can be gathered. A natural question is whether an investigator in possession of a limited set of experiments would still be able to estimate the desired effects in the target domain. For instance, we assume in Fig. 1(c) that experiments over Z1 are available and the target quantity is R = P*(y|do(x)), which can be shown to be equivalent to P(y|x, do(Z1)), the conditional distribution of Y given X in the experimental study when Z1 is randomized.4
One might surmise that multiple pairwise z-transportability would be sufficient to solve the mz-transportability problem, but this is not the case. To witness, consider Fig. 2(a,b), which concerns the transport of experimental results from two sources ({πa, πb}) to infer the effect of X on Y in π*, R = P*(y|do(x)). In these diagrams, X may represent the treatment (e.g., cholesterol level), Z1 represents a pre-treatment variable (e.g., diet), Z2 represents an intermediate variable (e.g., biomarker), and Y represents the outcome (e.g., heart failure). We assume that experimental studies randomizing {Z1, Z2} can be conducted in both domains. 
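Before turning to that analysis, note that the re-calibration in Eq. (1) is entirely mechanical. The following sketch uses invented numbers (two age groups and hypothetical effect sizes; none of these values come from the paper):

```python
# Transport formula of Eq. (1): R = sum_z P(y|do(x), z) * P*(z).
# The age-specific effects are assumed invariant across domains;
# only the age distribution P*(z) is re-weighted for the target.

# Hypothetical experimental results from the source (LA):
P_y_do_x = {"young": 0.2, "old": 0.5}   # P(y|do(x), z)

# Hypothetical observational age distribution in the target (U.S.):
P_star_z = {"young": 0.3, "old": 0.7}   # P*(z)

# R = 0.2*0.3 + 0.5*0.7, i.e. approximately 0.41
R = sum(P_y_do_x[z] * P_star_z[z] for z in P_star_z)
```

The same two-line pattern applies to any discrete covariate Z: estimate the z-specific effects where experiments are possible, and re-weight them by the target's covariate distribution.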
A simple analysis based on [12] can show that R cannot be z-transported from either source alone, but it turns out that combining experiments from both sources in a special way allows one to determine the effect in the target.
More interestingly, we consider the more stringent scenario where only certain experiments can be performed in each of the domains. For instance, assume that it is only possible to conduct experiments over {Z2} in πa and over {Z1} in πb. Obviously, R cannot be z-transported individually from these domains, but it turns out that taking both sets of experiments into account yields R = Σ_{z2} P^(a)(y|do(z2)) P^(b)(z2|x, do(Z1)), which fully uses all pieces of experimental data available. In other words, we were able to decompose R into subrelations such that each one is separately z-transportable from the source domains, and so is the desired target quantity. Interestingly, it is the case in this example that if the domains in which experiments were conducted were reversed (i.e., {Z1} randomized in πa, {Z2} in πb), it would not be possible to transport R by any method – the target relation is simply not computable from the available data (formally shown later on).
This illustrates some of the subtle issues mz-transportability entails, which cannot be immediately cast in terms of previous instances of the transportability class. In the sequel, we try to better understand some of these issues, and we develop sufficient or (specific) necessary conditions for deciding special transportability for arbitrary collections of selection diagrams and sets of experiments. We further construct an algorithm for deciding mz-transportability of joint causal effects and returning the correct transport formula whenever this is possible. 
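Once the two experimental pieces above have been estimated, the fusion itself is a one-line computation. A minimal numerical sketch (all probability values are invented for illustration):

```python
# R = sum_{z2} P^(a)(y|do(z2)) * P^(b)(z2|x, do(Z1)):
# the first factor comes from randomizing Z2 in pi_a, the second from
# randomizing Z1 in pi_b; neither domain suffices on its own.

P_a_y_do_z2 = {0: 0.10, 1: 0.60}           # hypothetical P^(a)(y|do(z2))
P_b_z2_given_x_do_Z1 = {0: 0.25, 1: 0.75}  # hypothetical P^(b)(z2|x, do(Z1))

# R = 0.10*0.25 + 0.60*0.75, i.e. approximately 0.475
R = sum(P_a_y_do_z2[z2] * P_b_z2_given_x_do_Z1[z2] for z2 in (0, 1))
```

The point of the decomposition is visible in the code: each dictionary can be filled in from a different domain's experiment, and only their product is needed in the target.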
We also consider issues relative to the variance of the estimand, aiming to improve sample efficiency and increase statistical power.

3 Graphical conditions for mz-transportability

The basic semantical framework in our analysis rests on structural causal models as defined in [9, pp. 205], also called data-generating models. In the structural causal framework [9, Ch. 7], actions are modifications of functional relationships, and each action do(x) on a causal model M produces

4 A typical example is whether we can estimate the effect of cholesterol (X) on heart failure (Y) by experiments on diet (Z1), given that cholesterol levels cannot be randomized [20].

Figure 2: Selection diagrams illustrating the impossibility of estimating R = P*(y|do(x)) through individual transportability from πa and πb even when Z = {Z1, Z2} (for (a, b) and (c, d)). If we assume, more stringently, availability of experiments Za = {Z2}, Zb = {Z1}, Z* = {}, a more elaborate analysis can show that R can be estimated by combining different pieces from both domains.

a new model Mx = ⟨U, V, Fx, P(U)⟩, where Fx is obtained by replacing fX ∈ F for every X ∈ X with a new function that outputs a constant value x given by do(x).5
We follow the conventions given in [9]. We denote variables by capital letters and their realized values by small letters. Similarly, sets of variables will be denoted by bold capital letters, sets of realized values by bold small letters. We use the typical graph-theoretic terminology with the corresponding abbreviations Pa(Y)_G and An(Y)_G, which denote respectively the sets of observable parents and ancestors of the node set Y in G. A graph G_Y will denote the induced subgraph of G containing the nodes in Y and all arrows between such nodes. 
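For readers who prefer code, these graph-theoretic primitives can be sketched as follows (a DAG as a child-adjacency dict; the helper names `Pa`, `An`, and `induced` are ours, not the paper's, and this `An` includes Y itself):

```python
def Pa(G, Y):
    """Observable parents of the node set Y in G (excluding Y itself)."""
    return {v for v, children in G.items() if set(children) & Y} - Y

def An(G, Y):
    """Ancestors of Y in G, here taken to include Y itself."""
    an = set(Y)
    while True:
        new = {v for v, children in G.items() if set(children) & an} - an
        if not new:
            return an
        an |= new

def induced(G, Y):
    """Induced subgraph G_Y: nodes in Y and all arrows between them."""
    return {v: [c for c in G[v] if c in Y] for v in Y}

# Toy DAG: Z -> X -> Y
G = {"Z": ["X"], "X": ["Y"], "Y": []}
```

For the toy DAG, `An(G, {"Y"})` returns all three nodes and `Pa(G, {"Y"})` returns `{"X"}`, matching the textual definitions above.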
Finally, G_{X̄Z̲} stands for the edge subgraph of G where all incoming arrows into X and all outgoing arrows from Z are removed.
Key to the analysis of transportability is the notion of "identifiability," defined below, which expresses the requirement that causal effects be computable from a combination of data P and assumptions embodied in a causal graph G.
Definition 1 (Causal Effects Identifiability (Pearl, 2000, pp. 77)). The causal effect of an action do(x) on a set of variables Y such that Y ∩ X = ∅ is said to be identifiable from P in G if Px(y) is uniquely computable from P(V) in any model that induces G.

Causal models and their induced graphs are usually associated with one particular domain (also called setting, study, population, or environment). In ordinary transportability, this representation was extended to capture properties of two domains simultaneously. This is possible if we assume that the structural equations share the same set of arguments, though the functional forms of the equations may vary arbitrarily [11].6
Definition 2 (Selection Diagram). Let ⟨M, M*⟩ be a pair of structural causal models [9, pp. 205] relative to domains ⟨π, π*⟩, sharing a causal diagram G. ⟨M, M*⟩ is said to induce a selection diagram D if D is constructed as follows:

1. Every edge in G is also an edge in D;
2. 
D contains an extra edge Si → Vi whenever there might exist a discrepancy fi ≠ f*_i or P(Ui) ≠ P*(Ui) between M and M*.

In words, the S-variables locate the mechanisms where structural discrepancies between the two domains are suspected to take place.7 Alternatively, the absence of a selection node pointing to a variable represents the assumption that the mechanism responsible for assigning value to that variable is identical in both domains.

5 The results presented here are also valid in other formalisms for causality based on potential outcomes.
6 As discussed in the reference, the assumption of no structural changes between domains can be relaxed, but some structural assumptions regarding the discrepancies between domains must still hold.
7 Transportability assumes that enough structural knowledge about both domains is known in order to substantiate the production of their respective causal diagrams. In the absence of such knowledge, causal discovery algorithms might be used to infer the diagrams from data [8, 9].

Armed with the concepts of identifiability and selection diagrams, mz-transportability of causal effects can be defined as follows:
Definition 3 (mz-Transportability). Let D = {D^(1), ..., D^(n)} be a collection of selection diagrams relative to source domains Π = {π1, ..., πn} and target domain π*, respectively, and let Zi (and Z*) be the variables in which experiments can be conducted in domain πi (and π*). 
Let ⟨P^i, I^i_z⟩ be the pair of observational and interventional distributions of πi, where I^i_z = ∪_{Z'⊆Zi} P^i(v|do(z')), and, in an analogous manner, let ⟨P*, I*_z⟩ be the observational and interventional distributions of π*. The causal effect R = P*_x(y|w) is said to be mz-transportable from Π to π* in D if P*_x(y|w) is uniquely computable from ∪_{i=1,...,n} ⟨P^i, I^i_z⟩ ∪ ⟨P*, I*_z⟩ in any model that induces D.

The requirement that R be uniquely computable from ⟨P*, I*_z⟩ and the ⟨P^i, I^i_z⟩ of all sources has a syntactic image in the causal calculus, which is captured by the following sufficient condition.
Theorem 1. Let D = {D^(1), ..., D^(n)} be a collection of selection diagrams relative to source domains Π = {π1, ..., πn} and target domain π*, respectively, and let Si represent the collection of S-variables in the selection diagram D^(i). Let {⟨P^i, I^i_z⟩} and ⟨P*, I*_z⟩ be respectively the pairs of observational and interventional distributions in the sources Π and target π*. The relation R = P*(y|do(x), w) is mz-transportable from Π to π* in D if the expression P(y|do(x), w, S1, ..., Sn) is reducible, using the rules of the causal calculus, to an expression in which (1) do-operators that apply to subsets of I^i_z have no Si-variables, or (2) do-operators apply only to subsets of I*_z.

This result provides a powerful way to syntactically establish mz-transportability, but it is not immediately obvious whether a sequence of applications of the rules of the causal calculus that achieves the reduction required by the theorem exists, and even if such a sequence exists, it is not obvious how to obtain it. 
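The syntactic target of Theorem 1 can be phrased as a simple test on each term of a reduced expression. Below is an illustrative sketch (our own simplification, not the paper's algorithm): a term P(y | do(Z'), S) is usable if its do-set falls within the target's experimental scope, or within some source's scope whose S-variable has already been eliminated from the term.

```python
def term_usable(do_vars, s_vars, source_exps, target_exps):
    """Check conditions (1)/(2) of Theorem 1 for one reduced term.

    do_vars: variables under the do-operator in the term
    s_vars: S-variables still present in the term
    source_exps: {S-variable of pi_i: experimental set Z_i}
    target_exps: experimental set Z* of the target domain
    """
    if do_vars <= target_exps:                       # condition (2)
        return True
    return any(do_vars <= z_i and s_i not in s_vars  # condition (1)
               for s_i, z_i in source_exps.items())

# Setup of Corollary 1 below: experiments over {Z2} in pi_a, {Z1} in pi_b.
sources = {"Sa": {"Z2"}, "Sb": {"Z1"}}
term_usable({"Z2"}, {"Sb"}, sources, set())  # usable via pi_a
term_usable({"X"}, set(), sources, set())    # X is nowhere randomized
```

The hard part, of course, is producing the reduction in the first place; this check only certifies a term once the calculus has delivered it.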
For concreteness, we illustrate this result using the selection diagrams in Fig. 2(a,b).
Corollary 1. P*(y|do(x)) is mz-transportable in Fig. 2(a,b) with Za = {Z2} and Zb = {Z1}.
Proof. The goal is to show that R = P*(y|do(x)) is mz-transportable from {πa, πb} to π* using experiments conducted over {Z2} in πa and {Z1} in πb. Note that naively trying to transport R from each of the domains individually is not possible, but R can be decomposed as follows:

P*(y|do(x)) = P*(y|do(x), do(Z1))                                        (2)
            = Σ_{z2} P*(y|do(x), do(Z1), z2) P*(z2|do(x), do(Z1))         (3)
            = Σ_{z2} P*(y|do(x), do(Z1), do(z2)) P*(z2|do(x), do(Z1)),    (4)

where Eq. (2) follows by rule 3 of the causal calculus since (Z1 ⊥⊥ Y | X)_{D_{X̄,Z̄1}} holds, where D is the diagram in π* (despite the location of the S-nodes); we condition on Z2 in Eq. (3); and Eq. (4) follows by rule 2 of the causal calculus since (Z2 ⊥⊥ Y | X, Z1)_{D_{X̄,Z̄1,Z̲2}}.
Now we can rewrite the first term of Eq. (4) as indicated by the Theorem (and suggested by Def. 2):

P*(y|do(x), do(Z1), do(z2)) = P(y|do(x), do(Z1), do(z2), Sa, Sb)    (5)
                            = P(y|do(x), do(Z1), do(z2), Sb)        (6)
                            = P(y|do(z2), Sb)                       (7)
                            = P^(a)(y|do(z2)),                      (8)

where Eq. (5) follows from the theorem (and the definition of selection diagram), Eq. (6) follows from rule 1 of the causal calculus since (Sa ⊥⊥ Y | Z1, Z2, X)_{D^(a)_{Z̄1,Z̄2,X̄}}, and Eq. (7) follows from rule 3 of the causal calculus since (Z1, X ⊥⊥ Y | Z2)_{D^(a)_{Z̄1,Z̄2,X̄}}. Note that this equation matches the syntactic goal of Theorem 1, since we have precisely do(z2) separated from Sa (and Z2 ∈ I^a_z); so we can rewrite the expression, which results in Eq. (8) by the definition of selection diagram.
Finally, we can rewrite the second term of Eq. 
(4) as follows:

P*(z2|do(x), do(Z1)) = P(z2|do(x), do(Z1), Sa, Sb)    (9)
                     = P(z2|do(x), do(Z1), Sa)        (10)
                     = P(z2|x, do(Z1), Sa)            (11)
                     = P^(b)(z2|x, do(Z1)),           (12)

where Eq. (9) follows from the theorem (and the definition of selection diagram), Eq. (10) follows from rule 1 of the causal calculus since (Sb ⊥⊥ Z2 | Z1, X)_{D^(b)_{Z̄1,X̄}}, and Eq. (11) follows from rule 2 of the causal calculus since (X ⊥⊥ Z2 | Z1)_{D^(b)_{Z̄1,X̲}}. Note that this equation matches the condition of the theorem – do(Z1) is separated from Sb (i.e., experiments over Z1 can be used since they are available in πb) – so we can rewrite Eq. (12) using the definition of selection diagrams, and the corollary follows.
The next condition for mz-transportability is more visible than Theorem 1 (albeit weaker), and it also demonstrates the challenge of relating mz-transportability to other types of transportability.
Corollary 2. R = P*(y|do(x)) is mz-transportable in D if there exists Z'_i ⊆ Zi such that all paths from Z'_i to Y are blocked by X, (Si ⊥⊥ Y | X, Z'_i)_{D^(i)_{X̄,Z'_i}}, and R is computable from do(Zi).
Remarkably, randomizing Z2 when applying Corollary 1 was instrumental in yielding transportability in the previous example, despite the fact that the directed paths from Z2 to Y were not blocked by X, which suggests how different this transportability is from z-identifiability. So it is not immediately obvious how to combine the topological relations of the Zi's with X and Y in order to create a general condition for mz-transportability; the relationships between the distributions in the different domains can get relatively intricate, but we defer this discussion for now and consider a simpler case.
It is not usually trivial to pursue a derivation of mz-transportability in causal calculus, and next we show an example in which such a derivation does not even exist. 
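The first clause of Corollary 2 – every path from Z'_i to Y is blocked by X – is easy to check mechanically for directed paths. A minimal sketch (it handles directed paths only; the S_i clause requires full d-separation, which is outside this sketch):

```python
def directed_paths_blocked(G, sources, blockers, target):
    """True iff every directed path from `sources` to `target`
    passes through some node in `blockers`."""
    def reaches(v, seen):
        # DFS that is forbidden from entering `blockers`
        if v == target:
            return True
        seen.add(v)
        return any(reaches(c, seen) for c in G.get(v, [])
                   if c not in blockers and c not in seen)
    return not any(reaches(z, set()) for z in sources)

# Toy graph in the spirit of Fig. 1(c): Z1 -> X -> Y, no Z1 -> Y shortcut,
# so every directed path from Z1 to Y is blocked by X.
G = {"Z1": ["X"], "X": ["Y"], "Y": []}
directed_paths_blocked(G, {"Z1"}, {"X"}, "Y")  # blocked
```

Adding a direct edge Z1 → Y to the toy graph makes the check fail, which is exactly the situation in which the corollary's premise is violated.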
Consider again the diagrams in Fig. 2(a,b), and assume that randomized experiments are available over {Z1} in πa and {Z2} in πb.
Theorem 2. P*(y|do(x)) is not mz-transportable in Fig. 2(a,b) with Za = {Z1} and Zb = {Z2}.
Proof. Formally, we need to display two models M1, M2 such that the following relations hold (as implied by Def. 3):

P^(a)_{M1}(Z1, X, Z2, Y) = P^(a)_{M2}(Z1, X, Z2, Y),
P^(b)_{M1}(Z1, X, Z2, Y) = P^(b)_{M2}(Z1, X, Z2, Y),
P^(a)_{M1}(X, Z2, Y | do(Z1)) = P^(a)_{M2}(X, Z2, Y | do(Z1)),    (13)
P^(b)_{M1}(Z1, X, Y | do(Z2)) = P^(b)_{M2}(Z1, X, Y | do(Z2)),
P*_{M1}(Z1, X, Z2, Y) = P*_{M2}(Z1, X, Z2, Y),

for all values of Z1, X, Z2, and Y, and also

P*_{M1}(Y | do(X)) ≠ P*_{M2}(Y | do(X)),    (14)

for some value of X and Y.
Let V be the set of observable variables and U be the set of unobservable variables in D. Let us assume that all variables in U ∪ V are binary. Let U1, U2 ∈ U be the common causes of Z1 and X and Z2, respectively; let U3, U4, U5 ∈ U be random disturbances exclusive to Z1, Z2, and Y, respectively; let U6 ∈ U be an extra random disturbance exclusive to Z2, and U7, U8 ∈ U extra disturbances exclusive to Y. Let Sa and Sb index the model in the following way: the tuples ⟨Sa = 1, Sb = 0⟩, ⟨Sa = 0, Sb = 1⟩, ⟨Sa = 0, Sb = 0⟩ represent domains πa, πb, and π*, respectively. 
Define the two models as follows:

M1:  Z1 = U1 ⊕ U2 ⊕ (U3 ∧ Sa)
     X  = U1
     Z2 = (U4 ∧ Sa ∧ U6) ⊕ U6
     Y  = (Z2 ∧ U5) ⊕ (U5 ∧ U7) ⊕ (Sb ∧ U8)

M2:  Z1 = U1 ⊕ U2 ⊕ (U3 ∧ Sa)
     X  = Z1 ⊕ U1
     Z2 = (X ⊕ U2 ⊕ (U4 ∧ Sa)) ∨ U6
     Y  = (Z2 ∧ U5) ⊕ (U5 ∧ U7) ⊕ (Sb ∧ U8)

where ⊕ represents the exclusive-or function. Both models agree with respect to P(U), which is defined as P(Ui) = 1/2, i = 1, ..., 8. It is not difficult to evaluate these models and note that the constraints given in Eqs. (13) and (14) are satisfied (including positivity), and the theorem follows.

4 Algorithm for computing mz-transportability

In this section, we build on previous analyses of identifiability [7, 21, 22, 23] in order to obtain a mechanical procedure into which a collection of selection diagrams and experimental data is inputted, and which returns a transport formula whenever it is able to produce one. 
More specifically,

PROCEDURE TRmz(y, x, P, I, S, W, D)
INPUT: x, y: value assignments; P: local distribution relative to domain S (S = 0 indexes π*) and active experiments I; W: weighting scheme; D: backbone of selection diagram; Si: selection nodes in πi (S0 = ∅ relative to π*). [The following sets and distributions are globally defined: Zi, P*_Z, P^(i)_{Zi}.]
OUTPUT: P*_x(y) in terms of P*, P*_Z, P^(i), P^(i)_{Zi}, or FAIL(D, C0).

1  if x = ∅, return Σ_{V\Y} P.
2  if V \ An(Y)_D ≠ ∅, return TRmz(y, x ∩ An(Y)_D, Σ_{V\An(Y)_D} P, I, S, W, D_{An(Y)}).
3  set W = (V \ X) \ An(Y)_{D_X̄}. if W ≠ ∅, return TRmz(y, x ∪ w, P, I, S, W, D).
4  if C(D \ X) = {C0, C1, ..., Ck}, return Σ_{V\{Y,X}} Π_i TRmz(ci, v \ ci, P, I, S, W, D).
5  if C(D \ X) = {C0},
6    if C(D) ≠ {D},
7      if C0 ∈ C(D), return Π_{i|Vi∈C0} P(Vi | V_D^(i−1)).
8      if (∃C') C0 ⊂ C' ∈ C(D), for {i | Vi ∈ C'}, set κi = κi ∪ v_D^(i−1) \ C';
         return TRmz(y, x ∩ C', Π_{i|Vi∈C'} P(Vi | V_D^(i−1) ∩ C', κi), I, S, W, C').
9    else, if I = ∅, for i = 0, ..., |D|,
10     if ((Si ⊥⊥ Y | X)_{D_X̄^(i)} ∧ (Zi ∩ X ≠ ∅)), Ei = TRmz(y, x \ zi, P, Zi ∩ X, i, W, D \ {Zi ∩ X}).
11   if |E| > 0, return Σ_{i=1}^{|E|} w_i^(j) Ei.
12   else, FAIL(D, C0).

Figure 3: Modified version of identification algorithm capable of recognizing mz-transportability.

our algorithm is called TRmz (see Fig. 
3), and is based on the C-component decomposition for identification of causal effects [22, 23] (and a version of the identification algorithm called ID).
The rationale behind TRmz is to apply Tian's factorization and decompose the target relation into smaller, more manageable sub-expressions, and then try to evaluate whether each sub-expression can be computed in the target domain. Whenever this evaluation fails, TRmz tries to use the experiments available from the target and, if possible, from the sources; this essentially implements the declarative condition delineated in Theorem 1. Next, we consider the soundness of the algorithm.
Theorem 3 (soundness). Whenever TRmz returns an expression for P*_x(y), it is correct.

In the sequel, we demonstrate how the algorithm works through the mz-transportability of Q = P*(y|do(x)) in Fig. 2(c,d) with Z* = {Z1}, Za = {Z2}, and Zb = {Z1}.
Since (V \ X) \ An(Y)_{D_X̄} = {Z2}, TRmz invokes line 3 with {Z2} ∪ {X} as the interventional set. The new call triggers line 4, and C(D \ {X, Z2}) = {C0, C1, C2, C3}, where C0 = D_{Z1}, C1 = D_{Z3}, C2 = D_U, and C3 = D_{W,Y}; we invoke line 4 and try to mz-transport individually Q0 = P*_{x,z2,z3,u,w,y}(z1), Q1 = P*_{x,z1,z2,u,w,y}(z3), Q2 = P*_{x,z1,z2,z3,w,y}(u), and Q3 = P*_{x,z1,z2,z3,u}(w, y). Thus the original problem reduces to evaluating the equivalent expression Σ_{z1,z3,u,w} P*_{x,z2,z3,u,w,y}(z1) P*_{x,z1,z2,u,w,y}(z3) P*_{x,z1,z2,z3,w,y}(u) P*_{x,z1,z2,z3,u}(w, y).

First, TRmz evaluates the expression Q0 and triggers line 2, noting that all nodes can be ignored since they are not ancestors of {Z1}, which implies after line 1 that P*_{x,z2,z3,u,w,y}(z1) = P*(z1).
Second, TRmz evaluates the expression Q1, triggering line 2, which implies that P*_{x,z1,z2,u,w,y}(z3) = P*_{x,z1,z2}(z3) with induced subgraph D1 = D_{X,Z1,Z2,Z3}. 
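The C-component machinery that the walkthrough relies on (lines 4–8 of TRmz) groups nodes connected by bidirected, i.e., latent-confounder, paths. A minimal sketch of C(·) (our own helper, with the bidirected edges listed explicitly rather than read off a diagram):

```python
def c_components(nodes, bidirected):
    """Partition `nodes` into C-components: the connected components of
    the graph whose edges are the bidirected (confounding) edges."""
    components, remaining = [], set(nodes)
    while remaining:
        comp, stack = set(), [remaining.pop()]
        while stack:
            v = stack.pop()
            comp.add(v)
            for a, b in bidirected:
                if v == a and b not in comp:
                    stack.append(b)
                if v == b and a not in comp:
                    stack.append(a)
        remaining -= comp
        components.append(frozenset(comp))
    return components

# Toy example: X <-> Z2 share a latent confounder; Y is unconfounded,
# so C(.) yields the two components {X, Z2} and {Y}.
comps = c_components({"X", "Z2", "Y"}, [("X", "Z2")])
```

Each component corresponds to one factor of Tian's factorization, which is why TRmz can attack them one at a time.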
TRmz goes to line 5, in which, in the local call, C(D \ {X, Z1, Z2}) = {D_{Z3}}. Thus it proceeds to line 6, testing whether C(D \ {X, Z1, Z2}) is different from D1, which is false. In this call, ordinary identifiability would fail, but TRmz proceeds to line 9. The goal of this line is to test whether some experiment can help in computing Q1. In this case, πa immediately fails the test in line 10, but πb and π* succeed, which means experiments in these domains may eventually help; the new call is P^(i)_{x,z2}(z3) in D \ Z1, for i = {b, *}, with induced graph D'1 = D_{X,Z2,Z3}. Finally, TRmz triggers line 8 since X is not part of Z3's components in D'1 (or, Z3 ∈ C' = {Z2 ↔ Z3}), so line 2 is triggered since Z2 is no longer an ancestor of Z3 in D'1, and then line 1 is triggered since the interventional set is empty in this local call, so P*_{x,z1,z2}(z3) = Σ_{Z'2} P^(i)_{z1}(z3|x, Z'2) P^(i)_{z1}(Z'2), for i = {b, *}.

Third, evaluating the expression Q2, TRmz goes to line 2, which implies that P*_{x,z1,z2,z3,w,y}(u) = P*_{x,z1,z2,z3,w}(u) with induced subgraph D2 = D_{X,Z1,Z2,Z3,W,U}. TRmz goes to line 5, and in this local call C(D \ {X, Z1, Z2, Z3, W}) = {D_U}, and the test in line 6 succeeds, since there are more components in D. So it triggers line 8, since W is not part of U's component in D2. The algorithm makes P*_{x,z1,z2,z3,w}(u) = P*_{x,z1,z2,z3}(u)_{D2}|W (and updates the working distribution); note that in this call, ordinary identifiability would fail, since the nodes are in the same C-component and the test in line 6 fails. 
But TRmz proceeds to line 9, trying to find experiments that can help in Q2's computation. In this case, πb cannot help, but πa and π* perhaps can, noting that new calls are launched for computing P^(a)_{x,z1,z3}(u)_{D2 \ Z2 | W} relative to πa, and P*_{x,z2,z3}(u)_{D2 \ Z1 | W} relative to π*, with the corresponding data structures set. In πa, the algorithm triggers line 7, which yields P^(a)_{x,z1,z3}(u)_{D2 \ Z2 | W} = P^(a)_{z2}(u | w, z3, x, z1), and a somewhat more involved analysis for π* yields (after simplification)

    P*_{x,z2,z3}(u)_{D2 \ Z1 | W} = ( ∑_{Z'2} P*_{z1}(u | w, z3, x, Z'2) P*_{z1}(z3 | x, Z'2) P*_{z1}(Z'2) ) / ( ∑_{Z''2} P*_{z1}(z3 | x, Z''2) P*_{z1}(Z''2) ).

Fourth, TRmz evaluates the expression Q3 and triggers line 5, where C(D \ {X, Z1, Z2, Z3, U}) = D_{W,Y}. In turn, both tests at lines 6 and 7 succeed, which makes the procedure return P*_{x,z1,z2,z3,u}(w, y) = P*(w | z3, x, z1, z2) P*(y | w, x, z1, z2, z3, u).

The composition of the returns of these calls generates the following expression:

    P*_x(y) = ∑_{z1,z3,w,u} P*(z1)
              × ( w^(1)_1 ∑_{Z'2} P^(b)_{z1}(z3 | x, Z'2) P^(b)_{z1}(Z'2) + w^(1)_2 ∑_{Z'2} P*_{z1}(z3 | x, Z'2) P*_{z1}(Z'2) )
              × ( w^(2)_1 ( ∑_{Z'2} P*_{z1}(u | w, z3, x, Z'2) P*_{z1}(z3 | x, Z'2) P*_{z1}(Z'2) ) / ( ∑_{Z''2} P*_{z1}(z3 | x, Z''2) P*_{z1}(Z''2) )
                  + w^(2)_2 P^(a)_{z2}(u | w, z3, x, z1) )
              × P*(w | x, z1, z2, z3) P*(y | x, z1, z2, z3, w, u)        (15)
\u03c32\n\nj=1 \u03c3\u22122\n\ni = \u03c3\u22122\n\ni /(cid:80)nk\n\nwhere w(k)\nrepresents the weight for each factor in estimand k (i = 1, ..., nk), and nk is the number\nof feasible estimands of k. Eq. (15) depicts a powerful way to estimate P \u2217(y|do(x)) in the target\ndomain, and depending on weighting choice a different estimand will be entailed. For instance, one\nmight use an analogous to inverse-variance weighting, which sets the weights for the normalized\ninverse of their variances (i.e., w(k)\nj is the variance of the jth compo-\nnent of estimand k). Our strategy resembles the approach taken in meta-analysis [4], albeit the latter\nusually disregards the intricacies of the relationships between variables, so producing a statistically\nless powerful estimand. Our method leverages this non-trivial and highly structured relationships, as\nexempli\ufb01ed in Eq. (15), which yields an estimand with less variance and statistically more powerful.\n5 Conclusions\nIn this paper, we treat a special type of transportability in which experiments can be conducted only\nover limited sets of variables in the sources and target domains, and the goal is to infer whether a\ncertain effect can be estimated in the target using the information scattered throughout the domains.\nWe provide a general suf\ufb01cient graphical conditions for transportability based on the causal calculus\nalong with a necessary condition for a speci\ufb01c scenario, which should be generalized for arbitrary\nstructures. We further provide a procedure for computing transportability, that is, generate a formula\nfor fusing the available observational and experimental data to synthesize an estimate of the desired\ncausal effects. 
Our algorithm also allows for generic weighting schemes, which generalize standard statistical procedures and lead to the construction of statistically more powerful estimands.

Acknowledgments

The work of Judea Pearl and Elias Bareinboim was supported in part by grants from NSF (IIS-1249822, IIS-1302448) and ONR (N00014-13-1-0153, N00014-10-1-0933). The work of Sanghack Lee and Vasant Honavar was partially completed while they were with the Department of Computer Science at Iowa State University. The work of Vasant Honavar while working at the National Science Foundation (NSF) was supported by the NSF. The work of Sanghack Lee was supported in part by a grant from NSF (IIS-0711356). Any opinions, findings, and conclusions contained in this article are those of the authors and do not necessarily reflect the views of the sponsors.

References

[1] J. Pearl and E. Bareinboim. Transportability of causal and statistical relations: A formal approach. In W. Burgard and D. Roth, editors, Proceedings of the Twenty-Fifth National Conference on Artificial Intelligence, pages 247–254. AAAI Press, Menlo Park, CA, 2011.

[2] D. Campbell and J. Stanley. Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, Chicago, 1963.

[3] C. Manski. Identification for Prediction and Decision. Harvard University Press, Cambridge, Massachusetts, 2007.

[4] L. V. Hedges and I. Olkin. Statistical Methods for Meta-Analysis. Academic Press, January 1985.

[5] W. R. Shadish, T. D. Cook, and D. T. Campbell. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton-Mifflin, Boston, second edition, 2002.

[6] S. Morgan and C. Winship. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). Cambridge University Press, New York, NY, 2007.

[7] J. Pearl. Causal diagrams for empirical research.
Biometrika, 82(4):669–710, 1995.

[8] P. Spirtes, C. N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2000.

[9] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2nd edition, 2009.

[10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[11] E. Bareinboim and J. Pearl. Transportability of causal effects: Completeness results. In J. Hoffmann and B. Selman, editors, Proceedings of the Twenty-Sixth National Conference on Artificial Intelligence, pages 698–704. AAAI Press, Menlo Park, CA, 2012.

[12] E. Bareinboim and J. Pearl. Causal transportability with limited experiments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 95–101. AAAI Press, Menlo Park, CA, 2013.

[13] S. Lee and V. Honavar. Causal transportability of experiments on controllable subsets of variables: z-transportability. In A. Nicholson and P. Smyth, editors, Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), pages 361–370. AUAI Press, 2013.

[14] E. Bareinboim and J. Pearl. Meta-transportability of causal effects: A formal approach. In C. Carvalho and P. Ravikumar, editors, Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143. JMLR W&CP 31, 2013.

[15] S. Lee and V. Honavar. m-transportability: Transportability of a causal effect from multiple environments. In M. desJardins and M. Littman, editors, Proceedings of the Twenty-Seventh National Conference on Artificial Intelligence, pages 583–590. AAAI Press, Menlo Park, CA, 2013.

[16] H. Daumé III and D. Marcu. Domain adaptation for statistical classifiers.
Journal of Artificial Intelligence Research, 26:101–126, 2006.

[17] A. J. Storkey. When training and test sets are different: characterising learning transfer. In J. Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, editors, Dataset Shift in Machine Learning, pages 3–28. MIT Press, Cambridge, MA, 2009.

[18] B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress.

[19] K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). JMLR W&CP volume 28, 2013.

[20] E. Bareinboim and J. Pearl. Causal inference by surrogate experiments: z-identifiability. In N. Freitas and K. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI), pages 113–120. AUAI Press, 2012.

[21] M. Kuroki and M. Miyakawa. Identifiability criteria for causal effects of joint interventions. Journal of the Royal Statistical Society, 29:105–117, 1999.

[22] J. Tian and J. Pearl. A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pages 567–573. AAAI Press/The MIT Press, Menlo Park, CA, 2002.

[23] I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219–1226.
AAAI Press, Menlo Park, CA, 2006.