{"title": "Linear decision rule as aspiration for simple decision heuristics", "book": "Advances in Neural Information Processing Systems", "page_first": 2904, "page_last": 2912, "abstract": "Many attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. Here, we further develop these ideas and examine their empirical relevance in 51 natural environments. We find that all three structures are prevalent, making it possible for some simple rules to reach the accuracy levels of the linear decision rule using less information.", "full_text": "Linear Decision Rule as Aspiration for Simple Decision Heuristics\n\nÖzgür Şimşek\n\nCenter for Adaptive Behavior and Cognition\nMax Planck Institute for Human Development\nLentzeallee 94, 14195 Berlin, Germany\nozgur@mpib-berlin.mpg.de\n\nAbstract\n\nSeveral attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation.\n\n1 Introduction\n\nThe comparison problem asks which of a number of objects has a higher value on an unobserved criterion. Typically, some attributes of the objects are available as input to the decision. 
An example is which of two houses that are currently for sale will have a higher return on investment ten years from now, given the location, age, lot size, and total living space of each house.\n\nThe importance of comparison for intelligent behavior cannot be overstated. Much of human and animal behavior consists of choosing one object, from among a number of available alternatives, to act on, with respect to some criterion whose value is unobserved at the time. Examples include a venture capitalist choosing a company to invest in, a scientist choosing a conference to submit a paper to, a female tree frog deciding who to mate with, and an ant colony choosing a nest area to live in.\n\nThis paper focuses on paired comparison, in which there are exactly two objects to choose from, and its solution using linear estimation. Specifically, it is concerned with the environmental structures that make it possible to mimic the decisions of the linear estimator using less information and less computation, asking two questions: How much of the linear estimator do we need to know to mimic its decisions, and under what conditions? How prevalent are these conditions in natural environments? In the following sections, I review several ideas from the literature, develop them further, and investigate their empirical relevance.\n\n2 Background\n\nA standard approach to the comparison problem is to estimate the criterion as a function of the attributes of the object, typically as a linear function:\n\nŷ = w0 + w1x1 + w2x2 + ... + wkxk,  (1)\n\nwhere ŷ is the estimate of the criterion, w0 is the intercept, w1..wk are the weights, and x1..xk are the attribute values. This estimate leads to a decision between objects A and B as follows, where ∆xi is used to denote the difference in attribute values between the two objects:\n\nŷA − ŷB = w1(x1A − x1B) + w2(x2A − x2B) + ... + wk(xkA − xkB) = w1∆x1 + w2∆x2 + ... + wk∆xk  (2)\n\nDecision rule: choose object A if w1∆x1 + ... + wk∆xk > 0; choose object B if w1∆x1 + ... + wk∆xk < 0; choose randomly if w1∆x1 + ... + wk∆xk = 0.  (3)\n\nThis decision rule does not need the linear estimator in its entirety. The intercept is not used at all. As for the weights, it suffices to know their sign and relative magnitude. For instance, with two attributes weighted +0.2 and +0.1, it suffices to know that both weights are positive and that the first one is twice as high as the other.\n\nThe literature on simple decision heuristics [1, 2] has identified several environmental structures that allow simple rules to make decisions identical to those of the linear decision rule using less information [3]. These are dominance [4], cumulative dominance [5, 6], and noncompensatoriness [7, 8, 9, 10, 11]. I discuss each in turn in the following sections. I refer to attributes also as cues and to the signs of the weights as cue directions, as in the heuristics literature. An attribute that discriminates between two objects is one whose value differs on the two objects. A heuristic that corresponds to a particular linear decision rule is one whose cue directions, and cue order if it needs them, are identical to those of the linear decision rule.\n\nThe discussion will focus on two successful families of heuristics. The first is unit weighting [12, 13, 14, 15, 16, 17], which uses a linear decision rule with weights of +1 or −1. The second is the family of lexicographic heuristics [18, 19], which examine cues one at a time, in a specified order, until a cue is found that discriminates between the objects. The discriminating cue, and that cue only, is used to make the decision. 
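Assuming known cue directions and a fixed cue order, the three rules just described can be sketched as follows (a minimal illustration, not code from the paper):\n\n
```python
def linear_rule(w, dx):
    """Decision Rule 3: w are weights, dx the attribute differences A - B."""
    s = sum(wi * dxi for wi, dxi in zip(w, dx))
    return "A" if s > 0 else ("B" if s < 0 else "random")

def unit_weighting(directions, dx):
    """Unit weighting: replace each weight by its cue direction (+1 or -1)."""
    return linear_rule(directions, dx)

def lexicographic(directions, dx):
    """Cues assumed pre-sorted; the first discriminating cue decides alone."""
    for d, dxi in zip(directions, dx):
        if dxi != 0:
            return "A" if d * dxi > 0 else "B"
    return "random"
```
\nFor example, with weights (+0.2, +0.1) and differences (+1, −1), both the linear rule and the lexicographic rule choose A, while unit weighting finds a tie and chooses randomly.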
Lexicographic heuristics are an abstraction of the way words are ordered in a dictionary, with respect to the alphabetical order of the letters from left to right.\n\n2.1 Dominance\n\nIf all terms wi∆xi in Decision Rule 3 are nonnegative, and at least one of them is positive, then object A dominates object B. If all terms wi∆xi are zero, then objects A and B are dominance equivalent. It is easy to see that the linear decision rule chooses the dominant object if there is one. If objects are dominance equivalent, the decision rule chooses randomly.\n\nDominance is a very strong relationship. When it is present, most decision heuristics choose identically to the linear decision rule if their cue directions match those of the linear rule. These include unit weighting and lexicographic heuristics, with any ordering of the cues.\n\nTo check for dominance, it suffices to know the signs of the weights; the magnitudes of the weights are not needed. I occasionally refer to dominance as simple dominance to differentiate it from cumulative dominance, which I discuss next.\n\n2.2 Cumulative dominance\n\nThe linear sum in Equation 2 may be written alternatively as follows:\n\nŷA − ŷB = (w1 − w2)∆x1 + (w2 − w3)(∆x1 + ∆x2) + (w3 − w4)(∆x1 + ∆x2 + ∆x3) + ... + wk(∆x1 + ... + ∆xk) = w′1∆x′1 + w′2∆x′2 + w′3∆x′3 + ... + w′k∆x′k,  (4)\n\nwhere ∆x′i = ∆x1 + ∆x2 + ... + ∆xi for all i, w′i = wi − wi+1 for i = 1, 2, .., k − 1, and w′k = wk.\n\nTo this alternative linear sum in Equation 4, we can apply the earlier dominance result, obtaining a new dominance relationship called cumulative dominance. Cumulative dominance uses an additional piece of information on the weights: their relative ordering.\n\nObject A cumulatively dominates object B if all terms w′i∆x′i are nonnegative and at least one of them is positive. Objects A and B are cumulative-dominance equivalent if all terms w′i∆x′i are zero. The linear decision rule chooses the cumulative-dominant object if there is one. If objects are cumulative-dominance equivalent, the linear decision rule chooses randomly. Note that if weights w1..wk are positive and decreasing, it suffices to examine ∆x′i to check for cumulative dominance (because w′i > 0 for all i).\n\nAs an example, consider comparing the value of two piles of US coins. The attributes would be the number of each type of coin in the pile, and the weights would be the financial value of each type of coin. A pile that contains 6 one-dollar coins, 4 fifty-cent coins, and 2 ten-cent coins cumulatively dominates (but does not simply dominate) a pile containing 3 one-dollar coins, 5 fifty-cent coins, and 1 ten-cent coin: 6 > 3, 6 + 4 > 3 + 5, 6 + 4 + 2 > 3 + 5 + 1.\n\nSimple dominance implies cumulative dominance. Cumulative dominance is therefore more likely to hold than simple dominance. When a cumulative-dominance relationship holds, the linear decision rule, the corresponding lexicographic decision rule, and the corresponding unit-weighting rule decide identically, with one exception: unit weighting may find a tie where the linear decision rule does not [5].\n\n2.3 Noncompensatoriness\n\nWithout loss of generality, assume that the weights w1, w2, .., wk are nonnegative, which can be satisfied by inverting the attributes when necessary. Consider the linear decision rule as a sequential process, where the terms wi∆xi are added one by one, in order of nonincreasing weights. 
If we were to stop after the first discriminating attribute, would our decision be identical to the one we would make by processing all attributes? Or would the subsequent attributes reverse this early decision?\n\nThe answer is no, it is not possible for subsequent attributes to reverse the early decision, if the attributes are binary, taking values of 0 or 1, and the weights satisfy the set of constraints wi > wi+1 + wi+2 + ... + wk, for i = 1, 2, .., k − 1. Such weights are called noncompensatory. An example is the sequence 1, 0.5, 0.25, 0.125.\n\nWith binary attributes and noncompensatory weights, the linear decision rule and the corresponding lexicographic decision rule decide identically [7, 8].\n\nThis concludes the review of the background material. The contributions of the present paper start in the next section.\n\n3 A probabilistic approach to dominance\n\nTo choose between two objects, the linear decision rule examines whether w1∆x1 + ... + wk∆xk is above, below, or equal to zero. This comparison can be made with certainty, without knowing the exact values of the weights, if a dominance relationship exists. Here I explore what can be done in the absence of such certainty. For instance, can we identify conditions under which the comparison can be made with very high probability? As a motivating example, consider the case where 9 out of 10 attributes favor object A against object B. Although we cannot be certain that the linear decision rule will select object A, that would be a very good bet.\n\nI make the simplifying assumption that |wi∆xi| are independent, identically distributed samples from the uniform distribution on the interval from 0 to 1. The choice of upper bound of the interval is not consequential because the terms wi∆xi can be rescaled. Let p and n be the number of positive and negative terms wi∆xi, respectively. Using the normal approximation to the sum of uniform variables, we can approximate w1∆x1 + ... + wk∆xk with the normal distribution with mean (p − n)/2 and variance (p² + n²)/12. This yields the following estimate of the probability PA that the linear decision rule will select object A: PA ≈ P(X > 0), where X ∼ N((p − n)/2, (p² + n²)/12).\n\nDefinition: Object A approximately dominates object B if P(X > 0) ≥ c, where c is a parameter of the approximation (taking values close to 1) and X ∼ N((p − n)/2, (p² + n²)/12).\n\nA similar analysis applies to cumulative dominance.\n\n4 An empirical analysis of relevance\n\nI now turn to the question of whether dominance and noncompensatoriness exist in our environment in any substantial amount. There are two earlier results on the subject. When binary versions of 20 natural datasets were used to train a multiple linear regression model, at least 3 of the 20 models were found to have noncompensatory weights [8].1 In the same 20 datasets, with a restriction of 5 on the maximum number of attributes, the proportion of object pairs that exhibited simple dominance ranged from 13% to 75% [4].\n\nThe present study used 51 natural datasets obtained from a wide variety of sources, including online data repositories, textbooks, research publications, packages for R statistical software, and individual scientists collecting field data. The subjects were diverse, including biology, business, computer science, ecology, economics, education, engineering, environmental science, medicine, political science, psychology, sociology, sports, and transportation. The datasets varied in size, ranging from 12 to 601 objects, corresponding to 66–180,300 distinct paired comparisons. The number of attributes ranged from 3 to 21. 
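As a concrete sketch of the approximate-dominance test of Section 3 (my illustration, not the paper's code), the test reduces to evaluating a normal tail probability from the counts p and n; treating the certain case n = 0 as dominant without invoking the approximation is my reading of the definition:\n\n
```python
import math

def prob_selects_a(p, n):
    """Normal approximation: P(X > 0) for X ~ N((p-n)/2, (p^2+n^2)/12),
    where p and n count the positive and negative terms wi*dxi."""
    if p == 0 and n == 0:
        return 0.5  # dominance-equivalent pair: the rule chooses randomly
    mean = (p - n) / 2.0
    sd = math.sqrt((p * p + n * n) / 12.0)
    # P(X > 0) = Phi(mean / sd), via the standard-normal CDF
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

def approximately_dominates(p, n, c=0.99):
    """Object A approximately dominates B if P(X > 0) >= c; when n == 0
    and p > 0, simple dominance already makes the comparison certain."""
    if n == 0 and p > 0:
        return True
    return prob_selects_a(p, n) >= c
```
\nFor the motivating example, with 9 of 10 attributes favoring object A, the estimate is P ≈ 0.94: a very good bet, though below the c = 0.99 threshold used in the experiments. The same construction applies to cumulative dominance after transforming to the terms w′i∆x′i.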
The datasets are described in detail in the supplementary material.2\n\nI present two sets of results: on the original datasets and on binary versions where numeric attributes were dichotomized by splitting around the median (assigning the median value to the category with fewer objects). I refer to the original datasets as numeric datasets, but it should be noted that one dataset had only binary attributes and many datasets had at least one binary attribute. Categorical attributes were recoded into binary attributes, one for each category, indicating membership in the category. Objects with missing criterion values were excluded from the analysis. Missing attribute values were replaced by means across all objects. A decision was considered to be accurate if it selected an object whose criterion value was equal to the maximum of the criterion values of the objects being compared.\n\nCumulative dominance and noncompensatoriness are sensitive to the units of measurement of the attributes. In this analysis, all attribute values were normalized to lie between 0 and 1, measuring them relative to the smallest and largest values they take in the dataset.\n\nThe linear decision rule was obtained using multiple linear regression with elastic net regularization [21], which contains both a ridge penalty and a lasso penalty. For the regularization parameter α, which determines the relative proportion of ridge and lasso penalties, the values 0, 0.2, 0.4, 0.6, 0.8, and 1 were considered. For the regularization parameter λ, which controls the amount of total penalty, the default search path in the R-language package glmnet [22] was used. Both α and λ were selected using cross-validation. Specifically, α and λ were set to the values that gave the minimum mean cross-validation error in the training set. 
I refer to the linear decision rule learned in this manner as the base decision rule.\n\nOn datasets with fewer than 1000 pairs of objects, a separate linear decision rule was learned for every pair of objects, using all other objects as the training set. On larger datasets, the pairs of objects were randomly placed in 1000 folds and a separate model was learned for each fold, training with all objects not contained in that fold. Five replications were done using different random seeds.\n\nPerformance of the base decision rule. The accuracy of the base decision rule differed substantially across datasets, ranging from barely above chance to near-perfect. In numeric datasets, accuracy ranged from 0.56 to 0.98 (mean = 0.79). In binary datasets, accuracy was generally lower, ranging from 0.55 to 0.86 (mean = 0.74). Compared to standard multiple linear regression, regularization improved accuracy in most datasets, occasionally by large amounts (as much as 19%). Without regularization, mean accuracy across datasets was lower by 1.17% in numeric datasets and by 0.51% in binary datasets.\n\nDominance. Figure 1 shows the prevalence of dominance, measured by the proportion of object pairs in which one object dominates the other or the two objects are equivalent. The figure shows four types of dominance in each of the datasets. Simple and cumulative dominance are displayed as\n\n1The authors found 3 datasets in which the weights were noncompensatory and the order of the weights was identical to the cue order of the take-the-best heuristic [19]. It is possible that additional datasets had noncompensatory weights but did not match the take-the-best cue order.\n\n2The datasets included the 20 datasets in Czerlinski, Gigerenzer & Goldstein [20], which were used to obtain the two sets of earlier results discussed above [8, 4].\n\nFigure 1: Prevalence of dominance. 
Blue lines show simple dominance, red lines show cumulative dominance, blue-filled circles show approximate simple dominance, and red-filled circles show approximate cumulative dominance.\n\nblue and red lines stacked on top of each other. Recall that simple dominance implies cumulative dominance, so the blue lines show pairs with both simple- and cumulative-dominance relationships. Approximate simple and cumulative dominance are displayed as blue- and red-filled circles, respectively. The datasets are presented in order of decreasing prevalence of simple dominance. The mean, median, minimum, and maximum prevalence of each type of dominance across the datasets are shown in Table 1, along with other performance measures that will be discussed shortly.\n\n                         NUMERIC DATASETS            BINARY DATASETS\n                         Mean  Median  Min   Max     Mean  Median  Min   Max\nPREVALENCE\nDom                      0.25  0.16    0.00  0.91    0.51  0.54    0.07  1.00\nDom approx c=0.99        0.35  0.31    0.03  0.91    0.58  0.59    0.22  1.00\nCum dom                  0.58  0.62    0.11  0.94    0.87  0.89    0.61  1.00\nCum dom approx c=0.99    0.74  0.77    0.30  0.94    0.92  0.92    0.76  1.00\nNoncompensatory weights  -     -       -     -       0.17  0.00    0.00  1.00\nNoncompensation          0.83  0.85    0.49  0.99    0.93  0.96    0.77  1.00\nACCURACY (%)\nDom                      76.8  77.2    56.2  97.5    87.1  89.0    63.9  100.0\nDom approx c=0.99        81.2  82.9    57.0  100.5   90.5  91.2    70.6  100.0\nCum dom                  90.6  93.4    60.8  101.4   98.3  98.9    90.6  103.7\nCum dom approx c=0.99    94.2  96.1    69.5  101.4   99.2  99.6    93.8  103.7\nLexicographic            93.5  96.1    51.4  110.6   97.6  99.6    78.9  104.4\n\nTable 1: Descriptive statistics on dominance, cumulative dominance, and noncompensatoriness. Accuracy is shown as a percentage of the accuracy of the base decision rule.\n\nFigure 2: Accuracy of decisions guided by dominance. Blue lines show simple dominance, red lines show cumulative dominance, blue-filled circles show approximate simple dominance, and red-filled circles show approximate cumulative dominance. Green circles show the accuracy of the base decision rule for comparison.\n\n
The approximation made a difference in 27–33 of 51 datasets, depending on the type of dominance and data (numeric/binary). As expected, the datasets on which the approximation made a difference were those that had a larger number of attributes. Specifically, they all had six or more attributes.\n\nFigure 2 shows the accuracy of decisions guided by dominance: choose the dominant object when there is one; choose randomly otherwise. This accuracy can be higher than the accuracy of the base decision rule, which happens if choosing randomly is more accurate than the base decision rule on pairs that exhibit no dominance relationship. Table 1 shows the mean, median, minimum, and maximum accuracies across the datasets, measured as a percentage of the accuracy of the base decision rule. The accuracies were surprisingly high, more so with binary data. It is worth pointing out that the accuracy of approximate cumulative dominance in binary datasets ranged from 93.8% to 103.7% of the accuracy of the base decision rule.\n\nIn the results discussed so far, approximate dominance was computed by setting c = 0.99. This value was selected prior to the analysis based on what the parameter means: 1 − c is the expected error rate of the approximation, where the error rate is the proportion of approximately dominant objects that are not selected by the linear decision rule.\n\nFigure 3, left panel, shows how well the approximation fared in the 51 datasets with various choices of the parameter c. The vertical axis shows the mean error rate of the approximation. With numeric data, the error rates were reasonably close to the expected values. 
With binary data, error rates were substantially lower than expected.\n\nFigure 3: Left: Error rates of approximate dominance with various values of the approximation parameter c. 
Right: Proportion of linear models with noncompensatory weights in each of the datasets.\n\nNoncompensatoriness. Let noncompensation be a logical variable that equals TRUE if the decision of the first discriminating cue, when cues are processed in nonincreasing magnitude of the weights, is identical to the decision of the linear decision rule. With binary cues and noncompensatory weights, noncompensation is TRUE with probability 1. Otherwise, its value depends on the cue values. If noncompensation is TRUE, the linear decision rule and the corresponding lexicographic rule make identical decisions.\n\nFigure 3, right panel, shows the proportion of base decision rules with noncompensatory weights in binary datasets. Recall that a large number of base decision rules were learned on each dataset, using different training sets and random seeds. The proportion of base decision rules with noncompensatory weights ranged from 0 to 1, with a mean of 0.17 across datasets. Nine datasets had values greater than 0.50. Thirty-two datasets had values less than 0.01.\n\nFigure 4 shows noncompensation in each dataset, together with the accuracies of the base decision rule and the corresponding lexicographic rule. The accuracies on the same dataset are connected by a line segment. The figure reveals overwhelmingly large levels of noncompensation, particularly for binary data. Median noncompensation was 0.85 in numeric datasets and 0.96 in binary datasets. Consequently, the accuracy of the lexicographic rule was very close to that of the linear decision rule: its median accuracy relative to the base decision rule was 96% in numeric datasets and 100% in binary datasets. 
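The distinction between noncompensatory weights (a property of the weights alone) and noncompensation (a property of a particular decision) can be made concrete with a short sketch (mine, not the paper's code):\n\n
```python
def noncompensatory(w):
    """w: nonnegative weights sorted in nonincreasing order.
    Each weight must exceed the sum of all later weights."""
    return all(w[i] > sum(w[i + 1:]) for i in range(len(w) - 1))

def noncompensation(w, dx):
    """TRUE if the first discriminating cue (the largest-weight cue with
    dx != 0) decides the same way as the full linear rule."""
    first = next((wi * dxi for wi, dxi in zip(w, dx) if dxi != 0), 0.0)
    total = sum(wi * dxi for wi, dxi in zip(w, dx))
    return (first > 0) == (total > 0) and (first < 0) == (total < 0)
```
\nThe sequence 1, 0.5, 0.25, 0.125 passes the first test, while compensatory weights such as 0.5, 0.3, 0.3 fail it yet can still yield noncompensation on some cue profiles, which is one way the prevalence of noncompensation can exceed the prevalence of noncompensatory weights.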
In summary, although noncompensatory weights were not particularly prevalent in the datasets, actual levels of noncompensation were very high.\n\n5 Discussion\n\nIt is fair to conclude that all three environmental structures are prevalent in natural environments to such a high degree that decisions guided by these structures approach, and occasionally exceed, the base decision model in predictive accuracy.\n\nWe have not examined the performance of any particular decision heuristic, which depends on the cue directions and cue order it uses. These will not necessarily match those of the linear decision rule.3 The results here show that it is possible for decision heuristics to succeed in natural environments by imitating the decisions of the linear model using less information and less computation, because the conditions that make it possible are prevalent, but not that they necessarily do so.\n\n3When this is the case, it should be noted, the decision heuristic may have a higher predictive accuracy than the linear model.\n\nFigure 4: Prevalence of noncompensation. For each dataset, the proportion of decisions in which noncompensation took place is plotted against the accuracy of the base decision rule (displayed in green circles) and the accuracy of the corresponding lexicographic rule (displayed in blue plus signs). Accuracies on the same dataset are connected by a line segment.\n\nWhen decision heuristics are examined through the lens of bias-variance decomposition [23, 24, 25], the three environmental structures examined here are particularly relevant for the bias component of the prediction error. 
The results presented here suggest that while simple decision heuristics examine a tiny fraction of the set of linear models, in natural environments they may do so without introducing much additional bias.\n\nIt is sometimes argued that the environmental structures discussed here, noncompensatoriness in particular, are relevant for model fitting but not for prediction on unseen data. This is not accurate. The results reviewed in Sections 2.1–2.3 apply to a linear model regardless of how the linear model was trained. If we are comparing objects that were not used to train the model, as we have done here, the discussion pertains to predictive accuracy.\n\nThe probabilistic approximations of dominance and of cumulative dominance introduced in this paper can be used as decision heuristics themselves, combined with any method of estimating cue directions and cue order. I leave detailed examination of their performance for future work but note that the results here are encouraging.\n\nFinally, I hope that these results will stimulate further research into the statistical properties of decision environments, as well as cognitive models that exploit them, for further insights into higher cognition.\n\nAcknowledgments\n\nI am grateful to all those who made their datasets available for this study. Thanks to Gerd Gigerenzer, Konstantinos Katsikopoulos, Amit Kothiyal, and three anonymous reviewers for comments on earlier versions of this manuscript, and to Marcus Buckmann for his help in gathering the datasets. This work was supported by Grant SI 1732/1-1 to Özgür Şimşek from the Deutsche Forschungsgemeinschaft (DFG) as part of the priority program “New Frameworks of Rationality” (SPP 1516).\n\nReferences\n\n[1] G. 
Gigerenzer, P. M. Todd, and the ABC Research Group. Simple heuristics that make us smart. Oxford University Press, New York, 1999.\n\n[2] G. Gigerenzer, R. Hertwig, and T. Pachur, editors. Heuristics: The Foundations of Adaptive Behavior. Oxford University Press, New York, 2011.\n\n[3] K. V. Katsikopoulos. Psychological heuristics for making inferences: Definition, performance, and the emerging theory and practice. Decision Analysis, 8(1):10–29, 2011.\n\n[4] R. M. Hogarth and N. Karelaia. “Take-the-best” and other simple strategies: Why and when they work “well” with binary cues. Theory and Decision, 61(3):205–249, 2006.\n\n[5] M. Baucells, J. A. Carrasco, and R. M. Hogarth. Cumulative dominance and heuristic performance in binary multiattribute choice. Operations Research, 56(5):1289–1304, 2008.\n\n[6] J. A. Carrasco and M. Baucells. Tight upper bounds for the expected loss of lexicographic heuristics in binary multi-attribute choice. Mathematical Social Sciences, 55(2):156–189, 2008.\n\n[7] L. Martignon and U. Hoffrage. Why does one-reason decision making work? In G. Gigerenzer, P. M. Todd, and the ABC Research Group, editors, Simple heuristics that make us smart, pages 119–140. Oxford University Press, New York, 1999.\n\n[8] L. Martignon and U. Hoffrage. Fast, frugal, and fit: Simple heuristics for paired comparison. Theory and Decision, 52(1):29–71, 2002.\n\n[9] R. M. Hogarth and N. Karelaia. Simple models for multiattribute choice with many alternatives: When it does and does not pay to face trade-offs with binary attributes. Management Science, 51(12):1860–1872, 2005.\n\n[10] K. V. Katsikopoulos and L. Martignon. Naïve heuristics for paired comparisons: Some results on their relative accuracy. Journal of Mathematical Psychology, 50(5):488–494, 2006.\n\n[11] K. V. Katsikopoulos. Why do simple heuristics perform well in choices with binary attributes? 
Decision Analysis, 10(4):327–340, 2013.\n\n[12] S. S. Wilks. Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, 3(1):23–40, 1938.\n\n[13] F. L. Schmidt. The relative efficiency of regression and simple unit weighting predictor weights in applied differential psychology. Educational and Psychological Measurement, 31:699–714, 1971.\n\n[14] R. M. Dawes and B. Corrigan. Linear models in decision making. Psychological Bulletin, 81(2):95–106, 1974.\n\n[15] R. M. Dawes. The robust beauty of improper linear models in decision making. American Psychologist, 34(7):571–582, 1979.\n\n[16] H. J. Einhorn and R. M. Hogarth. Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13(2):171–192, 1975.\n\n[17] C. P. Davis-Stober. A geometric analysis of when fixed weighting schemes will outperform ordinary least squares. Psychometrika, 76(4):650–669, 2011.\n\n[18] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974.\n\n[19] G. Gigerenzer and D. G. Goldstein. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4):650–669, 1996.\n\n[20] J. Czerlinski, G. Gigerenzer, and D. G. Goldstein. How good are simple heuristics? In G. Gigerenzer, P. M. Todd, and the ABC Research Group, editors, Simple heuristics that make us smart, pages 97–118. Oxford University Press, New York, 1999.\n\n[21] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67:301–320, 2005.\n\n[22] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.\n\n[23] S. Geman, E. Bienenstock, and R. Doursat. 
Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, 1992.\n\n[24] H. Brighton and G. Gigerenzer. Bayesian brains and cognitive mechanisms: Harmony or dissonance? In N. Chater and M. Oaksford, editors, The probabilistic mind: Prospects for Bayesian cognitive science, pages 189–208. Oxford University Press, New York, 2008.\n\n[25] G. Gigerenzer and H. Brighton. Homo Heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1(1):107–143, 2009.\n", "award": [], "sourceid": 1337, "authors": [{"given_name": "\u00d6zg\u00fcr", "family_name": "\u015eim\u015fek", "institution": "Max Planck Institute Berlin"}]}