{"title": "Adaptive Market Making via Online Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 2058, "page_last": 2066, "abstract": "We consider the design of strategies for \\emph{market making} in a market like a stock, commodity, or currency exchange. In order to obtain profit guarantees for a market maker one typically requires very particular stochastic assumptions on the sequence of price fluctuations of the asset in question. We propose a class of spread-based market making strategies whose performance can be controlled even under worst-case (adversarial) settings. We prove structural properties of these strategies which allows us to design a master algorithm which obtains low regret relative to the best such strategy in hindsight. We run a set of experiments showing favorable performance on real-world price data.", "full_text": "Adaptive Market Making via Online Learning\n\nJacob Abernethy*\n\nComputer Science and Engineering\n\nUniversity of Michigan\n\njabernet@umich.edu\n\nSatyen Kale\n\nIBM T. J. Watson Research Center\n\nsckale@us.ibm.com\n\nAbstract\n\nWe consider the design of strategies for market making in an exchange. A market\nmaker generally seeks to profit from the difference between the buy and sell price\nof an asset, yet the market maker also takes exposure risk in the event of large price\nmovements. Profit guarantees for market making strategies have typically required\ncertain stochastic assumptions on the price fluctuations of the asset in question;\nfor example, assuming a model in which the price process is mean reverting. We\npropose a class of \u201cspread-based\u201d market making strategies whose performance\ncan be controlled even under worst-case (adversarial) settings. We prove structural\nproperties of these strategies which allow us to design a master algorithm which\nobtains low regret relative to the best such strategy in hindsight. 
We run a set of\nexperiments showing favorable performance on recent real-world stock price data.\n\n1 Introduction\n\nWhen a trader enters a market, say a stock or commodity market, with the desire to buy or sell a\ncertain quantity of an asset, how is this trader guaranteed to find a counterparty to agree to transact\nat a reasonable price? This is not a problem in a liquid market, with a deep pool of traders ready to\nbuy or sell at any time, but in a thin market the lack of counterparties can be troublesome. A rushed\ntrader may even be willing to transact at a worse price in exchange for immediate execution.\nThis is where a market maker (MM) can be quite useful. An MM is any agent that participates in a\nmarket by offering to both buy and sell the underlying asset at any time. To put it simply, an MM\nconsistently guarantees liquidity to the marketplace by promising to be a counterparty to any trader.\nThe act of market making has both potential benefits and risks. For one, the MM bears the risk\nof transacting with better-informed traders that may know much more about the movement of the\nasset\u2019s price, and in such scenarios the MM can take on a large inventory of shares that it may have\nto offload at a worse price. On the positive side, the MM can profit as a result of the bid-ask spread,\nthe difference between the MM\u2019s buy price and sell price. 
In other words, if the MM buys 100 shares\nof a stock from one trader at a price of $p$, and then immediately sells 100 shares of stock to another\ntrader at a price of $p + \\delta$, the MM records a profit of $100\\delta$.\nThis describes the central goal of a profitable market making strategy: minimize the inventory risk\nof large movements in the price while simultaneously aiming to benefit from the bid-ask spread.\nAn MM strategy has a state (its current inventory, or holdings), receives order and\nprice data as input, and must decide what quantities to offer in the market and at what prices. In the present\npaper we assume that the MM interacts with a continuous double auction via an order book, and the\nMM can place both market and limit orders to the order book.\nA number of MM strategies have been proposed, and in many cases certain profit/loss guarantees\nhave been given. But to the best of our knowledge all such guarantees (aside from [4]) have required\nstochastic assumptions on the traders or the sequence of price fluctuations. Often, for example, one needs to\nassume that the underlying price process exhibits a mean reverting behavior to guarantee profit.\n\n*Work performed while the author was in the CIS Department at the University of Pennsylvania and funded\nby a Simons Postdoctoral Fellowship.\n\nIn this paper we focus on constructing MM strategies that achieve non-stochastic guarantees on\nprofit and loss. We begin by proposing a class of market making strategies, parameterized by the\nchoice of bid-ask spread and liquidity, and we establish a data-dependent expression for the profit\nand loss of each strategy at the end of a sequence of price fluctuations. The model we consider, as\nwell as the aforementioned class of strategies, builds on the work of Chakraborty and Kearns [4].\nIn particular, we assume the MM is given an exogenously-specified price time series that is revealed\nonline. 
We also assume that the MM is able to make and cancel orders after every price fluctuation.\nWe extend the prior work [4] by considering the problem of online learning among this parameterized\nclass of strategies. Performance is measured in terms of regret, which is the difference between\nthe total value of the learner\u2019s algorithm and that of the best strategy in hindsight. While this problem\nis related to the problem of learning from expert advice, standard algorithms assume that the\nexperts have no state; i.e. in each round, the cost of following any given expert\u2019s advice is the same\nas the cost to that expert. This is not the case for online learning of the bid-ask spread, where the\nstate, represented by the inventory of each strategy, affects the payoffs. We can prove however that\ndue to the combinatorial structure of these strategies, one can afford to switch state with bounded\ncost. Using these structural properties we prove the following main result of this paper:\n\nTheorem 1 There is an online learning algorithm that, under a bounded price volatility assumption\n(see Definition 1), has $O(\\sqrt{T})$ regret after $T$ trading periods relative to the best spread-based strategy.\n\nExperimental simulations of our online learning algorithms with real-world price data suggest that\nthis approach is quite promising; our algorithm frequently performs nearly as well as the best strategy,\nand is often superior. Such empirical results provide some evidence that regret minimization\ntechniques are well-suited for adaptively setting the bid-ask spread.\n\nRelated Work Perhaps the most popular model to study market making has been the Glosten-Milgrom\nmodel [11]. In this setting the market is facilitated by a specialist, a monopolistic market\nmaker that acts as the middle man for all trades. 
There has been some work in the Computer Science\nliterature that has considered the sequential decision problem of the specialist [8, 10], and this work\nwas extended to look at the more modern order book mechanism [9]. In our model traders interact\ndirectly with an order book, not via a specialist, and the prices are set exogenously as in [4].\nOver the past ten years there has been a burst of research within the AI and EconCS community on\nthe design of prediction markets in which traders can bet on the likelihood of future uncertain events\n(like horse races, or elections). Much of this started with a couple of key results of Robin Hanson\n[12, 13] who described how to design subsidized prediction markets via the use of proper scoring\nrules. The key technique was a method to design an automated market maker, and there has been\nmuch work on facilitating this using mechanisms based on shares (i.e. Arrow-Debreu securities).\nThere is a medium-sized literature on this topic by now [6, 5, 1, 2] and we mention only a selection.\nThe key difference between the present paper and the work on designing prediction markets is that\nour techniques are solely focused on profit and risk, and not on other issues like price discovery or\ninformation aggregation. Recent work by Della Penna and Reid [19] considered market making as\na multi-armed bandit problem, and this is a notable exception where profit was the focus.\nThe \u201cnon-stochastic\u201d approach we take to the market making problem echoes many of the ideas\nof Cover\u2019s results on Universal Portfolio algorithms [20], an area that has received much followup\nwork [16, 15, 14, 3, 7] given its robustness to adversarially-chosen price fluctuations. But these\nalgorithms are of the \u201cmarket taking\u201d variety, that is they actively rebalance their portfolio on a\ndaily basis. 
Moreover, the goal of the Universal Portfolio is to get low regret with respect to the best\nfixed mixture of investments, rather than the best bid-ask spread, which is the aim of the present work.\n\n2 The Market Execution Framework\n\nWe now present our market model formally. We will consider the buying and selling of a single\nsecurity, say the stock of Microsoft, over the course of some time interval. We assume that all\nevents in the market take place at discrete points in time throughout this day. At each time period $t$ a\nmarket price $p_t$ is announced to the world. In a typical stock exchange this price will be rounded to a\ndiscrete value; historically stock prices were quoted in 1/8\u2019s of a dollar, although now they are quoted\nin pennies. We let $\\delta$ be the discretization parameter of the exchange, and for simplicity assume\n$\\delta = 1/m$ for some positive integer $m$. Now let $\\Pi$ be the set of discrete prices within some feasible\nrange, $\\Pi := \\{\\delta, 2\\delta, 3\\delta, \\ldots, (\\frac{M}{\\delta} - 1)\\delta, M\\}$, where $M$ is some reasonable bound on the largest price.\nA trading strategy maintains two state variables at the beginning of every time period $t$: (a) the\nholdings or inventory $H_t \\in \\mathbb{R}$, representing the amount of stock that the strategy is long or short\nat the beginning of time period $t$ ($H_t$ will be negative if the strategy is short); (b) the cash $C_t \\in \\mathbb{R}$\nof the strategy, representing the money earned or lost by the investor at that time. Initially we have\n$C_1 = H_1 = 0$. Note that when $C_t < 0$ this is not necessarily bad; it simply means the investor has\nborrowed money to purchase holdings, often referred to as \u201ctrading on margin\u201d.\nLet us now consider the trading mechanism at time $t$. For simplicity we assume there are two types\nof trades that can be executed, and each will change the cash and holdings at the following time\nperiod. By default, set $H_{t+1} \\leftarrow H_t$ and $C_{t+1} \\leftarrow C_t$. 
Then the trading strategy can execute any\nsubset of the following two actions:\n\n\u2022 Market Order: At time $t$ the posted price is $p_t$ and the trader executes a trade of $X$ shares,\nwith $X \\in \\mathbb{R}$. In this case we update the cash as $C_{t+1} \\leftarrow C_{t+1} - p_t X$ and the holdings as\n$H_{t+1} \\leftarrow H_{t+1} + X$. Note that if $X < 0$ then this is a short sale, in which case the trader\u2019s cash\nincreases (see footnote 1).\n\n\u2022 Limit Order: Before time period $t$, the trader submits a demand schedule $L_t : \\Pi \\to \\mathbb{R}_+$,\nwhere it is assumed that $L_t(p_{t-1}) = 0$. For every price $p \\in \\Pi$ with $p < p_{t-1}$, the value\n$L_t(p)$ is the number of shares the trader would like to buy at a price of $p$. For every $p > p_{t-1}$\nthe value $L_t(p)$ is the number of shares the trader would like to sell at a price of $p$. One\nshould interpret a limit order in terms of \u201cposting shares to the order book\u201d: these shares\nare up for sale (and/or purchase) but the order will only be executed if the price moves.\nIn round $t$ the posted price becomes $p_t$ and it is assumed that all shares offered at any price\nbetween $p_{t-1}$ and $p_t$ are transacted. More specifically, we have two cases:\n\n\u2013 If $p_t > p_{t-1}$ then for each $p \\in \\Pi$ with $p_{t-1} < p \\le p_t$ we update $C_{t+1} \\leftarrow C_{t+1} + p L_t(p)$ and $H_{t+1} \\leftarrow H_{t+1} - L_t(p)$;\n\u2013 Else if $p_t < p_{t-1}$ then for each $p \\in \\Pi$ with $p_t \\le p < p_{t-1}$ we update $C_{t+1} \\leftarrow C_{t+1} - p L_t(p)$ and $H_{t+1} \\leftarrow H_{t+1} + L_t(p)$.\n\nIt is worth noting that market orders are quite different from limit orders. A limit order is a passive action\nin the market: the trader simply states that he would be willing to trade a number of shares at a range\nof different prices. But if the market does not move then no transactions occur. The market order is a\nmuch more direct action to take: the transaction is guaranteed to execute at the current market price.\nThe market order has the downside that the trader does not get to specify the price at which he would\nlike to trade, contrary to the limit order. 
Roughly speaking, an MM strategy will generally interact\nwith the market via limit orders, since the MM is simply hoping to profit from liquidity provision.\nBut the MM may at times have to place market orders to balance inventory to control risk.\nWe include one more piece of notation, the value of the strategy\u2019s portfolio $V_{t+1}$ at the end of time\nperiod $t$, which can be defined explicitly in terms of the cash, holdings, and current market price:\n$V_{t+1} := C_{t+1} + p_t H_{t+1}$. In other words, $V_{t+1}$ is the amount of cash the strategy would have if it\nliquidated all holdings at the current market price.\n\nFootnote 1: Technically speaking, a brokerage firm won\u2019t give the short-seller the cash to spend since this money will\nbe used to back up losses when the short position is closed. But for the purpose of accounting it is perfectly\nreasonable to record cash in this way, assuming that the strategy ends up with holdings at 0.\n\nAssumptions of our model. In the described framework we make several simplifying assumptions\non the trading execution mechanism, which we note here.\n(1) The trader pays neither transaction fees nor borrowing costs when his cash balance is negative.\n(2) Market orders are executed at exactly the posted market price, without \u201cslippage\u201d of any kind.\nThis suggests that the market is very liquid relative to the actions of the MM.\n(3) The market allows the buying and selling of fractional shares.\n(4) The price sequence is \u201cexogenously\u201d determined, meaning that the trades we make do not affect\nthe current and future prices. This assumption has been made in previous results [4] and it is perhaps\nquite strong, especially if the MM is providing the bulk of the liquidity. We leave it for future work\nto consider the setting with a non-exogenous price process.\n(5) Unexecuted limit orders are cancelled before the next period. 
That is, for any $p$ not lying\nbetween $p_{t-1}$ and $p_t$ it is assumed that the $L_t(p)$ untransacted shares at price $p$ are removed from the\norder book. This is just notational convenience: the MM can resubmit these shares via $L_{t+1}$.\n\n3 Spread-based Strategies\n\nIn this section we present a class of simple market making strategies which we refer to as spread-based\nstrategies since they maintain a fixed bid-ask spread throughout. We then prove some structural\nproperties of this class of strategies. We only give proof sketches for lack of space; all proofs\ncan be found in an appendix in the supplementary material.\n\n3.1 Spread-based strategies.\nWe consider market making strategies parameterized by a window size $b \\in \\{\\delta, 2\\delta, \\ldots, B\\}$, where\n$B$ is a multiple of $\\delta$. Before round $t$, the strategy $S(b)$ selects a window of size $b$, viz. $[a_t, a_t + b]$,\nstarting with $a_1 = p_1$. For some fixed liquidity density parameter $\\alpha$, it submits a buy order of $\\alpha$\nshares at every price $p \\in \\Pi$ such that $p < a_t$ and a sell order of $\\alpha$ shares at every price $p \\in \\Pi$ such that\n$p > a_t + b$. Depending on the price in the trading period $p_t$, the strategy adjusts the next window by\nthe smallest amount necessary to include $p_t$.\n\nAlgorithm 1 Spread-Based Strategy $S(b)$\n1: Receive parameters $b > 0$, liquidity density $\\alpha > 0$, initial price $p_1$ as input. Initialize $a_1 := p_1$.\n2: for $t = 1, 2, \\ldots, T$ do\n3:   Observe market price $p_t$\n4:   If $p_t < a_t$ then $a_{t+1} \\leftarrow p_t$\n5:   Else If $p_t > a_t + b$ then $a_{t+1} \\leftarrow p_t - b$\n6:   Else $a_{t+1} \\leftarrow a_t$\n7:   Submit limit order $L_{t+1}$: $L_{t+1}(p) = 0$ if $p \\in [a_{t+1}, a_{t+1} + b]$, else $L_{t+1}(p) = \\alpha$.\n8: end for\n\nThe intuition behind a spread-based strategy is that the MM waits for the price to deviate in such a\nway that it leaves the window $[a_t, a_t + b]$. Let\u2019s say the price suddenly drops below $a_t$ and we get\n$p_t = a_t - k\\delta$ for some positive integer $k$ such that $k\\delta < b$. As soon as this happens some transactions\noccur and the MM now has holdings of $k\\alpha$ shares. 
That is, the MM will have purchased $\\alpha$ shares at\neach of the prices $a_t - \\delta, a_t - 2\\delta, \\ldots, a_t - k\\delta$. On the following round the MM updates his limit\norder $L_{t+1}$ to offer to sell $\\alpha$ shares at each of the price levels $a_t + b - (k-1)\\delta, a_t + b - (k-2)\\delta, \\ldots$.\nThis gives a natural matching between shares that were bought and shares that are offered for sale,\nwith the sale price being exactly $b$ higher than the purchased price. If, at a later time $t' > t$, the price\nrises so that $p_{t'} \\ge a_t + b + \\delta$ then all shares bought previously are sold at a profit of $kb\\alpha$.\nWe now give a very useful lemma, which essentially shows that we can calculate the profit and loss\nof a spread-based strategy from two factors: (a) how much the spread window moves throughout the\ntrading period, and (b) how far away the final price is from the initial price. A sketch of the proof is\nprovided, but the complete version is in the Appendix.\n\nLemma 1 The value of the portfolio of $S(b)$ at time $T$ can be bounded as\n\n$$V_{T+1} \\ge \\frac{\\alpha}{\\delta} \\left( \\sum_{t=1}^{T} \\frac{b}{2} |a_{t+1} - a_t| - (|a_{T+1} - a_1| + b)^2 \\right).$$\n\nPROOF:[Sketch] The proof of this lemma is quite similar to the proof of Theorem 2.1 in [4]. The\nmain idea is given in the intuitive explanation above: we can match pairs of shares that are bought\nand sold at prices that are $b$ apart, thus registering a profit of $b$ for each such pair. We can relate\nthese matched pairs to the $a_t$\u2019s, and the unmatched stock transactions to the difference $|a_{T+1} - a_1|$,\nyielding the stated bound. \u25a1\n\nIn other words, the risk taken by all strategies is roughly the same ($\\frac{1}{2}|p_{T+1} - p_1|^2$ up to an additive\nconstant in the quadratic term). But the revenue of the spread-based strategy scales with two\nquantities: the size of the window $b$ but also the total movement of the window. 
This raises an interesting\ntradeoff in setting the $b$ parameter, since we would like to make as much as possible on the\nmovement of the window, but by increasing $b$ the window will get \u201cpushed around\u201d a lot less by the\nfluctuating price.\nWe now make a convenient normalization. Since for every unit price change, the strategies trade\n$\\alpha/\\delta$ shares, in the rest of the paper, without loss of generality, we may assume that $\\alpha = 1$ and $\\delta = 1$\n(by appropriately changing the unit of currency). The regret bounds for general $\\alpha$ and $\\delta$ scale up by\na factor of $\\alpha/\\delta$.\n\n3.2 Structural properties of spread-based strategies.\n\nIt is useful to prove certain properties about the proposed spread-based strategies.\n\nLemma 2 Consider any two strategies $S(b)$ and $S(b')$ with $b' < b$. Let $[a'_t, a'_t + b']$ and $[a_t, a_t + b]$\ndenote the intervals chosen by $S(b')$ and $S(b)$ at time $t$ respectively. Then for all $t$, we have\n$[a'_t, a'_t + b'] \\subset [a_t, a_t + b]$.\nPROOF:[Sketch] This is easy to prove by induction on $t$, via a simple case analysis on where $p_t$ lies\nin relation to the windows $[a'_t, a'_t + b']$ and $[a_t, a_t + b]$. \u25a1\n\nLemma 3 For any strategy $S(b)$, its inventory at time $t$, $H_t$, equals $a_1 - a_t$.\nPROOF:[Sketch] Again using case analysis on where $p_t$ lies in relation to the window $[a_t, a_t + b]$,\nwe can show that $H_t + a_t$ is an invariant. Thus, $H_t + a_t = H_1 + a_1 = a_1$, and hence $H_t = a_1 - a_t$. \u25a1\n\nThe following corollary follows easily:\n\nCorollary 1 For any round $t$, consider any two strategies $S(b)$ and $S(b')$ with $b' < b$, with inventories\n$H_t$ and $H'_t$ respectively. Then $|H_t - H'_t| \\le b - b'$.\nPROOF: By Lemma 3 we have $|H_t - H'_t| = |a_1 - a'_1 + a'_t - a_t| \\le b - b'$, since $[a'_1, a'_1 + b'] \\subset [a_1, a_1 + b]$\nand by Lemma 2 $[a'_t, a'_t + b'] \\subset [a_t, a_t + b]$. \u25a1\n\nDefinition 1 ($\\Delta$-bounded volatility) A price sequence $p_1, p_2, \\ldots, p_T$ is said to have $\\Delta$-bounded\nvolatility if for all $t \\ge 2$, we have $|p_t - p_{t-1}| \\le \\Delta$.\nWe assume from now on that the price sequence has $\\Delta$-bounded volatility. 
Suppose now that we have\na set $\\mathcal{B}$ of $N$ window sizes $b$, all bounded by $B$. In the rest of the paper, all vectors are in $\\mathbb{R}^N$ with\ncoordinates indexed by $b \\in \\mathcal{B}$. For every $b \\in \\mathcal{B}$, at the end of time period $t$, let its inventory be\n$H_{t+1}(b)$, cash value be $C_{t+1}(b)$, and total value be $V_{t+1}(b)$. These quantities define the vectors\n$H_{t+1}$, $C_{t+1}$ and $V_{t+1}$. The following lemma shows that the change in the total value of different\nstrategies in any round is similar.\n\nLemma 4 Define $G = 2B\\Delta + \\Delta^2$. In round $t$, let $H = \\min_{b \\in \\mathcal{B}} \\{H_t(b)\\}$. Then for any strategy $S(b)$,\nwe have\n\n$$|(V_{t+1}(b) - V_t(b)) - H(p_t - p_{t-1})| \\le G.$$\n\nThus, for any two window sizes $b$ and $b'$, we have\n\n$$|(V_{t+1}(b) - V_t(b)) - (V_{t+1}(b') - V_t(b'))| \\le 2G.$$\n\nPROOF:[Sketch] Since $|p_t - p_{t-1}| \\le \\Delta$, each strategy trades at most $\\Delta$ shares, at prices between\n$p_{t-1}$ and $p_t$. Next, by Corollary 1, for any strategy $|H_t(b) - H| \\le B$. Using these bounds, and the\ndefinitions of the total value, some calculations give the stated bounds. \u25a1\n\n4 A low regret meta-algorithm\n\nRecall that we have a set $\\mathcal{B}$ of $N$ window sizes $b$, all bounded by $B$. We want to design a low-regret\nalgorithm that achieves almost as much payoff as that of the best strategy $S(b)$ for $b \\in \\mathcal{B}$.\nConsider the following meta-algorithm. Treat every strategy $S(b)$ as an expert and run a regret minimizing\nalgorithm for learning with expert advice (such as Multiplicative Weights [18] or Follow-The-Perturbed-Leader\n[17]). The distributions generated by the regret minimizing algorithm are\ntreated as mixing weights for the different strategies, essentially executing each strategy scaled by\nits current weight. In each round, the meta-algorithm restores the inventory of each strategy to the\ncorrect state by additionally buying or selling enough shares so that its inventory is exactly what\nit would have been had it run the different strategies with their present weights throughout. 
The\nspecific algorithm is given below.\n\nAlgorithm 2 Low regret meta-algorithm\n1: Run every strategy $S(b)$ in parallel so that at the end of each time period $t$, all trades made by\nthe strategies and the vectors $H_{t+1}$, $C_{t+1}$ and $V_{t+1} \\in \\mathbb{R}^N$ can be computed.\n2: Start a regret-minimizing algorithm $A$ for learning from expert advice with one expert corresponding\nto each strategy $S(b)$ for $b \\in \\mathcal{B}$. Let the distribution over strategies generated by $A$ at\ntime $t$ be $w_t$.\n3: for $t = 1, 2, \\ldots, T$ do\n4:   Execute any market orders from the previous period at the current market price $p_t$ so that the\ninventory now equals $H_t \\cdot w_t$. The cash value changes by $-(H_t \\cdot (w_t - w_{t-1})) p_t$.\n5:   Execute any limit orders from the previous period: a $w_t$ weighted combination of the limit\norders of the strategies $S(b)$. The holdings change to $H_{t+1} \\cdot w_t$, and the cash value changes\nby $(C_{t+1} - C_t) \\cdot w_t$.\n6:   For each strategy $S(b)$ for $b \\in \\mathcal{B}$, set its payoff in round $t$ to be $V_{t+1}(b) - V_t(b)$ and send\nthese payoffs to $A$.\n7:   Obtain the updated distribution $w_{t+1}$ from $A$.\n8:   Place a market order to buy $H_{t+1} \\cdot (w_{t+1} - w_t)$ shares in the next period, and a $w_{t+1}$ weighted\ncombination of the limit orders of the strategies $S(b)$.\n9: end for\n\nWe now prove the following bound on the regret of the algorithm based on the regret of the underlying\nalgorithm $A$. Recall from Lemma 4 the definition of $G := 2B\\Delta + \\Delta^2$.\n\nTheorem 2 Assume that the price sequence has $\\Delta$-bounded volatility. The regret of the meta-algorithm\nis bounded by\n\n$$\\mathrm{Regret}(A) + \\frac{G}{2} \\sum_{t=1}^{T} \\|w_t - w_{t+1}\\|_1.$$\n\nPROOF: The regret bound for $A$ implies that $\\sum_{t=1}^{T} (V_{t+1} - V_t) \\cdot w_t \\ge \\max_{b \\in \\mathcal{B}} V_T(b) - \\mathrm{Regret}(A)$.\nLemma 5 shows that the final total value of the meta-algorithm is at least $\\sum_{t=1}^{T} (V_{t+1} - V_t) \\cdot w_t - \\frac{G}{2} \\sum_{t=1}^{T} \\|w_t - w_{t+1}\\|_1$.\nThus, the regret of the algorithm is bounded as stated. \u25a1\n\nLemma 5 In round $t$, the change in total value of the meta-algorithm equals\n\n$$(V_{t+1} - V_t) \\cdot w_t + H_t \\cdot (w_t - w_{t-1})(p_{t-1} - p_t).$$\n\nFurthermore, $|H_t \\cdot (w_t - w_{t-1})(p_{t-1} - p_t)| \\le \\frac{G}{2} \\|w_t - w_{t-1}\\|_1$.\nPROOF:[Sketch] The expression for the change in the total value of the meta-algorithm is a simple\ncalculation using the definitions. The second bound is obtained by noting that all the $H_t(b)$\u2019s are\nwithin $B$ of each other by Corollary 1, and thus $|H_t \\cdot (w_t - w_{t-1})| \\le B \\|w_t - w_{t-1}\\|_1$, and\n$|p_{t-1} - p_t| \\le \\Delta$ by the bounded volatility assumption. \u25a1\n\n4.1 A low regret algorithm based on Multiplicative Weights\n\nNow we give a low regret algorithm based on the classic Multiplicative Weights (MW) algorithm\n[18]. Call this algorithm MMMW (Market Making using Multiplicative Weights).\nThe algorithm takes parameters $\\eta_t$, for $t = 1, 2, \\ldots, T$. It starts by initializing weights $w_1(b) = 1/N$\nfor every $b \\in \\mathcal{B}$. In round $t$, the algorithm updates the weights using the rule\n\n$$w_{t+1}(b) := w_t(b) \\exp(\\eta_t (V_{t+1}(b) - V_t(b)))/Z_t,$$\n\nfor every $b \\in \\mathcal{B}$, where $Z_t$ is the normalization constant to make $w_{t+1}$ a distribution.\nUsing Theorem 2, we can give the following bound on the regret of MMMW:\n\nTheorem 3 Suppose we set $\\eta_t = \\frac{1}{2G} \\min\\left\\{ \\sqrt{\\frac{\\log(N)}{t}}, 1 \\right\\}$, for $t = 1, 2, \\ldots, T$. Then MMMW has\nregret bounded by $13G\\sqrt{\\log(N)T}$.\nPROOF:[Sketch] By Theorem 2, we need to bound $\\|w_{t+1} - w_t\\|_1$. The multiplicative update rule,\n$w_{t+1}(b) = w_t(b) \\exp(\\eta_t (V_{t+1}(b) - V_t(b)))/Z_t$, and the fact that by Lemma 4, the range of the\nentries of $V_{t+1} - V_t$ is bounded by $2G$ implies that $\\|w_{t+1} - w_t\\|_1 \\le 4\\eta_t G$. Standard analysis for\nthe regret of the MW algorithm then gives the stated regret bound for MMMW. \u25a1\n\n4.2 A low regret algorithm based on Follow-The-Perturbed-Leader\n\nNow we give a low regret algorithm based on the Follow-The-Perturbed-Leader (FPL) algorithm\n[17]. Call this algorithm MMFPL (Market Making using Follow-The-Perturbed-Leader). 
We\nactually use a deterministic version of the algorithm which has the same regret bound.\nThe algorithm requires a parameter $\\eta$. For every $b \\in \\mathcal{B}$, let $p(b)$ be a sample from the exponential\ndistribution with mean $1/\\eta$. The distribution $w_t$ is then set to be the distribution of the \u201cperturbed\nleader\u201d, i.e.\n\n$$w_t(b) = \\Pr_p[V_t(b) + p(b) \\ge V_t(b') + p(b') \\ \\forall\\, b' \\in \\mathcal{B}].$$\n\nUsing Theorem 2, we can give the following bound on the regret of MMFPL:\n\nTheorem 4 Choose $\\eta = \\frac{1}{2G} \\sqrt{\\frac{\\log(N)}{T}}$. Then the regret of MMFPL is bounded by $7G\\sqrt{\\log(N)T}$.\nPROOF:[Sketch] Again we need to bound $\\|w_{t+1} - w_t\\|_1$. Kalai and Vempala [17] show that in the\nrandomized FPL algorithm, the probability that the leader changes from round $t$ to $t + 1$ is bounded by\n$2\\eta G$. This implies that $\\|w_{t+1} - w_t\\|_1 \\le 4\\eta G$. Standard analysis for the regret of the FPL algorithm\nthen gives the stated regret bound for MMFPL. \u25a1\n\n5 Experiments\n\nWe conducted experiments with stock price data obtained from http://www.netfonds.no/.\nWe downloaded data for the following stocks: MSFT, HPQ and WMT. The data consists of trades\nmade throughout a given date in chronological order. We obtained data for these stocks for each of\nthe 5 days in the range May 6-10, 2013. The number of trades ranged from roughly 7,000 to 38,000.\nThe quoted prices are rounded to the nearest cent. Our spread-based strategies operate at the level of\na cent: i.e. the windows are specified in terms of cents, and the buy/sell orders are set to 1 share per\ncent outside the window. The class of spread-based strategies we used in our experiments corresponds\nto the following set of window sizes, quoted in cents: $\\mathcal{B} = \\{1, 2, 3, 4, 5, 10, 20, 40, 80, 100\\}$, so that\n$N = 10$ and $B = 100$.\nWe implemented MMMW, MMFPL, simple Follow-The-Leader (FTL; see footnote 2), and simple uniform averaging\nover all strategies. We compared their performance to the best strategy in hindsight. 
For\nMMFPL, $w_t$ was approximated by averaging 100 independently drawn initial perturbations.\n\nFootnote 2: This algorithm simply chooses the best strategy in each round based on past performance without perturbations.\n\nSymbol  Date            T     Best     MMMW     MMFPL       FTL  Uniform\nHPQ     05/06/2013   7128   668.00   370.07    433.99    638.00   301.10\nHPQ     05/07/2013  13194   558.00   620.18    -41.54     19.00   100.80\nHPQ     05/08/2013  12016   186.00   340.11   -568.04   -242.00  -719.80\nHPQ     05/09/2013  14804  1058.00   890.99    327.05    214.00   591.40\nHPQ     05/10/2013  14005   512.00   638.53   -446.42   -554.00   345.60\nMSFT    05/06/2013  29481  1072.00  1062.65  -1547.01  -1300.00   542.60\nMSFT    05/07/2013  34017  1260.00  1157.38   1048.46   1247.00    63.80\nMSFT    05/08/2013  38664  2074.00  2064.83   1669.30   2074.00   939.10\nMSFT    05/09/2013  34386  1813.00  1802.91   1534.68   1811.00   656.10\nMSFT    05/10/2013  27641  1236.00  1250.27    556.08    590.00   750.90\nWMT     05/06/2013   8887   929.00   694.48    760.70    785.00   235.20\nWMT     05/07/2013  11309  1333.00   579.88    995.43    918.00   535.40\nWMT     05/08/2013  12966  1372.00  1300.47    832.80    974.00   926.40\nWMT     05/09/2013  10431  2415.00  2329.78   1882.90   1991.00  1654.10\nWMT     05/10/2013   9567  1150.00  1001.31      7.03    209.00   707.70\n\nTable 1: Final performance of various algorithms in cents. Bolded values indicate best performance.\nItalicized values indicate runs where the MMMW algorithm beat the best in hindsight.\n\nFigure 1: Performance of various algorithms and strategies for HPQ on May 8 and 9, 2013. For\nclarity, the total value every 100 periods is shown. Top row: On May 8, MMMW outperforms the\nbest strategy, and on May 9 the reverse happens. Bottom row: performance of different strategies.\nOn May 8, $b = 100$ performs best, while on May 9, $b = 40$ performs best.\n\nExperimentally, having slightly larger learning rates seemed to help. For MMMW, we used the\nspecification $\\eta_t = \\min\\left\\{ \\sqrt{\\frac{\\log(N)}{t}}, \\frac{1}{G_t} \\right\\}$, where $G_t = \\max_{\\tau \\le t,\\, b, b' \\in \\mathcal{B}} |V_\\tau(b) - V_\\tau(b')|$, and for\nMMFPL, we used the specification $\\eta = \\sqrt{\\frac{\\log(N)}{T}}$. These specifications ensure that the theory goes\nthrough and the regret is bounded by $O(\\sqrt{T})$ as before.\nTable 1 shows the performance of the algorithms in the 15 runs (3 stocks times 5 days). In all the\nruns, the MMMW algorithm performed nearly as well as the best strategy, at times even outperforming\nit. MMFPL didn\u2019t perform as well however. As an illustration of how closely MMMW tracks\nthe best performance achievable using the spread-based strategies in the class, in Figure 1 we show\nthe performance of all algorithms for 2 consecutive trading days, May 8 and 9, 2013, for the stock\nHPQ. We also show the performance of different strategies on these two days; it can be seen that the\nbest strategy differs, thus motivating the need for an adaptive learning algorithm.\n\nReferences\n[1] J. Abernethy, Y. Chen, and J. Wortman Vaughan. An optimization-based framework for automated market-making. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 297\u2013306, 2011.\n\n[2] S. Agrawal, E. Delage, M. Peters, Z. Wang, and Y. Ye. A unified framework for dynamic prediction market design. Operations Research, 59(3):550\u2013568, 2011.\n\n[3] A. Blum and A. Kalai. Universal portfolios with and without transaction costs. Machine Learning, 35(3):193\u2013205, 1999.\n\n[4] T. Chakraborty and M. Kearns. Market making and mean reversion. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 307\u2013314. ACM, 2011.\n\n[5] Y. Chen and D. M. Pennock. A utility framework for bounded-loss market makers. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, pages 49\u201356, 2007.\n\n[6] Y. Chen and J. Wortman Vaughan. 
A new understanding of prediction markets via no-regret learning. In Proceedings of the 11th ACM Conference on Electronic Commerce, pages 189\u2013198, 2010.\n\n[7] T. M. Cover and E. Ordentlich. Universal portfolios with side information. IEEE Transactions on Information Theory, 42(2):348\u2013363, 1996.\n\n[8] S. Das. A learning market-maker in the Glosten\u2013Milgrom model. Quantitative Finance, 5(2):169\u2013180, 2005.\n\n[9] S. Das. The effects of market-making on price dynamics. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 887\u2013894, 2008.\n\n[10] S. Das and M. Magdon-Ismail. Adapting to a market shock: Optimal sequential market-making. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems, pages 361\u2013368, 2008.\n\n[11] L. R. Glosten and P. R. Milgrom. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of Financial Economics, 14(1):71\u2013100, 1985.\n\n[12] R. Hanson. Combinatorial information market design. Information Systems Frontiers, 5(1):105\u2013119, 2003.\n\n[13] R. Hanson. Logarithmic market scoring rules for modular combinatorial information aggregation. Journal of Prediction Markets, 1(1):3\u201315, 2007.\n\n[14] E. Hazan, A. Kalai, S. Kale, and A. Agarwal. Logarithmic regret algorithms for online convex optimization. In Learning Theory, pages 499\u2013513. Springer, 2006.\n\n[15] D. P. Helmbold, R. E. Schapire, Y. Singer, and M. K. Warmuth. On-line portfolio selection using multiplicative updates. Mathematical Finance, 8(4):325\u2013347, 1998.\n\n[16] A. T. Kalai and S. Vempala. Efficient algorithms for universal portfolios. The Journal of Machine Learning Research, 3:423\u2013440, 2003.\n\n[17] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291\u2013307, 2005.\n\n[18] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Inf. Comput., 108(2):212\u2013261, 1994.\n\n[19] N. Della Penna and M. D. Reid. Bandit market makers. arXiv preprint arXiv:1112.0076, 2011.\n\n[20] T. M. Cover. Universal portfolios. Mathematical Finance, 1(1):1\u201329, January 1991.", "award": [], "sourceid": 1032, "authors": [{"given_name": "Jacob", "family_name": "Abernethy", "institution": "University of Pennsylvania"}, {"given_name": "Satyen", "family_name": "Kale", "institution": "IBM Research"}]}