{"title": "Coastal Navigation with Mobile Robots", "book": "Advances in Neural Information Processing Systems", "page_first": 1043, "page_last": 1049, "abstract": null, "full_text": "Coastal Navigation with Mobile Robots \n\nNicholas Roy and Sebastian Thrun \n\nSchool of Computer Science \nCarnegie Mellon University \n\nPittsburgh, PA  15213 \n\n{ nicholas. roy I sebastian. thrun } @cs.cmu.edu \n\nAbstract \n\nThe  problem  that  we  address  in  this paper  is  how  a mobile robot can  plan  in  order \nto  arrive  at  its  goal  with  minimum  uncertainty.  Traditional  motion  planning algo(cid:173)\nrithms often assume that a mobile robot can track its position reliably, however, in real \nworld situations, reliable localization may not always be feasible.  Partially Observable \nMarkov Decision Processes (POMDPs) provide one way to maximize the certainty of \nreaching  the goal  state,  but at the cost of computational  intractability for large state \nspaces. \nThe  method  we propose  explicitly  models  the  uncertainty of the  robot's position as \na  state  variable,  and  generates  trajectories  through  the  augmented  pose-uncertainty \nspace.  By  minimizing  the  positional  uncertainty  at  the  goal,  the  robot reduces  the \nlikelihood  it  becomes  lost.  We  demonstrate  experimentally  that  coastal  navigation \nreduces the uncertainty at the goal, especially with degraded localization. \n\n1  Introduction \n\nFor an  operational  mobile  robot,  it  is  essential  to  prevent  becoming  lost.  Early  motion \nplanners  assumed  that a  robot would never  be  lost - that a  robot could  always  know  its \nposition via dead reckoning without error [7]. This assumption proved to be untenable due \nto the small and inevitable inconsistencies in actual robot motion; robots that rely solely on \ndead reckoning for their position estimates lose their position quickly.  Mobile robots now \nperform position tracking using a combination of sensor data and odometry [2,  10, 5]. \nHowever,  the  robot's ability  to  track  its  position can  vary  considerably  with  the  robot's \nposition in the environment. Some parts of the environment may  lack good features for lo(cid:173)\ncalization [11]. Other parts of the environment can have a large number of dynamic features \n(for example,  people) that can  mislead the localization system.  Motion planners rarely,  if \never,  take the robot's position tracking ability into consideration.  As  the robot's localiza(cid:173)\ntion suffers, the likelihood that the robot becomes lost increases, and as a consequence, the \nrobot is less likely to complete the given trajectory. \nMost  localization  systems  therefore  compensate  by  adding  environment-specific  knowl(cid:173)\nedge to the localization  system,  or by  adding additional sensing capabilities to the robot, \nto  guarantee  that the  robot can  complete  every  possible path.  In  general,  however,  such \nalterations to the position tracking abilities of the robot have limitations, and an alternative \nscheme must be used to ensure that the robot can  navigate with maximum reliability.  The \nconventional planners represent one end of a spectrum of approaches  (figure  1),  in  that a \nplan can be computed easily, but at the cost of not modelling localization performance. \nAt  opposite  end  of the  spectrum  is  the  Partially  Observable  Markov  Decision  Process \n\n\f1044 \n\nN  Roy and S.  Thntn \n\nConventional \n\nPath Planner \n\nTractable \n\nNot Robust \n\nPOMDP \n\nIntracwble \n\nRobust \n\nFigure 1:  The continuum of possible approaches to  the  motion  planning,  from  the  robust but  in(cid:173)\ntractable POMDP,  to the potentially failure-prone but real-time conventional planners.  Coastal navi(cid:173)\ngation lies in the middle of this spectrum. \n\n(POMDP). POMDPs in a sense are the brass ring of planning with uncertainty; a POMDP \npolicy  will make exactly  the right kind of compromise  between  conventional optimality \nconsiderations and  certainty of achieving the goal state.  Many people have  examined  the \nuse  of POMDPs  for  mobile robot navigation  [5,  6,  8].  However,  computing  a  POMDP \nsolution is  computationally intractable (PSPACE-hard) for large state systems - a mobile \nrobot operating in  the real  world often has  millions of possible states.  As  a result,  many \nof the mobile robot POMDP solutions have made simplifying assumptions about the world \nin  order to reduce  the  state space size.  Many  of these assumptions do not scale to larger \nenvironments  or robots.  In  contrast,  our hypothesis is  that  only  a  small  number of the \ndimensions of the uncertainty matter, and that we can augment the state with these dimen(cid:173)\nsions to approximate a solution to the POMDP. \nThe coastal navigation model developed in  this paper represents a tradeoff between robust \ntrajectories and computational tractability, and is inspired by traditional navigation of ships. \nShips often use the coasts of continents for navigation in the absence of better tools such \nas GPS, since being close to the land allows sailors to determine with high accuracy  where \nthey are.  The success of this method results from coast lines containing enough information \nin their structure for accurate localization.  By navigating sufficiently close to areas  of the \nmap that have high information content, the likelihood of getting lost can be minimized. \n\n2  Modelling Uncertainty \n\nThe problem that we address in this paper is how a mobile robot can plan in order to arrive \nat its goal  with minimum uncertainty.  Throughout this discussion, we  will be assuming a \nknown map of the environment [9].  The position, x, of the robot is  given as  the location \n(x, y)  and  direction e,  defined  over a space X  =  (X, Y, 8).  Our localization method  is \na  grid-based  implementation  of Markov  localization  [3,  5].  This  method  represents  the \nrobot's belief in its current position using a 3-dimensional grid over X  =  (X, Y, 8), which \nallows for a discrete approximation of arbitrary probability distributions.  The probability \nthat the robot has a particular pose x is given by the probability p(x). \n\nState Augmentation  We can  extend  the state of the robot from  the 3-dimensional pose \nspace  to  an  augmented  pose-uncertainty  space.  We  can  represent  the uncertainty  of the \nrobot's positional distribution as the entropy, \n\nH(Px ) = - J p(x) log(p(x))  dx \n\nx \n\n(1) \n\nWe therefore represent the state space of the robot as the tuple \n\nS \n\n(x,y,e,H(x,y,e)) \n(x, H(x)) \n\nState Transitions \nIn  order to  construct a  plan between  two points in  the environment, \nwe need to be able to represent the effect of the robot's sensing and moving actions.  The \nimplementation of Markov localization provides the following equations for the  tracking \n\n\fCoastal Navigation with Mobile Robots \n\nthe robot's pose from x  to x': \n\np(x'lu) \n\np(x'lz) \n\nJ p(x'lx, u)p(x)dx \n\nx \nap(zlx)p(x) \n\n1045 \n\n(2) \n\n(3) \n\nThese equations are  taken  from  [3,  12],  where equation  (2)  gives  the prediction phase of \nlocalization (after motion u), and equation (3) gives the update phase of localization (after \nreceiving observation z).  a  is  a normalizing constant.  We  extend  these equations to  the \nfourth dimension as follows: \n\np(slu) \np(slz) \n\n(p(xlu),ll(p(xlu))) \n(p(xlz), ll(p(xlz))) \n\n(4) \n(5) \n\n3  Planning \n\nEquations (4) and (5) provide a mechanism for tracking the robot's state, and in fact contain \nredundant information, since the extra state variable ll(x) is also contained in the probabil(cid:173)\nity distribution p(x). However, in order to make the planning problem tractable, we cannot \nin fact maintain the probabilistic sensing model.  To do so would put the planning problem \nfirmly in the domain ofPOMDPs, with the associated computational intractability. Instead, \nwe  make  a simplifying assumption,  that  is,  that  the  positional probability distribution of \nthe robot can be represented at all times by a Gaussian centered at the mean x. This allows \nus to approximate the positional distribution with a single statistic, the entropy. In POMDP \nterms, we using the assumption of Gaussian distributions to compress the belief space to a \nsingle dimension.  We  can now represent the positional probability distribution completely \nwith the vector s, since the width of the Gaussian is represented by the entropy ll(x). \nMore importantly, the simplifying assumption allows us  to track the state of the robot de(cid:173)\nterministically.  Although the state transitions are stochastic (as in equation (4\u00bb, the obser(cid:173)\nvations are not.  At any  point in time, the sensors identify the true state of the system, with \nsome certainty given by II (p(xlz)). This allows us  to compress the state transitions into a \nsingle rule: \n\np(slu) \n\n(p(xlu),ll(p(xlu,z))) \n\n(6) \n\nThe final position of the robot depends only on the motion command 1l and can be identified \nby  sensing z.  However,  the uncertainty of the pose, ll(p(xlll, z)), is  a function not only \nof the  motion  command  but  also  the  sensing.  The  simplifying assumption  of Gaussian \nmodels is in general untenable for localization; however,  we shall see that this assumption \nis sufficient for the purpose of motion planning. \nOne  final  modification  must  be  made  to  the  state  transition  rule.  In  a  perfect  world,  it \nwould  be  possible  to  predict  exactly  what  observation  would  be  made.  However,  it  is \nexactly  the  stochastic and  noisy  nature of real  sensors  that  generates  planning difficulty, \nyet  the  update  rule  (6)  assumes  that  it  is  possible  to  predict  measurement  z  at  pose  x. \nDeterministic prediction  is  not possible;  however,  it  is  possible to compute probabilities \nfor sensor measurements, and thus generate an  expected value for the entropy based on the \nprobability distribution of observations Z,  which leads to the final  state transition rule: \n\np(slu) \n\n(p(xlu), Ez[ll(p(xlu , z))]) \n\n(7) \n\nwhere Ez[ll(p(xlll, z))] represents the expected value of the entropy of the pose distribu(cid:173)\ntion over the space of possible sensor measurements. \nWith  the  transition rule in  equation  (7),  we  can  now  compute  the transition probabilities \nfor any  particular state using a model  of the robot's motion, a model of the robot's sensor \nand a map  of th~ environment.  The probability p(xlu)  is given by  a model  of the robot's \nmotion,  and can  be  easily precomputed  for each  action  u.  The expectation  term  Ez [ll] \n\n\f1046 \n\nN.  Roy and S.  Thrun \n\ncan also be precomputed for each possible state s.  The precomputation of these transition \nprobabilities is very time-intensive, because it requires simulating sensing at each  state in \nthe environment, and then computing the posterior distribution.  However,  as  the precom(cid:173)\nputation is  a one-time operation for the environment and robot, planning itself can  be an \nonline operation  and  is (in the limit) unaffected by  the speed of computing the transition \nprobabilities. \n\n3.1  Computing Trajectories \n\nWith the state update rule given in equation (7), we can now compute the optimal trajectory \nto a particular goal.  We would in fact like to compute not just the optimal trajectory from \nthe current robot position, but the optimal action from any position in the world. If the robot \nshould deviate from the expected trajectory for any reason (such as error in  the motion, or \ndue to low-level control constraints), interests of efficiency  suggest precomputing actions \nfor continuing to the goal, rather than continually replanning as  these contingencies arise. \nNote that  the  motion  planning problem as  we  have  now  phrased  it can  be  viewed  as  the \nproblem of computing the optimal policy for a given problem.  The Markovian, stochastic \nnature of the transitions, coupled with the need to compute the optimal policy for all states, \nsuggests a value iteration approach. \nValue iteration attempts to find  the policy that maximizes the long-term reward  [1,4]. The \nproblem becomes  one  of finding  the  value function,  J(s)  which  assigns  a  value to  each \nstate.  The  optimal  action  at  each  state can  then  be  easily  computed  by  determining the \nexpected value of each action at each state, from the neighboring values.  We use a modified \nform of Bellman's equations to give the value of state J (s)  and policy as \nm:x[R(sd + C(s, u) + L p(Sj lSi, u)  . J(Sj)] \n\nJ(Si) \n\n(8) \n\nN \n\nj=1 \n\nargmax[R(si) + C(s, u) + L p(Sj lSi, u)  . J(Sj)] \n\nN \n\nIl \n\nj=1 \n\n(9) \n\nBy iterating equation (8), the value function iteratively settles to a converged value over all \nstates. Iteration stops when no state value changes above some threshold value. \nIn the above equations, R(sd is the immediate reward at state si, p(Sj lSi , u) is the transition \nprobability from state si to state Sj, and C(s, u) is the cost of taking action u at state s.  Note \nthat the form of the equations is undiscounted in the traditional sense, however, the additive \ncost term plays a similar role in  that the system is penalized for policies that take  longer \ntrajectories. The cost in general is simply the distance of one step in the given direction u, \nalthough the cost of travel  close  to obstacles  is higher,  in  order to create a  safety  margin \naround obstacles.  The cost of an  action that would cause a collision is infinite, preventing \nsuch actions from being used. \nThe immediate reward  is  localized  only at  the goal  pose.  However,  the  goal  pose has  a \nrange of possible values for the uncertainty,  creating a  set of goal  states,  g.  In  order to \nreward  policies  that arrive  at  a goal  state  with  a  lower uncertainty,  the reward  is  scaled \nlinearly with goal state uncertainty. \n\nR( xd = {~ - H (s) \n\nS  (;  9 \notherwise \n\n(10) \n\nBy implementing the value iteration given in  the equations (8) and  (9)  in  a dynamic  pro(cid:173)\ngram, we can compute the value function in O( nkcrid where n  is the number of states in \nthe  environment (number of positions  x  number of entropy levels)  and  kcrit  is  the num(cid:173)\nber of iterations to convergence.  With  the value function  computed,  we can  generate the \noptimal  action  for  any  state in  O(a)  time,  where a is  the number of actions out of each \nstate. \n\n\fCoastal Navigation with Mobile Robots \n\n1047 \n\n4  Experimental Results \n\nFigure 2 shows the mobile robot, Minerva, used for this research.  Minerva is a RWI B-18, \nand senses using a 360 0  field of view laser range finder at  10  increments. \n\nFigure 2:  Minerva, the B-18 mobile robot used for this research, and an example environment map, \nthe Smithsonian National Museum of American History.  The black areas are the walls and obstacles. \nNote the large sparse areas in the center of the environment. \n\nAlso shown in figure  2 is  an  example environment,the Smithsonian National Museum of \nAmerican History.  Minerva was used to generate this map, and operated as a tour-guide in \nthe museum for two weeks in  the summer of 1998. This museum has  many of the features \nthat make localization difficult -large open spaces,  and many dynamic obstacles (people) \nthat can mislead the sensors. \n\nStartillg Positioll \n\nStart POSitiOIl \n\n~ ~~  ~ .,.~ \nI \n\n~\n\n. \n\n. .-A.  ~ \n\n(a) Conventional \n\n(b) Coastal \n\n(c) Sensor Map \n\nFigure 3:  Two examples in the museum environment.  The left trajectory is given by a conventional, \nshortest-path planner.  The middle trajectory  is  given  by  the coastal navigation planner.  The black \nareas correspond to obstacles, the dark grey areas correspond to  regions where sensor infonnation is \navailable, the light grey areas to regions where no sensor infonnation is available. \n\nFigure 3 shows the effect of different planners in the sample environment.  Panel (a) shows \nthe  trajectory of a conventional,  shortest distance planner.  Note that the robot moves  di-\n\n\f1048 \n\nN.  Roy and S.  Thrun \n\nrectly towards the goal.  Panel (b) shows the trajectory given by the coastal planner. In both \nexamples, the robot moves towards an obstacle, and relocalizes once it is in sensor range of \nthe obstacle, before moving towards the goal.  These periodic relocalizations are essential \nfor the robot to arrive at the goal with minimum positional uncertainty, and maximum reli(cid:173)\nability. Panel (c) shows the sensor map of the environment. The black areas show obstacles \nand walls, and the light grey areas are where no information is available to the sensors, be(cid:173)\ncause all environmental features  are outside the range of the sensors.  The dark grey areas \nindicate areas  where the information gain from the sensors is not zero;  the darker grey the \narea, the better the information gain from the sensors. \n\n20 \n18 \n16 \n14 \n~  12 \nt;I \n\"  to \n,., \ne \nJl \n\n8 \n6 \n4 \n2 \n0 \n\u00b72 \n\n0 \n\nPositional Uncenainty at Goal \n\nConventional Navigation  -(cid:173)\ncoastal  Navigation  -\n\n! ...... + .......... +- ......... .1 ............. I ............... J \n\n0.5 \n\nI \n\n1.5 \n\n2 \n\n2.5 \n\n3 \n\n3.5 \n\n4 \n\n4.5 \n\n5 \n\n5.5 \n\nFigure 4:  The performance of the coastal navigation algorithm compared to the coastal motion plan(cid:173)\nner.  The graph depicts the entropy of the position  probability distribution  against the  range of the \nlaser sensor. Note that the coastal navigation dramatically improves the certainty of the goal position \nwith shorter range laser sensing. \n\nFigure 4  is  a comparison of the  average  positional certainty (computed as  entropy of the \npositional probability) of the robot at its goal position, compared to the range of the laser \nrange  sensor.  As  the range  of the  laser range  gets  shorter,  the  robot can  see  fewer  and \nfewer environmental features - this is essentially a way of reducing the ability of the robot \nto  localize  itself.  The  upper line is  the  performance  of a  conventional  shortest-distance \npath planner,  and  the  lower line is  the  coastal  planner.  The coastal  planner has  a  lower \nuncertainty for all ranges of the laser sensor,  and  is  substantially lower at shorter ranges, \nconfirming that the coastal navigation has the most effect when the localization is worst. \n\n5  Conclusion \n\nIn this paper, we have described a particular problem of motion planning - how to guarantee \nthat  a  mobile robot can  reach  its  goal  with  maximum  reliability.  Conventional  motion \nplanners do not typically plan according to the ability of the  localization unit in  different \nareas  of the environment, and  thus make no claims about the robustness of the generated \ntrajectory.  In  contrast,  POMDPs  provide  the  correct  solution  to  the  problem  of robust \ntrajectories, however, computing the solution to a POMDP is intractable for the size of the \nstate space for typical mobile robot environments. \nWe  propose  a  motion  planner  with  an  augmented  state  space  that  represents  positional \nuncertainty explicitly as an  extra dimension. The motion planner then plans through pose(cid:173)\nuncertainty space,  to arrive at the goal pose with the lowest possible uncertainty.  This can \nbe seen  to be an  approximation to a POMDP where the multi-dimensional belief space is \nrepresented as a subset of statistics, in this case the entropy of the belief space. \nWe  have  shown some experimental comparisons with a conventional motion planner.  Not \nonly  did  the  coastal  navigation generated  trajectories  that  provided substantial  improve(cid:173)\nment of the positional certainty at the goal compared  to the conventional planner,  but the \nimprovement became more pronounced as the localization was degraded. \n\n\fCoastal Navigation with Mobile Robots \n\n1049 \n\nThe model presented here, however, is not complete. The entire methodology hinges upon \nthe  assumption  that  the robot's probability distribution can  be  adequately represented  by \nthe  entropy  of the  distribution.  This  assumption  is  valid  if the  distribution is  restricted \nto  a  uni-modal  Gaussian,  however,  most  Markov  localization methods  that are  based  on \nthis assumption fail,  because multi-modal, non-Gaussian positional distributions are quite \ncommon  for  moving  robots.  Nonetheless,  it  may  be  that  multiple  uncertainty  statistics \nalong multiple dimensions (e.g.,  x  and  y) may  do a better job of capturing the uncertainty \nsufficiently.  It  is  an  question  for  future  work  as  to  how  many  statistics  can  capture  the \nuncertainty of a mobile robot, and under what environmental conditions. \nAcknowledgments \nThe  authors  gratefully  acknowledge the  advice  and collaboration of Tom Mitchell  throughout the \ndevelopment of this  work.  Wolfram Burgard and Dieter Fox played an instrumental role in the de(cid:173)\nvelopment of earlier versions of this work, and their involvement and discussion of this new model is \nmuch appreciated.  This work was partially funded by the Fonds pour la Formation de Chercheurs et \nl' Aide a la Recherche (FCAR). \n\nReferences \n\n[1]  R.  Bellman.  Dynamic Programming.  Princeton University Press, NJ,  1957. \n[2]  w. Burgard, D.  Fox, D.  Hennig, and T.  Schmidt.  Estimating the absolute position of a mobile \n\nrobot using position probability grids.  In AAAI,  1996. \n\n[3]  D. Fox, W. Burgard, and S. Thrun.  Active Markov localization for mobile robots.  Robotics and \n\nAutonomous Systems, 25(3-4),  1998. \n\n[4]  R.  A. Howard.  Dynamic Programming and Markov Processes. MIT,  1960. \n[5]  L. Kaelbling, A.  R.  Cassandra, and J.  A.  Kurien.  Acting under uncertainty:  Discrete Bayesian \n\nmodels for mobile-robot navigation.  In IROS,  1996. \n\n[6]  S. Koenig and R. Simmons. The effect of representation and knowledge on goal-directed explo(cid:173)\nration with reinforcement learning algorithms.  Machine Learning Journal, 22:227-250,1996. \n\n[7]  J .-c. Latombe.  Robot Motion Planning.  Kluwer Academic Publishers, 1991. \n[8]  S.  Mahadevan  and  N.  Khaleeli.  Robust  mobile  robot  navigation  using  partially-observable \n\nsemi-Markov decision processes.  1999. \n\n[9]  H. P. Moravec and A. Elfes.  High resolution maps from wide angle sonar.  In ICRA,  1985. \n[10]  R.  Sim and G. Dudek. Mobile robot localization from learned landmarks.  In  lROS,  1998. \n[11]  H. Takeda, C. Facchinetti, and J.-c. Latombe. Planning the motions of mobile robot in a sensory \n\nuncertainty field.  IEEE Trans.  on Pattern Analysis and Machine Intelligence,  16(10),  1994. \n\n[12]  S. Thrun, D. Fox, and W.  Burgard.  A probabilistic approach to concurrent mapping and local(cid:173)\n\nization for mobile robots.  Machine Learning, 431,1998. \n\n\f", "award": [], "sourceid": 1763, "authors": [{"given_name": "Nicholas", "family_name": "Roy", "institution": null}, {"given_name": "Sebastian", "family_name": "Thrun", "institution": null}]}