Sections

  • Time, Schedules, and Resources - extends the classical language for planning to talk about actions with durations and resource constraints
  • Hierarchical Planning - describes methods for constructing plans that are organized hierarchically. This allows human experts to communicate to the planner what they know about how to solve the problem. Hierarchy also lends itself to efficient plan construction because the planner can solve a problem at an abstract level before delving into details
  • Planning and Acting in Non-deterministic Domains - presents agent architectures that can handle uncertain environments and interleave deliberation with execution, and gives some examples of real-world systems.
  • Multi-Agent Planning - shows how to plan when the environment contains other agents

Time, Schedules, and Resources

The classical planning representation talks about what to do and in what order, but the representation cannot talk about

  • temporal ordering constraintswhen an action should occur (before and/or after a specified time and/or specific action(s))
  • resource constraints - describes the resources needed for an action to be executed
  • the available resources - described below
Representing Temporal Constraints, Resource Constraints, and the Available Resources

each action is represented by:

  • a duration
  • a set of temporal ordering constraints (action(s) that must be completed before this action can be executed)
  • a set of resource constraints

each resource is represented by 3 things:

  • the type of resource (e.g. bolts, wrenches, or pilots)
  • the number of that resource available at start
  • whether that resource is:
    • consumable - e.g. the bolts are no longer available for use
    • reusable - e.g. a pilot is occupied during a flight but is available again when the flight is over
    • sharable

resources can be produced by actions

a solution must satisfy all the temporal ordering constraints of actions and resource constraints

Example Scheduling Problem (Assembly of 2 Cars)

the Assembly of 2 Cars problem consists of:

  • resources: 4 types of resources and number of each type available at the start:
    resource type number available consumable or reusable
    engine hoist 1 reusable
    wheel station 1 reusable
    lug nuts 500 consumable
    inspector 2 reusable
  • ordering & resource constraints: 2 jobs, each of form [AddEngine, AddWheels, Inspect] and their individual action durations and its resource constraints
    action duration units resource constraints
    AddEngine1 30 1 engine hoist
    AddEngine2 60 1 engine hoist
    AddWheels1 30 1 wheel station and 40 lug nuts
    AddWheels2 15 1 wheel station and 40 lug nuts
    Inspect1 10 1 inspector
    Inspect2 10 1 inspector
Solving Example Scheduling Problem (Assembly of Two Cars) - WITHOUT Resource Constraint

first we consider just the scheduling problem with only ordering constraints (ignoring resource constraints)

the goal is to minimize makespan (plan duration), in other words we must find the earliest start times for all the actions consistent with the ordering constraints

a set of ordering constraints can be represented as a directed graph (e.g. see figure 11.2)we can then apply the critical path method (CPM) to this graph to determine the possible start and end times of each action

path through a graph is a linearly ordered sequence of actions beginning with Start and ending with Finish (e.g. Figure 11.2 contains 2 paths)

a critical path is the path with the longest total duration of actions

ES(x) - earliest start time of action x
LS(x) - latest start time of action x
slack of an action x is the quantity LS(x) - ES(x)schedule - together the ES and LS times for all the actions constitute a schedule for the problem

for example, in Figure 11.2

  • the whole plan will take 85 minutes
  • each action in the top path has 15 minutes of slack
  • each action in the critical path has no slack

The following formulas serve as a definition for ES and LS. A and B are actions, and A ≺ B means that A comes before B:

  • ES(Start) = 0
  • ES(B) = maxA≺BES(A) + Duration(A)
  • LS(Finish) = ES(Finish)
  • LS(A) = minB≻ALS(B) − Duration(A)

The idea is that we start by assigning ES(Start) to be 0. Then, as soon as we get an action B such that all the actions that come immediately before B have ES values assigned, we set ES(B) to be the maximum of the earliest finish times of those immediately preceding actions, where the earliest finish time of an action is defined as the earliest start time plus the duration. This process repeats until every action has been assigned an ES value. The LS values are computed in a similar manner, working backward from the Finish action

The complexity of the CPM algorithm is just O(Nb), where N is the number of actions and b is the maximum branching factor into or out of an action

Solving Example Scheduling Problem (Assembly of Two Cars) - WITH Resource Constraint

the introduction of resource constraints turns the scheduling problem from easy to NP-hard

Figure 11.3 shows the solution with the fastest completion time, 115 minutes

minimum slack algorithm: on each iteration, schedule for the earliest possible start whichever unscheduled action has all its predecessors scheduled and has the least slack; then update the ES and LS times for each affected action and repeat. The heuristic resembles the minimum-remaining-values (MRV) heuristic in constraint satisfaction problems. It often works well in practice, but for our assembly problem it yields a 130 minute solution, not the 115 minute solution of Figure 11.3

Hierarchical Planning

hierarchical decomposition - the process of dividing a system into subsystems and thus reducing the number of activities at the next lower level

example hierarchical planning

A reasonable plan for the Hawaii vacation might be:

  • Go to San Francisco airport

  • take Hawaiian Airlines flight 11 to Honolulu

  • do vacation stuff for two weeks

  • take Hawaiian Airlines flight 12 back to San Francisco

  • go home

Given such a plan, the action “Go to San Francisco airport” can be viewed as a planning task in itself, with a solution such as:

  • Drive to the long-term parking lot

  • park

  • take the shuttle to the terminal

Each of these actions, in turn, can be decomposed further, until we reach the level of actions that can be executed without deliberation to generate the required motor control sequences

in this example, planning can occur both before and during the execution of the plan

High-Level Actions

hierarchical task networks (HTN) planning - the basic formulation of hierarchical decompositionin classical planning (Chapter 10), we assumed full observability and determinism and the availability of a set of actions, now called primitive actions, with standard precondition–effect schemas

high-level plan (HLP) - a sequence of high-level actionshigh-level action (HLA) - an action that has one or more possible refinements
refinements - a sequences of action(s) - a mix of high-level and primitive actions
primitive action - has no refinements

example of refinements in figure 11.4

  • Figure 11.4 TOP: the action “Go to San Francisco airport,” represented formally as Go (Home, SFO), might have 2 possible refinements
  • Figure 11.4 BOTTOM: recursive refinement for navigation in the vacuum world: to get to a destination, take a step, and then go to the destination

implementation of a high-level action - a high-level action that only contains primitive actions
implementation of a high-level plan - a concatenation of implementations of each HLA in the sequence

if a HLA that has exactly one implementation, we can compute the preconditions and effects of the HLA and then treat the HLA exactly as if it were a primitive action itself

Searching For Primitive Solutions

HTN planning is often formulated with a single “top level” action called Act, where the aim is to find an implementation of Act that achieves the goal

the approach leads to a simple algorithm (hierarchical forward planning search): repeatedly choose an HLA in the current plan and replace it with one of its refinements, until the plan achieves the goal
possible implementations of this algorithm:

  • breadth-first search (shown in figure 11.5)
  • depth-first search
  • iterative deepening

Benefits of Hierarchical Search

Suppose that a planning problem has a solution with d primitive actions.

For a nonhierarchical, forward state-space planner with b allowable actions at each state, the cost is O(bd), as explained in Chapter 3.

For an HTN planner, let us suppose a very regular refinement structure:

  • each nonprimitive action has r possible refinements, each into k actions at the next lower level.
  • we want to know how many different refinement trees there are with this structure.
  • now, if there are d actions at the primitive level, then the number of levels below the root is logk(d)
  • so the number of internal refinement nodes is 1 + k + k2+ · · · + klogk(d−1) = (d − 1)/(k − 1).
  • each internal node has r possible refinements, so r(d−1)/(k−1)possible regular decomposition trees could be constructed
  • examining this formula, we see that keeping r small and k large can result in huge savings
Searching For Abstract Solutions

downward refinement property for HLA descriptions - every high-level action that “claims” to achieve the goal does in fact achieve the goal (i.e. it must have at least one implementation that does achieve the goal)

a problem arises when the HLA has multiple implementations. How can we describe the effects of an high-level action that can be implemented in many different ways?

One safe answer (at least for problems where all preconditions and goals are positive) is to include only the positive effects that are achieved by every implementation of the HLA and the negative effects of any implementation. Then the downward refinement property would be satisfied. Unfortunately, this semantics for HLAs is much too conservative. Consider again the HLA Go(Home,SFO), which has two refinements, and suppose, for the sake of argument, a simple world in which one can always drive to the airport and park, but taking a taxi requires Cash as a precondition. In that case, Go(Home,SFO) doesn’t always get you to the airport. In particular, it fails if Cash is false, and so we cannot assert At(Agent,SFO) as an effect of the HLA. This makes no sense, however; if the agent didn’t have Cash, it would drive itself. Requiring that an effect hold for every implementation is equivalent to assuming that someone else—an adversary—will choose the implementation. It treats the HLA’s multiple outcomes exactly as if the HLA were a nondeterministic action, as in Section 4.3. For our case, the agent itself will choose the implement

demonic nondeterminism - where an adversary makes the choices
angelic nondeterminism - where the agent itself makes the choices

angelic semantics - the reachable set of an HLA

reachable set of a HLA - given a state s, the reachable set for an HLA h, written as REACH(s,h), is the set of states reachable by any of the HLA’s implementations

The key idea is that the agent can choose which element of the reachable set it ends up in when it executes the HLA; thus, an HLA with multiple refinements is more “powerful” than the same HLA with fewer refinements

reachable set of a sequence of HLAs - for example, the reachable set of a sequence [h1, h2] is the union of all the reachable sets obtained by applying h2 in each state in the reachable set of h1:

The notion of reachable sets yields a straightforward algorithm:

  • search among high-level plans/actions, looking for one whose reachable set intersects the goal
  • once that happens, the algorithm can commit to that abstract plan, knowing that it works
  • focus on refining the plan further

examples of reachable sets

Representation of Effects of HLA

effects of a HLA - is the reachable set for each possible previous state

as with the classical action schemas of Chapter 10, we represent the changes made to each fluent. Think of a fluent as a state variable.

a primitive action can either:

  • add a variable
  • delete a variable
  • leave it unchanged
  • flipping a variable to its opposite (with conditional effects (see Section 11.3.1))

a high-level action (HLA) under angelic semantics can do more:

  • it can control the value of a variable, setting it to true or false depending on which implementation is chosen
  • in fact, an HLA can have nine different effects on a variable:
    • if the variable starts out true, it can always keep it true, always make it false, or have a choice
    • if the variable starts out false, it can always keep it false, always make it true, or have a choice
    • and the three choices for each case can be combined arbitrarily, making nine

TODO page 431 of 3rd Edition PDF

Planning and Acting in Non-Deterministic Domains

In this section we extend planning to handle partially observable, nondeterministic, and unknown environments

Chapter 4 extended search in similar ways, and the methods here are also similar:

  • sensorless planning or conformant planning - for environments with no observations
  • contingency planning - for partially observable and nondeterministic environments
  • online planning and replanning - for unknown environments

While the concepts are similar to Chapter 4, there are significant differences because planners deal with factored representations rather than atomic representations

TODO page 434 of 3rd Edition PDF

Multi-Agent PlanningWhen there are multiple agents in the environment, each agent faces a multiagent planning problem in which it tries to achieve its own goals with the help or hindrance of others

An agent with multiple effectors that can operate concurrently needs to do multi-effector planning which manages several effectors simultaneously while handling positive and negative interactions among the effectorsWhen the effectors are physically decoupled into detached units —as in a fleet of delivery robots in a factory— multi-effector planning becomes multibody planning. A multibody problem is still a “standard” single-agent problem as long as the relevant sensor information collected by each body can be pooled—either centrally or within each body—to form a common estimate of the world state that then informs the execution of the overall plan; in this case, the multiple bodies act as a single body

When communication constraints make this impossible, we have what is sometimes called a decentralized planning problem; this is perhaps a misnomer, because the planning phase is centralized but the execution phase is at least partially decoupled. In this case, the subplan constructed for each body may need to include explicit communicative actions with other bodies. For example, multiple reconnaissance robots covering a wide area may often be out of radio contact with each other and should share their findings during times when communication is feasible

multi-body vs multi-agent

both multi-body and multi-agent planning has shared goals, however, the multibody and multiagent cases are quite different:

  • n a multi-body team
    • a single entity is doing the planning, there is really only one goal, which all the bodies necessarily share
    • a single plan dictates which body will go where on the court and which body will hit the ball.
  • in a multi-agent team
    • the bodies are distinct agents that do their own planning, they may still share identical goals (e.g. 2 human tennis players who form a doubles team share the goal of winning the match)
    • each agent decides what to do; without some method for coordination (e.g. both agents may decide to cover the same part of the court and each may leave the ball for the other to hit)

The clearest case of a multiagent problem, is when the agents have different goals. In tennis, the goals of two opposing teams are in direct conflict, leading to the zero-sum game of Chapter 5

Finally, some systems are a mixture of centralized and multiagent planning. For example, a delivery company may do centralized, offline planning for the routes of its trucks and planes each day, but leave some aspects open for autonomous decisions by drivers and pilots who can respond individually to traffic and weather situations. Also, the goals of the company and its employees are brought into alignment, to some extent, by the payment of incentives (salaries and bonuses)—a sure sign that this is a true multiagent system

obstacles in multi-agent planning

  • issues of representing and planning for multiple simultaneous actions
  • issues of cooperation, coordination, and competition
Planning With Multiple Simultaneous Actions

for the time being, let the term multi-actor settings cover: multi-effector, multi-body, and multi-agent settings

The goal of this section is to define transition models, correct plans, and efficient planning algorithms for the multi-actor setting

TODO page 445 of 3rd Edition PDF

Planning With Multiple Agents: Cooperation and Coordination