Hierarchical Task Network Plan Reuse for Video Games

Hierarchical Task Network Plan Reuse

for Video Games

Dennis J. N. J. Soemers and Mark H. M. Winands

Department of Data Science and Knowledge Engineering, Maastricht University

[email protected], m.winands@maastrichtuniversity.nl

Abstract—Hierarchical Task Network Planning is an Auto-

mated Planning technique. It is, among other domains, used in

Artiﬁcial Intelligence for video games. Generated plans cannot

always be fully executed, for example due to nondeterminism

or imperfect information. In such cases, it is often desirable to

re-plan. This is typically done completely from scratch, or done

using techniques that require conditions and effects of tasks to

be deﬁned in a speciﬁc format (typically based on First-Order

Logic). In this paper, an approach for Plan Reuse is proposed

that manipulates the order in which the search tree is traversed

by using a similarity function. It is tested in the SimpleFPS

domain, which simulates a First-Person Shooter game, and shown

to be capable of ﬁnding (optimal) plans with a decreased amount

of search effort on average when re-planning for variations of

previously solved problems.

I. INTRODUCTION

The problem of deciding what tasks should be executed

by an agent in a real-time video game can be addressed in

a number of different ways. Commonly used techniques [1],

[2] are Finite State Machines, Behavior Trees, Utility-based

decision making systems, and Automated Planning. One of

the ﬁrst well-known applications of Automated Planning in

video games is in F.E.A.R. [3].

Hierarchical Task Network (HTN) Planning [4], [5] is an

automated planning technique that has seen some use in video

games. HTN Planning has been used to control a team of

bots in the game Unreal Tournament 2004 [6], to generate

scripts ofﬂine for the game The Elder Scrolls IV: Oblivion

[7], in serious gaming [8], [9], and in the adversarial real-

time strategy game µRTS [10]. Examples of commercial video

games that are known to use HTN Planners are Killzone 3 and

Transformers 3: Fall of Cybertron [11].

Generated plans cannot always be successfully executed,

because HTN Planners are not able to predict the actions of

other agents or deal with imperfect information and nondeter-

minism in the environment without incorporating extensions

[12], [13]. Existing systems using HTN Planning in video

games typically construct new plans from scratch whenever a

plan fails during execution [8]. Planning systems outside the

context of video games do the same in some cases [14], but

there is also work describing more sophisticated approaches

[15]–[20]. These approaches require conditions and effects

of tasks on the environment to be explicitly deﬁned in a

predeﬁned format (typically based on First-Order Logic).

This paper proposes an approach for reusing old plans of

an HTN Planner that does not require conditions and effects

to be explicitly deﬁned, but allows for them to be deﬁned

in functions that are essentially black boxes from the point

of view of the Planner. The main purpose of reusing plans

is to speed up the process of ﬁnding a new plan. Another

motivation for reusing plans is to increase the likelihood of

ﬁnding a plan that is similar to a partially executed previous

plan. In video games this can reduce the number of times that

an agent abruptly changes behavior, and therefore increase the

believability of the agent’s behavior. It is tested on problems

from the SimpleFPS [21] domain, using the Unreal Engine

4 (UE4) game engine as test environment. UE4 is the latest

version of a commercial and widely-used game engine. Ex-

periments have been carried out to measure the effect of Plan

Reuse on the search effort required to ﬁnd optimal plans. The

effect of Plan Reuse on the quality of plans returned when

terminating a search process early has also been measured.

The remainder of the paper is structured as follows. Sec-

tion II provides background information on HTN Planning. In

Section III, the implementation of the HTN Planning algorithm

is described. The approach used for reusing old plans is

described in Section IV. A description of the experiments can

be found in Section V. Finally, Section VI concludes the paper

and provides ideas for future research.

II. HIERARCHICAL TASK NETWORK PLANNING

A Hierarchical Task Network (HTN) Planning Problem [4],

[5], [22] can be deﬁned as a tuple P = (S, T, O, M), where

S is the current World State, T is the current Task Network,

O is the set of Operators, and M is the set of Methods. More

speciﬁcally, S is a description of the environment in which the

agent is located, and should contain all the information that

is relevant for the planning process. T is a collection of tasks

that need to be accomplished by the agent, where tasks can

be constrained to require accomplishment before certain other

tasks in the network. A task can be either primitive, meaning

that it directly corresponds to an action that the agent can

execute, or compound, meaning that it represents a higher-

level plan that needs to be decomposed into a Task Network.

S and T change during the planning process.

Every Operator o ∈ O represents the execution of a single

primitive task t

. Given S, o deﬁnes conditions that must hold

in S for t

to be applicable in S, deﬁnes how S changes

if t

is applied (executed by the agent) in S, and deﬁnes a

nonnegative cost for applying t

. Similarly, every Method m ∈

M represents the execution of a single compound task t

Fig. 1. Example Task Network. An arrow pointing from one task to another

task means that the ﬁrst task is constrained to require execution before the

second task (the ﬁrst task is a predecessor of the second task). Boxes with a

double line are compound tasks, boxes with a single line are primitive tasks.

Fig. 2. Example Method. “Take Out ?Enemy” is the compound task for

which this method is deﬁned, where “?Enemy” is a variable. The boxes with

dashed lines denote task networks. The two arrows pointing away from this

compound task point to task networks that the original compound task can be

decomposed into, under certain conditions. The left decomposition consists

of only a single compound task, whereas the right decomposition consists of

an ordered sequence of two primitive tasks.

Given S, m deﬁnes all the different Task Networks that t

can be decomposed into. O and M remain constant during

the planning process.

An HTN Planning system is expected to take a Problem P

as input, and produce a valid plan Π as output. A plan Π is

a valid plan for P if it is an ordered list of primitive tasks

that can be obtained by consecutively applying Methods and

Operators from M and O to the tasks in T in an order that

satisﬁes the constraints of T , until T is empty. Applying an

Operator o removes the corresponding primitive task t

from

T , appends it to Π, and changes S as deﬁned by o. Applying

a Method m replaces the corresponding compound task t

T with a subnetwork that is a valid decomposition according

to m in S (m can have more than one valid decomposition

in any given state S). A decomposition is a Task Network

that must be fully executed for the original compound task to

be considered executed, and can be viewed as a lower-level

description of the more abstract compound task.

An example of a Task Network is depicted in Figure 1. An

agent can execute this Task Network by ﬁrst taking out two

defenders, then picking up an enemy ﬂag, and then returning

home (to his own base). It does not matter in which order the

two enemy defenders are taken out, but they both need to be

taken out before the agent can pick up the enemy ﬂag. The

two tasks to take out defenders are compound tasks, meaning

that they cannot be executed directly but need to be reﬁned

further. An example method to do so is depicted in Figure 2.

This method deﬁnes that the compound task to take out an

enemy can be decomposed into a single compound task to

snipe the enemy under the condition that the agent has a Sniper

Riﬂe, or it can be decomposed into an ordered sequence of

two primitive tasks under the condition that the agent has a

weapon. If neither condition is satisﬁed, the compound task

cannot be decomposed and therefore cannot be executed.

In this description of the HTN Planning formalism, it has

only been speciﬁed what information needs to be deﬁned by

the various structures (such as Operators and Methods), and

not how this should be speciﬁed. Many existing planners, such

as SHOP2, deﬁne world states, conditions and effects in (a

subset of) First-Order Logic. The framework described in this

paper makes no assumptions about the form in which this

information is deﬁned, and allows for it to be implemented

directly in C++ functions and variables, as described in more

detail in Section III. It means that there are few restrictions

on what can be speciﬁed in an HTN Planning problem. It is

possible to deﬁne planning problems where the tasks are not

totally ordered, and variables are allowed. Such problems can

become undecidable [23].

III. HTN PLANNER

This section describes the implementation of the HTN

Planner plugin that has been developed for Unreal Engine 4.

A. Planning Problem Deﬁnition

Many HTN Planners, like SHOP2 [22], deﬁne the struc-

tures of an HTN Planning problem, such as Operators, using

formalisms based on logical expressions. The HTN Planner

described in this paper does not use such a logic-based

formalism, but instead uses a similar approach as SHPE [24].

Instead of deﬁning all the relevant information of a planning

problem using logical expressions, it is deﬁned directly using

C++ functions and variables. The main motivation for this

approach is that it does not require an inference engine, and

is expected to require less processing time [24].

For primitive and compound tasks, two base classes are

provided with a number of virtual functions that can be

implemented in subclasses to deﬁne domain-speciﬁc tasks.

A primitive task t

has the following functions that can

be overridden; ApplyTo(S), which should be implemented to

apply any effects of t

to a world state S; ExecuteTask(), which

should be implemented to execute t

during real gameplay

(as opposed to during the planning process); GetCost(S),

which can be implemented to return the cost of applying t

in the world state S; and IsApplicable(S), which should be

implemented to return a boolean value indicating whether or

not t

is applicable in a world state S. A compound task t

only has a single function to override; FindDecompositions(S),

which should be implemented to return a list T , where every

T ∈ T is a Task Network that is a valid decomposition of

in the world state S. Note that, in this implementation, the

concepts of Operators and Methods as mentioned in Section II

are no longer used, and any information that these structures

contained is instead located directly in the corresponding

primitive and compound tasks.

Algorithm 1 The HTN Planning algorithm

1: function INITIALIZE(T

, S

)

2: Fringe ← [(T

, S

, ∅, 0)]

3: BestCost ← ∞

4: function FINDPLAN

5: while Fringe 6= ∅ and TIMEAVAILABLE() do

6: (T

, S

, Π

, C

) ← Fringe.NEXT()

7: if T

is empty then

8: BestPlan ← Π

9: BestCost ← C

10: continue

11: for all t ∈ T

without predecessors do

12: if t is primitive then

13: if t.ISAPPLICABLE(S

) then

14: T

← T

.REMOVE(t)

15: S

← t.APPLYTO(S

)

16: Π

← [Π

, t]

17: C

← C

+ t.GETCOST(S

)

18: Fringe.ADD((T

, S

, Π

, C

))

19: else

20: D ← t.FINDDECOMPOSITIONS(S

)

21: for all D ∈ D do

22: T

← T

.REPLACE(t, D)

23: Fringe.ADD((T

, S

, Π

, C

))

B. Finding a Plan

The HTN Planner is expected to take a Task Network T

and

an initial World State S

as input, and produce a valid plan Π

as output, as described in Section II. The algorithm that has

been implemented to do this is similar to the algorithm used

by SHPE [24] and SHOP2 [22]. It performs a search through

the space of all (partially) decomposed networks, starting from

. The intuition behind it is that T

is a highly abstract Task

Network, containing a relatively large number of compound

tasks, and it is gradually simpliﬁed by decomposing compound

tasks and moving primitive tasks from the network into the

plan. Primitive actions are inserted into the plan in the same

order in which they are intended to be executed after planning.

Pseudocode for this algorithm can be found in Algorithm 1.

The algorithm is initialized with a single tuple (T

, S

, ∅, 0)

in the Fringe. The Fringe is the collection of nodes in the

search tree that have not been processed yet. Every element

in this collection contains the Task Network of tasks that have

not yet been completed, the current world state, the (partial)

plan constructed so far, and the execution cost so far.

In each iteration, one node (T

, S

, Π

, C

) is removed from

the Fringe, and every task t ∈ T

that does not have any

predecessors is processed. A predecessor of t is a task that

is constrained by T

to require processing before t. All tasks

that are allowed to be executed directly according to T

are

processed, and all other tasks are not yet processed.

If t is a primitive task that is applicable in S

, t is applied

to S

, appended to Π

, and removed from T

. This results

in a single new tuple that is placed in the Fringe. If t is a

compound task, one new tuple is placed in the Fringe for

every valid decomposition D of t in S

. In this case, T

modiﬁed in every new tuple by replacing t in T

with D.

In the planner described in this paper, the Fringe has been

implemented as a stack. This means that the algorithm acts as

a depth-ﬁrst search. Many other planners, such as SHOP2 and

SHPE, are also implemented in this way.

C. Branch-and-bound and Heuristic Cost Estimation

A branch-and-bound optimization can speed up the search

algorithm described above when searching for an optimal

solution. Immediately after taking a tuple (T

, S

, Π

, C

) from

the Fringe in line 6 of Algorithm 1, the cost C

for executing

the partial solution Π

is compared to the cost of the best

solution found so far (BestCost). If at this stage C

≥ BestCost,

cannot lead to an improvement on the best solution found

so far, and the algorithm can immediately continue with the

next element of the Fringe. This is the same branch-and-bound

optimization as described in [22], [24].

An admissible heuristic function h(T

, S

) that estimates the

future cost of executing the remaining Task Network T

given

a current world state S

can be used to improve the branch-

and-bound optimization. Given such a function, the algorithm

can prune partial solutions where C

+ h(T

, S

) ≥ BestCost.

Such a heuristic function can incorporate domain-speciﬁc

knowledge, but if two extra restrictions are placed on the

problem deﬁnition it is also possible to deﬁne a domain-

independent heuristic function. The ﬁrst of these restrictions is

that the cost of executing a primitive task cannot depend on the

world state in which it is executed. The second restriction is

that compound tasks cannot be deﬁned in such a way that they

can result in an inﬁnitely long sequence of decompositions

(which is possible when using recursion in the deﬁnition

of compound tasks). Under these restrictions, the following

domain-independent heuristic function h(T

) is well-deﬁned:

h(T

) =

t∈T

h(t) (1)

h(t

) = Cost(t

) (2)

h(t

) = min

D∈D

h(D) (3)

In Equation 1, the world state S

has been omitted as an

argument because it cannot provide any information for a

domain-independent heuristic. The heuristic function h(t) for

a single task t used in Equation 1 is deﬁned by Equation 2

for the case where t is primitive, or Equation 3 for the case

where t is compound. In Equation 3, D denotes the set of all

possible decompositions of t

in any possible world state.

IV. PLAN REUSE

In this section, an approach for reusing old plans to more

efﬁciently ﬁnd new plans for similar problems is described.

This approach does not require effects and conditions of tasks

to be explicitly deﬁned in a predeﬁned format.

When an HTN Planner has previously found a plan for some

planning problem, and is later required to ﬁnd a plan for a new

planning problem that is similar to the previous problem, it

is expected to be possible to make use of the old solution

to speed up the planning process. Existing approaches for

reusing or repairing plans in an HTN Planner [15]–[20] require

effects and conditions of tasks to be deﬁned in a predeﬁned

format (typically using First-Order Logic). In most cases, this

is because they analyze dependencies between the conditions

and effects of tasks and store this information in graphs or

other structures. These approaches are not compatible with

the implementation of the planner as described in Section III,

where the conditions of tasks and the effects of tasks on the

world state are implemented in functions that are black boxes

from the point of view of the Planner. The following approach

does not have this problem.

Let P

old

= (S

old

, T

old

, O, M) be an old planning problem

for which an optimal plan Π

old

was generated using the HTN

Planning algorithm as described in the previous section. Let

new

= (S

new

, T

new

, O, M) be a new planning problem for

which the HTN Planner needs to ﬁnd a solution Π

new

. O and

M are equivalent for the two problems, so the same sets of

primitive and compound tasks are deﬁned. The assumption is

made that S

old

and S

new

are in some sense similar, and that

old

and T

new

are also in some sense similar. Finally, the

assumption is made that an optimal solution Π

new

will, due

to the previous assumptions, also be similar to Π

old

The intuition behind the approach is that it is likely to ﬁnd

higher quality solutions ﬁrst if branches of the search tree that

led to Π

old

are prioritized when traversing the new search tree

to look for Π

new

. Similar ideas have also previously been used

in game-tree search algorithms for abstract games [25], [26].

There are two beneﬁts to ﬁnding high quality solutions as soon

as possible. The ﬁrst beneﬁt is that, if in a real-time setting

such as a video game the planning process is terminated early,

a higher quality plan will be available. The second beneﬁt

is that the upper bound on the cost of the optimal plan is

lowered more quickly, and therefore the branch-and-bound

optimization can prune larger parts of the search space.

A. Search Tree Structure

Because the approach for plan reuse relies on manipulating

the order in which the planning algorithm traverses branches

of the search tree, it is useful to ﬁrst take a closer look at the

structure of this search tree.

An example of a search tree for a simple planning prob-

lem, with only a single compound task in the initial Task

Network, is depicted in Figure 3. A node N

in the search

tree encapsulates a tuple (T

, S

, Π

, C

) as found in Algo-

rithm 1. When N

is visited (returned by Fringe.Next() and

processed as seen in the pseudocode), a set of successor nodes

Successors(N

) can be generated. For example, in Figure 3,

Successors(A) = {B, E}. Successors(N

) is empty if N

gets pruned by the branch-and-bound optimization, or if N

is a leaf node. A leaf node is either a solution node if T

is empty, or a failed node if T

is non-empty and does not

contain any tasks without predecessors that can be executed

in S

. In the ﬁgure, D and G are solution nodes.

Fig. 3. Example search tree. Every circle represents a node in the search tree.

The boldface letters are used to refer to speciﬁc nodes in the text. The Task

Network of tasks that still need to be solved is shown under this identiﬁer for

every node. C

are compound tasks, and P

are primitive tasks. The text on

every branch describes the change that is applied on that transition between

two nodes. Nodes D and G have empty Task Networks.

If Successors(N

) is non-empty, N

has at least one succes-

sor node. If T

is totally ordered, meaning that there is exactly

one task t ∈ T

that does not have any predecessors, there will

be exactly one successor node if t is primitive, and there can

be more than one successor node if t is compound. Traversing

a branch in this situation can be viewed as committing to solve

t in a certain, concrete way. If T

is not totally ordered, there

is one set of branches for every task t ∈ T

that does not

have any predecessors, and within these sets it is again true

that there is exactly one branch if t is primitive, and there can

be more branches if t is compound. In this case, traversing

a branch corresponds to selecting a task t and then choosing

a concrete way to solve that task t. All Task Networks that

appear in Figure 3 are totally ordered.

B. Similarity-Based Branch Reordering

The proposed approach for Plan Reuse requires a similarity

function Sim(Π

old

, N

), which computes a measure of simi-

larity between Π

old

and a current node N

in the search tree.

The information available in N

is the tuple (T

, S

, Π

, C

The approach described in this paper only makes use of Π

which simpliﬁes the similarity function to Sim(Π

old

, Π

The example search tree in Figure 3 depicts how the Task

Network T changes in every node, and a description of the

processing that is done to generate successor nodes is placed

on every branch. When a compound task is processed to

generate a successor, only the contents of T change. This

means that, in Figure 3, nodes A, B, C and E all share the same

plan Π. When a primitive task is processed, the description on

the branch indicates how Π changes. This means that nodes

D, F and G all have different plans from the other nodes.

Suppose that, at some point in the planning process that

generated Π

old

, the compound task C

was solved using the

path in the right-hand side of the ﬁgure. Without any changes

to the deﬁnition of a plan Π, any function Sim(Π

old

, Π

)

returns the same results for the nodes A, B, C and E because

they share the same plan. This means that F is the ﬁrst node in

which it can be recognized that a path is being continued that

was a part of the optimal solution of Π

old

. The ﬁrst change

made to the planning algorithm is to redeﬁne the concept of

a plan Π to also append compound tasks to Π as they are

processed. With this change, it is already possible to recognize

in nodes B and E that they have some similarity with Π

old

(they all contain the compound task C

). In node C, it can

then be recognized that an “incorrect” path is traversed (C

has been added which does not occur in Π

old

), and in node

F it can be recognized that the “correct” path is continued,

leading to different levels of similarity.

The similarity function proposed in this paper is the function

that computes the longest Currently Matching Streak (CMS).

Intuitively, it is the function that ﬁnds the length of the longest

sequence of consecutive tasks in Π

old

that also occurs at the

end of the current (partial) plan Π. More formally, let Π[i]

denote the i

task in a plan Π. Let m denote the number

of tasks in a plan Π. The similarity measure CM S(Π

old

, Π)

is then deﬁned as the maximum possible value n such that,

for some index x, Π

old

[x] = Π

[m], and Π

old

[x − n + j] =

[m − n + j] for all j where 1 ≤ j < n. A score of 0 is

assigned if Π

[m] does not occur anywhere in Π

old

For example, let Π

old

= [A, B, C, D, E], Π

= [A, B, C]

and Π

= [A, B, C, X, D, E]. Then CM S(Π

old

, Π

) = 3,

because the entire sequence of tasks of Π

also occurs as

a consecutive sequence in Π

old

. Π

also contains the same

sequence, but in Π

the sequence is followed by a non-

matching task X, and then followed by another streak of

length 2 that occurs in Π

old

. Therefore, CM S(Π

old

, Π

) = 2.

This similarity measure “rewards” streaks of consecutive

tasks that also occurred in the same order in Π

old

, and

also instantly punishes appending a non-matching task to an

existing matching streak by resetting the score to 0. It is used

to sort nodes for processing as follows. A node N

is processed

before a node N

if CM S(Π

old

, Π

) > CM S(Π

old

, Π

). If

CM S(Π

old

, Π

) = CM S(Π

old

, Π

) = 0, the CMS scores of

the closest ancestor nodes with non-zero CM S scores are used

instead of the CMS scores of the nodes themselves. Finally,

ties are broken by using the same ordering as a regular DFS.

An example search tree is depicted in Figure 4. The numbers

in this ﬁgure indicate the order in which parts of the subtree

are processed.

C. Domain-Speciﬁc Ordering

The approach for reordering branches based on a similarity

measure as described above is expected to be better than an

arbitrary ordering of branches in cases where the new planning

problem is related to the old planning problem. In reality,

however, the branches typically are not ordered arbitrarily but

are already ordered more efﬁciently based on domain-speciﬁc

knowledge. For example, if a compound task is processed to

ﬁnd an item of a speciﬁc type somewhere in a map, and

Fig. 4. Example search tree with plan reuse. Circles represent individual

nodes, and triangles represent arbitrarily large subtrees. The numbers indicate

the order in which nodes or subtrees are processed. Thick lines represent

branches where a “correct” choice was made, meaning a task was added that

continued a current matching streak or started a new streak.

one valid decomposition is created for every item of that

type, these branches can be ordered according to the distance

between the agent and those items. Branches for items that

are already nearby are then explored ﬁrst.

When there is already a good ordering of branches based

on domain-speciﬁc knowledge, Plan Reuse can be detrimental,

especially if it is also uncertain if a new planning problem is

really similar to an old problem. Two approaches are proposed

to reduce the likelihood of Plan Reuse having detrimental

effects in the presence of domain-speciﬁc ordering, at the cost

of also reducing the potential gains of Plan Reuse.

The ﬁrst approach is the introduction of a parameter M

denoting the Minimum Streak Length required for branches to

be reordered according to their CM S score. Any CMS score

that is less than M is simply set to 0. This makes the plan reuse

less aggressive, which means there are fewer potential gains,

but it is also more likely that a matching streak of tasks can

actually be continued when it already has a sufﬁcient length.

The second approach is to make the search probabilistic.

This idea is inspired by a probabilistic approach for resuing

plans in classical planning [27]. A parameter p is introduced,

where 0 ≤ p ≤ 1, which deﬁnes the probability with which

the planner temporarily ignores parts of the search tree that are

prioritized according to Plan Reuse, and instead searches parts

that are not prioritized in a DFS manner. This idea has been

implemented as follows. Whenever the planning algorithm

processes a leaf node, the algorithm is set in a mode where

it ignores prioritized nodes with probability p, and it is set

in a mode where it does not ignore prioritized nodes with

probability 1 − p. Nodes are considered to be prioritized if

and only if they are on a path that contains some node with

CM S > 0. The reason for continuing to run in the same

mode until a leaf node is processed is to avoid switching

modes too often. Nodes that are pruned by the branch-and-

bound optimization are not considered to be leaf nodes. The

parameter value p = 0 means that the ordering of Plan Reuse

always overrides the ordering of domain-speciﬁc heuristics,

and p = 1 means that Plan Reuse is not used. With 0 < p < 1,

lower values for p are more suitable for cases where Plan

Reuse is expected to be more reliable than domain-speciﬁc

heuristics, and higher values are more suitable otherwise.

V. EXPERIMENTS

This section describes the setup and the results of the exper-

iments that have been carried out to evaluate the performance

of the approach for Plan Reuse.

A. Experimental Setup

SimpleFPS [21] is a planning domain that has been designed

to simulate planning problems in FPS games. Originally it

was deﬁned as a classic planning domain, but it has also

been translated into an HTN Planning domain and used for

the evaluation of the HTN Planner SHPE [24]. Even though

SimpleFPS is only a simulation of an FPS game, and not a

real game, the generated planning problems are not necessarily

less complex. With an average optimal plan length of 32 in the

experiments described below, the problems can be estimated to

be an order of magnitude more complex than those observed

in real games [28].

Problems of this planning domain have been randomly

generated and used to evaluate the performance of Plan Reuse

in comparison to the same planning algorithm without Plan

Reuse. The experiments have been carried out inside UE4.

This means that any overhead involved in implementing and

running a planner inside a game engine, as opposed to running

it in isolation, is included in the results. The results were

obtained using an Intel Core i5 CPU (2.67GHz), running on

Windows 7. During the planning processes in these experi-

ments, the memory usage of the entire plugin (including the

SimpleFPS map data and the constant memory usage of the

planner when idle) was at most in the order of 1 MB.

The original version of SimpleFPS is deterministic (after

random problem generation), and assumes that the agent has

access to perfect information. This means that these problems

do not require any re-planning. For these experiments, the

problems have been changed such that all doors in the maps

are assumed to be unlocked initially, and the agent only obtains

the information that a door is locked if the agent attempts

to move through it. This means that the planner typically

ﬁnds invalid solutions ﬁrst, and problems often require re-

planning when new information is obtained. To evaluate the

performance of Plan Reuse, these “re-planning episodes” have

been performed both with and without Plan Reuse. The

previous plan is used as Π

old

for Plan Reuse, but ﬁrst pre-

processed to remove all tasks that have already been executed.

Furthermore, the SimpleFPS domain was changed to punish

the agent with an extra cost (equivalent to 50 “normal” tasks)

for plans in which it chose for a different attacking approach

from the original plan, where the three possible attacking

The implementation of the planner used in these experiments, and

a more detailed description of the implementation, are available at

https://github.com/DennisSoemers/HTN Plan Reuse.

TABLE I

EFFECTS OF PLAN REUSE IN TOTAL

Parameter Values ∆Nodes Processed ∆Time

M = 10, p = 0 -7.04% -11.11%

M = 10, p = 0.25 -11.51% -18.60%

M = 20, p = 0 -1.70% +1.86%

M = 20, p = 0.25 -1.18% +3.53%

M = 30, p = 0 -2.81% -2.41%

M = 30, p = 0.25 -2.43% -0.98%

approaches are melee, ranged and stealth. This means that

if, for example, the old plan involved picking up a knife that

turns out to be behind a closed door, it is unlikely that a new

optimal plan will instead involve picking up a gun somewhere

else. With this change, the likelihood of parts of Π

old

still

being useful for a new optimal solution is increased. It is still

possible that there also is a second knife somewhere in a more

convenient location, so there also still are problems where Plan

Reuse can be detrimental.

Six different variants of Plan Reuse have been tested based

on the approach described in Section IV, with different values

for the parameters M and p. For M , the values 10, 20 and

30 have been tested. The optimal value for this parameter

is domain-speciﬁc though, and different values may be more

suitable for different problems. The value M = 1 has also

shortly been tested, but was found to be too aggressive, and

has not been included in the results. For p, the values 0 and

0.25 have been tested, where p = 0 means the use of Plan

Reuse is not probabilistic. The value of p = 0.25 was chosen

after a smaller number of tests, but is also close to 0.3, which

is one of the values used for a similar parameter in [27].

B. Results

A total of 209 problems were completely processed by all

variants of Plan Reuse, of which 10 problems were proven not

to have any solutions. There were 22 problems that were not

solved by any of the variants because they were terminated due

to taking too much time. Planning processes were terminated

early and declared a failure if no solution was found at all

within 75 seconds, or no optimal solution within 150 seconds.

Table I shows the total change in the number of nodes

processed and the amount of time spent planning of all the

planning problems added together. It shows that especially the

two most aggressive variants of Plan Reuse, with M = 10,

perform well. The variants with M = 20 have a weak

performance. This is largely caused by one speciﬁc problem,

which is one of the largest problems in the set, where Plan

Reuse with M = 20 turned out to be highly detrimental. The

mean change in the absolute number of nodes processed by

the variant with M = 10 and p = 0.25 is signiﬁcant according

to a paired, two-tailed Student’s t-test with a signiﬁcance level

of 0.05 (p-value ≈ 0.037).

Figure 5 shows the difference in plan quality that Plan Reuse

makes as a function of the search effort. In this ﬁgure, all

planning problems have been mapped to a single measure of

search effort and a single measure of plan quality. Informally,

if a plot has a point y at x = 0.5, that variant of Plan Reuse

Fig. 5. The change in relative solution quality obtained by adding Plan Reuse

as a function of relative search effort.

changes the plan quality by y% on average if a planning

process is interrupted after half the search effort that planning

without Plan Reuse would require to ﬁnd the optimal solution.

A point x on the x-axis denotes that x × n number of nodes

have been processed, where n is the number of nodes that

were required to ﬁnd (but not necessarily prove) the optimal

solution when planning without Plan Reuse. A point y on the

y-axis denotes the difference in the average quality of the best

solution found so far between planning with and without Plan

Reuse. The quality of a solution is deﬁned as

∗

× 100%,

where C is the cost of that solution and C

∗

is the cost of an

optimal solution for that problem.

All the plots show a decrease close to x = 1. This is simply

because all variants of Plan Reuse have at least some problems

where Plan Reuse is detrimental, and x = 1 denotes exactly

the amount of search effort that planning without Plan Reuse

requires for all problems. So, x = 1 denotes the point in time

where it is no longer possible to do any better than planning

without Plan Reuse, and it is only possible to do worse. The

ﬁgure shows that all the variants of Plan Reuse are above the

x-axis (indicating an improvement in average plan quality of

up to 8%) until x approaches 1. The variants with lower values

for M show larger improvements on average. All variants peak

with low amounts of search effort, indicating that Plan Reuse is

especially beneﬁcial if solutions are required in a short amount

of time (for instance, in real-time). For the variants with M <

30, the changes in quality are signiﬁcant (p-value < 0.05).

There appears to be less variance in the changes in quality for

the variants with p = 0.25 (p-value < 0.01).

C. Unchanged Problems Removed

The results for the problems as described above include all

the problems on which none of the variants of Plan Reuse

made any difference at all in the number of nodes processed

compared to re-planning without Plan Reuse. On some of these

problems, Plan Reuse cannot have any effect because the value

for M is too conservative. This is a side effect of avoiding

detrimental cases, and therefore these problems have not been

excluded from the results above.

TABLE II

EFFECTS OF PLAN REUSE IN TOTAL - NO UNCHANGED PROBLEMS

Parameter Values ∆Nodes Processed ∆Time

M = 10, p = 0 -11.57% -18.26%

M = 10, p = 0.25 -18.92% -29.89%

M = 20, p = 0 -2.79% +1.68%

M = 20, p = 0.25 -1.93% +4.26%

M = 30, p = 0 -4.62% -4.81%

M = 30, p = 0.25 -3.99% -2.75%

Fig. 6. The change in relative solution quality obtained by adding Plan Reuse

as a function of relative search effort. (No unchanged problems)

However, these unchanged problems can also include prob-

lems where M was not set too high, but Plan Reuse simply did

not result in any signiﬁcant changes in the ordering, such as

trivial problems where the domain-speciﬁc ordering is (nearly)

optimal. For planning domains where such planning problems

are not expected to occur, it can be interesting to look at

the results obtained by removing the problems that remained

unchanged by all variants of Plan Reuse from the sets.

The results with these unchanged problems (117 problems)

removed can be found in Table II and Figure 6. On this set

of problems, Plan Reuse reduces the total number of nodes

processed by up to 18.92% and the processing time by up to

29.89% (both for M = 10, p = 0.25). The peak increase in

average plan quality is up to 20% at a relative search effort

of 20% (M = 20, p = 0.25). The p-values of these results are

nearly identical to those of the corresponding results including

unchanged problems.

VI. CONCLUSION AND FUTURE WORK

In this paper, an approach has been proposed for reusing

previously found plans in an HTN Planner when presented

with new planning problems that are similar to one that was

previously solved. Unlike existing approaches, it does not

require conditions and effects of the domain to be speciﬁed in

a pre-determined form, but allows for them to be implemented

in black box functions. The main idea behind the approach is

to manipulate the order in which the search tree is traversed

by using a similarity function for plans.

Plan Reuse has been shown to be capable of reducing the

average number of nodes required to ﬁnd optimal solutions

for SimpleFPS planning problems [21] that are likely to

have similar solutions to previously solved problems, and

also reduce the computation time. It has also been shown to

improve the average quality of plans when using the planning

algorithm as an anytime algorithm.

For future work, it would be interesting to investigate

whether it is possible to estimate whether Plan Reuse will be

likely to be beneﬁcial or detrimental for a speciﬁc planning

problem before the planning process is started. A direction of

future research is to do this automatically by, for instance,

computing a similarity measure between the old and the

new planning problem. If this likelihood can be estimated

accurately, Plan Reuse can be turned off in problems where

it is expected not to be beneﬁcial, and it can be turned on in

problems where it is expected to be beneﬁcial.

Furthermore, it could be interesting to investigate if Plan

Reuse ﬁnds plans that are more similar to the old plan in

cases where there are multiple plans that are all optimal

with respect to the cost function. This has been mentioned

as a motivation for Plan Reuse in the paper, because it can

reduce the likelihood of an agent abruptly changing behavior

in a video game and therefore increase the believability of

the behavior. It has not been investigated in this paper’s

experiments because the SimpleFPS problems were found to

typically have a low number of different optimal solutions. The

planner and approach for Plan Reuse have also brieﬂy been

tested in an alpha version of the game of Unreal Tournament,

which uses Unreal Engine 4. These tests are not described in

more detail because the planning problems were too simple

for Plan Reuse to make a noticeable difference. Experiments

with more complex planning problems in later versions of the

game, or in other games, could be done in future research.

Finally, a direction for future work would be to look into

different variants of similarity measures. For instance, the

CM S measure could be changed to have a lower value when

appending a non-matching task to an existing streak, instead

of resetting the score entirely to 0.

REFERENCES

[1] I. Millington and J. Funge, Artiﬁcial Intelligence for Games, 2nd ed.

Morgan Kaufmann, 2009.

[2] D. Isla, “Managing Complexity in the Halo 2 AI System,” in Proceedings

of the Game Developers Conference, 2005.

[3] J. Orkin, “Three States and a Plan: The A.I. of F.E.A.R,” in Game

Developers Conference, 2006.

[4] E. D. Sacerdoti, “The Nonlinear Nature of Plans,” in Proc. 4th Int. Joint

Conf. Artif. Intell., vol. 1. Stanford, CA: Morgan Kaufmann Publishers

Inc., 1975, pp. 206–214.

[5] K. Erol, J. Hendler, and D. S. Nau, “UMCP: A Sound and Complete

Procedure for Hierarchical Task-Network Planning,” in Proc. 2nd Int.

Conf. on Artif. Intell. Planning Syst., K. Hammond, Ed. Menlo Park,

CA: AAAI Press, 1994, pp. 249–254.

[6] H. Hoang, S. Lee-Urban, and H. Mu

noz-Avila, “Hierarchical Plan

Representations for Encoding Strategic Game AI,” in Proc. AIIDE.

AAAI Press, 2005, pp. 63–68.

[7] J. P. Kelly, A. Botea, and S. Koenig, “Ofﬂine Planning with Hierarchical

Task Networks in Video Games,” in Proc. 4th AIIDE Conf., C. Darken

and M. Mateas, Eds. Menlo Park, CA: AAAI Press, 2008, pp. 60–65.

[8] A. Menif, C. Guettier, and T. Cazenave, “Planning and Execution

Control Architecture for Infantry Serious Gaming,” in Proc. of the

Planning in Games Workshop of ICAPS 2013, M. Buro,

E. Jacopin,

and S. Vassos, Eds., 2013, pp. 31–34.

[9] I. M. Mahmoud, L. Li, D. Wloka, and M. Z. Ali, “Believable NPCs in

Serious Games: HTN Planning Approach Based on Visual Perception,”

in Proc. IEEE Conf. Comput. Intell. Games, 2014, pp. 248–255.

[10] S. Onta

on and M. Buro, “Adversarial Hierarchical-Task Network

Planning for Complex Real-Time Games,” in Proc. 24th Int. Joint Conf.

Artif. Intell., Q. Yang and M. Wooldridge, Eds. AAAI Press, 2015, pp.

1652–1658.

[11] T. Humphreys, “Exploring HTN Planners through Examples,” in Game

AI Pro: Collected Wisdom of Game AI Professionals, 1st ed., S. Rabin,

Ed. CRC Press, 2013, ch. 12, pp. 149–167.

[12] U. Kuter and D. Nau, “Forward-Chaining Planning in Nondeterministic

Domains,” in Proc. 19th Nat. Conf. Artif. Intell., A. G. Cohn, Ed. Menlo

Park, CA: AAAI Press, 2004, pp. 513–518.

[13] U. Kuter, D. Nau, M. Pistore, and P. Traverso, “Task Decomposition

on Abstract States, for Planning under Nondeterminism,” Artiﬁcial

Intelligence, vol. 173, no. 5-3, pp. 669–695, 2009.

[14] S. Yoon, A. Fern, and R. Givan, “FF-Replan: A Baseline for Probabilistic

Planning,” in Proc. 17th ICAPS, M. Boddy, M. Fox, and S. Thi

ebaux,

Eds. Menlo Park, CA: AAAI Press, 2007, pp. 352–359.

[15] S. Kambhampati and J. A. Hendler, “A Validation-Structure-Based

Theory of Plan Modiﬁcation and Reuse,” Artiﬁcial Intelligence, vol. 55,

no. 2-3, pp. 193–258, 1992.

[16] B. Drabble, J. Dalton, and A. Tate, “Repairing Plans On-the-ﬂy,” in

Proc. NASA Workshop on Planning and Scheduling for Space, 1997.

[17] R. P. J. van der Krogt and M. M. de Weerdt, “Plan Repair as an Extension

of Planning,” in Proc. 15th ICAPS, S. Biundo, K. Myers, and K. Rajan,

Eds. Menlo Park, CA: AAAI Press, 2005, pp. 161–170.

[18] N. F. Ayan, U. Kuter, F. Yaman, and R. P. Goldman, “HOTRiDE:

Hierarchical Ordered Task Replanning in Dynamic Environments,” in

Planning and Plan Execution for Real-World Systems–Principles and

Practices for Planning in Execution: Papers from the ICAPS Workshop,

F. Ingrand and K. Rajan, Eds., 2007.

[19] I. Warﬁeld, C. Hogg, S. Lee-Urban, and H. Mu

noz-Avila, “Adaptation

of Hierarchical Task Network Plans,” in FLAIRS Conference, 2007, pp.

429–434.

[20] J. Bidot, B. Schattenberg, and S. Biundo, “Plan Repair in Hybrid

Planning,” in KI 2008: Advances in Artiﬁcial Intelligence, ser. LNCS,

A. R. Dengel, K. Berns, T. M. Breuel, F. Bomarius, and T. R. Roth-

Berghofer, Eds. Springer, 2008, vol. 5243, pp. 169–176.

[21] S. Vassos and M. Papakonstantinou, “The SimpleFPS Planning Domain:

A PDDL Benchmark for Proactive NPCs,” in AIIDE Workshop: Intel-

ligent Narrative Technologies, E. Tomai, D. Elson, and J. Rowe, Eds.

AAAI Press, 2011, pp. 92–97.

[22] D. Nau, T. Au, O. Ilghami, U. Kuter, J. W. Murdock, D. Wu, and

F. Yaman, “SHOP2: An HTN Planning System,” in JAIR, M. E. Pollack,

Ed. AAAI Press, 2003, vol. 20, pp. 379–404.

[23] K. Erol, J. Hendler, and D. S. Nau, “HTN Planning: Complexity and

Expressivity,” in Proc. 12th Nat. Conf. on Artif. Intell. Menlo Park,

CA: AAAI Press, 1994, pp. 1123–1128.

[24] A. Menif,

E. Jacopin, and T. Cazenave, “SHPE: HTN Planning for Video

Games,” in Computer Games, ser. Communications in Computer and

Information Science, T. Cazenave, M. H. M. Winands, and Y. Bj

ornsson,

Eds. Springer, 2014, vol. 504, pp. 119–132.

[25] Y. Kawano, “Using Similar Positions to Search Game Trees,” in Games

of No Chance, ser. MSRI Book Series, R. Nowakowski, Ed. Cambridge:

Cambridge University Press, 1996, vol. 29, pp. 193–202.

[26] M. Sakuta, T. Hashimoto, J. Nagashima, J. W. H. M. Uiterwijk, and

H. Iida, “Application of the Killer-Tree Heuristic and the Lambda-Search

Method to Lines of Action,” Information Sciences, vol. 154, no. 3, pp.

141–155, 2003.

[27] D. Borrajo and M. Veloso, “Probabilistically Reusing Plans in Determin-

istic Planning,” in Proc. ICAPS’12 Workshop on Heuristics and Search

for Domain-Independent Planning, P. Haslum, M. Helmert, E. Karpas,

C. L. L

opez, G. R

oger, J. Thayer, and R. Zhou, Eds. AAAI Press,

2012, pp. 17–25.

[28]

E. Jacopin, “Game AI Planning Analytics: The Case of Three First-

Person Shooters,” in Proc. 10th AIIDE Conf. AAAI Press, 2014, pp.

119–124.