Expressing and Exploiting Subgoal Structure in Classical Planning Using Sketches

Width-based planning methods deal with conjunctive goals by decomposing problems into subproblems of low width. Algorithms like SIW thus fail when the goal is not easily serializable in this way or when some of the subproblems have a high width. In this work, we address these limitations by using a simple but powerful language for expressing finer problem decompositions introduced recently by Bonet and Geffner, called policy sketches. A policy sketch R over a set of Boolean and numerical features is a set of sketch rules C ↦ E that express how the values of these features are supposed to change. Like general policies, policy sketches are domain general, but unlike policies, the changes captured by sketch rules do not need to be achieved in a single step. We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIW$_R$ algorithm, the version of SIW that employs user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Furthermore, they make it easy to express general problem decompositions and prove key properties of them like their width and complexity.


Introduction
The success of width-based methods in classical planning is the result of two main ideas: the use of conjunctive goals for decomposing a problem into subproblems, and the observation that the width of the subproblems is often bounded and small (Lipovetzky & Geffner, 2012). When these assumptions do not hold, pure width-based methods struggle and need to be extended with heuristic estimators or landmark counters that yield finer problem decompositions (Lipovetzky & Geffner, 2017a, 2017b). These hybrid approaches have resulted in state-of-the-art planners but run into shortcomings of their own: unlike pure width-based search methods, they require declarative, PDDL-like action models and thus cannot plan with black box simulators (Lipovetzky, Ramirez, & Geffner, 2015; Shleyfman, Tuisov, & Domshlak, 2016; Geffner & Geffner, 2015), and they produce decompositions that are ad hoc and difficult to understand. Variations of these approaches, where the use of declarative action models is replaced by polynomial searches, have pushed the scope of pure width-based search methods (Francès, Ramírez, Lipovetzky, & Geffner, 2017), but they do not fully overcome their basic limits: top goals that are not easily serializable or that have a high width. These are indeed the limitations of one of the simplest width-based search methods, Serialized Iterated Width (SIW), which greedily achieves the top goals one at a time using IW searches (Lipovetzky & Geffner, 2012).
In this work, we address the limitations of the SIW algorithm differently by using a simple but powerful language for expressing richer problem decompositions recently introduced by Bonet and Geffner (2021), called policy sketches. A policy sketch is a set of sketch rules over a set of Boolean and numerical features of the form C ↦ E that express how the values of the features are supposed to change. Like general policies (Bonet & Geffner, 2018), sketches are general and not tailored to specific instances of a domain, but unlike policies, the feature changes expressed by sketch rules represent subgoals that do not need to be achieved in a single step.
We pick up here where Bonet and Geffner left off and show that many benchmark planning domains that SIW cannot solve are provably solvable in low polynomial time with the SIW$_R$ algorithm, the version of SIW that makes use of user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Bonet and Geffner introduce the language of sketches and the theory behind them; we show their use and the properties that follow from them. As we will see, unlike HTNs and temporal logics, sketches can be used to express and exploit the common subgoal structure of planning domains without having to express how subgoals are to be achieved. Also, by being simple and succinct, they provide a convenient target language for learning the subgoal structure of domains automatically, although this problem, related to the problem of learning general policies (Bonet, Francès, & Geffner, 2019; Francès, Bonet, & Geffner, 2021), is outside the scope of this article. In this work, we use sketches to solve domains in polynomial time, which excludes intractable domains. Indeed, intractable domains have neither general policies nor sketches of bounded width and require non-polynomial searches. Sketches and general policies, however, are closely related: sketches provide the skeleton of a general policy, or a general policy with "holes" that are filled by searches that can be shown to be polynomial (Bonet & Geffner, 2021).
This article is an extended version of a conference paper published at the International Conference on Knowledge Representation and Reasoning (Drexler, Seipp, & Geffner, 2021) with the following main extensions. First, we prove properties of the policy sketches for all seven benchmark domains that we consider (the conference paper only contains two proofs). We also correct the previously reported upper bounds on the sketch width for the Grid and TPP domains. Second, we provide detailed information about the description logic grammar and the state features that we use in the policy sketches. Third, in our experiments, we evaluate the new SIW$_R$ algorithm in both a grounded and a lifted planner and compare both implementations to baseline planners on an additional set of harder benchmarks. Fourth, we release the code for constructing and evaluating description logic features in the form of an open-source C++ library (with Python bindings), called DLPlan. Finally, we expand the related work section to cover general policies, reward machines, landmarks and polynomial algorithms for domain-independent planning.
Since the publication of the original conference paper, there have been several works by us and others building on the ideas presented here. In particular, we have shown that it is possible to learn sketches automatically (Drexler, Seipp, & Geffner, 2022) and that we can use them as building blocks for learning hierarchical policies (Drexler, Seipp, & Geffner, 2023b). Daggelinckx (2023) used temporal logic to exhaustively generate all well-formed sketches up to a given size and Grundke (2022) compared policy sketches to other forms of domain-specific knowledge. Finally, Dalmau-Moreno, García, Gómez, and Geffner (2023) used policy sketches for combined task and motion planning. To keep the size of this article manageable, we focus exclusively on handcrafted sketches and their analysis here, but discuss related work in Section 7.
The article is organized as follows. We review the notions of width (Section 2), policy sketches (Section 3), and sketch width (Section 4), following Bonet and Geffner (2021). In Section 5, we show that it is possible to write compact and transparent policy sketches for many domains and establish their widths. We analyze the performance of the SIW$_R$ algorithm using these sketches in Section 6. Finally, we compare sketches to HTNs and temporal logics and briefly discuss the challenge of learning sketches automatically (Section 7), before summarizing the main contributions in Section 8.

Planning and Width
A classical planning problem or instance P = (D, I) is assumed to be given by a first-order domain D with action schemas defined over some domain predicates, and instance information I describing a set of objects and two sets of ground literals describing the initial situation Init and the goal description Goal. The initial situation is assumed to be complete, i.e., for every ground literal L, either L or its complement is in Init. A problem P defines a state model $S(P) = (S, s_0, G, Act, A, f)$ where the states in S are the truth valuations over the ground atoms, represented by the set of literals that they make true, the initial state $s_0$ is Init, the set of goal states G includes all of those that make the goal atoms in Goal true, the actions Act are the ground actions obtained from the action schemas and the objects, the actions A(s) applicable in state s are those whose preconditions are true in s, and the state transition function f maps a state s and an action a ∈ A(s) into the successor state s' = f(a, s). A plan π for P is a sequence of actions $a_0, \ldots, a_n$ that is executable in $s_0$ and maps the initial state $s_0$ into a goal state; i.e., $a_i \in A(s_i)$, $s_{i+1} = f(a_i, s_i)$, and $s_{n+1} \in G$. The cost of a plan is assumed to be given by its length, and a plan is optimal if there is no shorter plan. We are interested in solving collections of well-formed instances P = (D, I) over fixed domains D, denoted as $Q_D$ or simply as Q.
The most basic width-based search method for solving a planning problem P is IW(1). It performs a standard breadth-first search in the rooted directed graph associated with the state model S(P) with one modification: IW(1) prunes a newly generated state if it does not make an atom r true for the first time in the search. The procedure IW(k) for k > 1 is like IW(1) but prunes a newly generated state if it does not make a collection of up to k atoms true for the first time. Underlying the IW algorithms is the notion of problem width (Lipovetzky & Geffner, 2012):
Definition 1 (Width). Let P be a classical planning problem with initial state $s_0$ and goal states G. The width w(P) of P is the minimum k for which there exists a sequence $t_0, t_1, \ldots, t_m$ of atom tuples $t_i$, each consisting of at most k atoms, such that:
1. $t_0$ is true in the initial state $s_0$ of P,
2. all optimal plans for $t_i$ can be extended into an optimal plan for $t_{i+1}$ by adding a single action, i = 1, ..., m − 1,
3. if π is an optimal plan for $t_m$, then π is an optimal plan for P.
If a problem P is unsolvable, w(P) is set to the number of variables in P, and if P is solvable in at most one step, w(P) is set to 0 (Bonet & Geffner, 2021). Chains of tuples $\theta = (t_0, t_1, \ldots, t_m)$ that comply with conditions 1-3 are called admissible, and the size of θ is the size $|t_i|$ of the largest tuple in the chain. We refer to the third condition by saying that $t_m$ implies G in the admissible chain $t_1, t_2, \ldots, t_m$. The width w(P) is thus k if k is the minimum size of an admissible chain for P. If the width of a problem P is w(P) = k, IW(k) finds an optimal (shortest) plan for P in time and space that are exponential in k, and not in the number of problem variables N as is the case for breadth-first search.
The IW(k) algorithm expands up to $N^k$ nodes, generates up to $bN^k$ nodes, and runs in time and space $O(bN^{2k-1})$ and $O(bN^k)$, respectively, where N is the number of atoms and b bounds the branching factor in problem P. IW(k) is guaranteed to solve P optimally (shortest path) if w(P) ≤ k. If the width of P is not known, the IW algorithm can be run instead, which calls IW(k) iteratively for k = 0, 1, ..., N until the problem is solved or found to be unsolvable.
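The pruning rule of IW(k) and the iterative IW loop can be illustrated with a minimal Python sketch. It assumes that states are frozensets of atom names and that a successor generator is supplied by the caller; the function names and the toy line-world domain are hypothetical and only serve as an illustration. As a further assumption, the goal test is applied to a generated state before novelty pruning, so that IW(0) covers exactly the problems that are solvable in at most one step.

```python
from collections import deque
from itertools import combinations


def iw(k, s0, is_goal, successors):
    """IW(k) sketch: breadth-first search that prunes non-novel states.

    A state is a frozenset of atoms (here: strings). successors(s) yields
    (action, next_state) pairs. Returns a list of actions, or None.
    """
    if is_goal(s0):
        return []
    seen = set()  # atom tuples of size <= k made true by some generated state

    def is_novel(state):
        atoms = sorted(state)
        new = {t for size in range(1, k + 1)
               for t in combinations(atoms, size)} - seen
        seen.update(new)
        return bool(new)

    is_novel(s0)
    queue = deque([(s0, [])])
    while queue:
        state, plan = queue.popleft()
        for action, nxt in successors(state):
            if is_goal(nxt):  # assumption: goal test precedes pruning
                return plan + [action]
            if is_novel(nxt):  # keep only states with a new atom tuple
                queue.append((nxt, plan + [action]))
    return None


def iterated_width(s0, is_goal, successors, num_atoms):
    """IW sketch: call IW(k) for k = 0, 1, ..., N until a plan is found."""
    for k in range(num_atoms + 1):
        plan = iw(k, s0, is_goal, successors)
        if plan is not None:
            return k, plan
    return None


# Hypothetical toy domain: an agent on a line of five cells.
def line_successors(state):
    (atom,) = state
    pos = int(atom.split("-")[1])
    if pos > 0:
        yield "left", frozenset({f"at-{pos - 1}"})
    if pos < 4:
        yield "right", frozenset({f"at-{pos + 1}"})


start = frozenset({"at-0"})
result = iterated_width(start, lambda s: "at-3" in s, line_successors, num_atoms=5)
```

In this toy domain, IW(0) fails because the goal is three steps away, and IW(1) succeeds because every state on the direct path makes a new atom true.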
While the IW and IW(k) algorithms are not practical by themselves, they are building blocks for other methods. Serialized Iterated Width or SIW (Lipovetzky & Geffner, 2012) starts at the initial state s = $s_0$ of P, and then performs an IW search from s to find a shortest path to a state s' such that #g(s') < #g(s), where #g counts the number of top goals of P that are not true in a given state. If s' is not a goal state, s is set to s' and the loop repeats until a goal state is reached.
In practice, the IW(k) searches in SIW are limited to k ≤ 2 or k ≤ 3, so that SIW solves a problem or fails in low polynomial time. SIW performs well in many benchmark domains but fails in problems where the width of some top goal is not small, or the top goals cannot be serialized greedily. More recent methods address these limitations by using width-based notions (novelty measures) in complete best-first search algorithms (Lipovetzky & Geffner, 2017a; Francès et al., 2017), yet they also struggle in problems where some top goals have high width. In this work, we take a different route: we keep the greedy polynomial searches underlying SIW but consider a richer class of problem decompositions expressed through sketches. The resulting planner SIW$_R$ is not domain-independent like SIW, but it illustrates that a bit of domain knowledge can go a long way in the effective solution of arbitrary domain instances.

Features and Sketches
A feature is a function of the state over a class of problems Q. The features considered in the language of sketches are Boolean, taking values in the Boolean domain, or numerical, taking values in the non-negative integers. For a set Φ of features and a state s of some instance P in Q, f(s) is the feature valuation determined by the state s. A Boolean feature valuation over Φ refers instead to the valuation of the expressions p and n = 0 for Boolean and numerical features p and n in Φ. If f is a feature valuation, b(f) will denote the Boolean feature valuation determined by f where the values of numerical features are just compared with 0.
The set of features Φ distinguishes or separates the goals in Q if there is a set $B_Q$ of Boolean feature valuations such that s is a goal state of an instance P ∈ Q iff b(f(s)) ∈ $B_Q$. For example, if Q is the set of all blocks world instances with stack/unstack operators and common goal clear(x) ∧ handempty for some block x, and Φ = {n(x), H} are the features that track the number of blocks above x and whether the gripper is holding a block, then there is a single Boolean goal valuation that makes the expression n(x) = 0 true and H false.
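The blocks world example can be made concrete with a short Python sketch. The state encoding (a list of towers plus the held block) and all function names are assumptions made for illustration only; they are not the feature representation used elsewhere in this article.

```python
def boolean_valuation(f):
    """Map a feature valuation f to b(f): Boolean features keep their value,
    a numerical feature n is replaced by the truth value of n = 0."""
    return {name: value if isinstance(value, bool) else value == 0
            for name, value in f.items()}


def blocks_features(towers, holding, x):
    """Hypothetical evaluation of the features n(x) and H.

    towers: list of towers, each a list of blocks from bottom to top.
    holding: the block in the gripper, or None.
    """
    above = 0
    for tower in towers:
        if x in tower:
            above = len(tower) - 1 - tower.index(x)
    return {"n(x)": above, "H": holding is not None}


# The single Boolean goal valuation: n(x) = 0 is true and H is false.
GOAL_VALUATIONS = [{"n(x)": True, "H": False}]


def is_goal_by_features(towers, holding, x):
    """Goal test that only looks at the Boolean feature valuation."""
    valuation = boolean_valuation(blocks_features(towers, holding, x))
    return valuation in GOAL_VALUATIONS
```

For instance, a state with tower [a, x, b] has n(x) = 1 and is not a goal, while a state with towers [a, x] and [b] and an empty gripper matches the goal valuation.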
A sketch rule over features Φ has the form C ↦ E where C consists of Boolean feature conditions, and E consists of feature effects. A Boolean (feature) condition is of the form p or ¬p for a Boolean feature p in Φ, or n = 0 or n > 0 for a numerical feature n in Φ. A feature effect is an expression of the form p, ¬p, or p? for a Boolean feature p in Φ, and n↓, n↑, or n? for a numerical feature n in Φ. The syntax of sketch rules is the syntax of the policy rules used to define generalized policies (Bonet & Geffner, 2018), but their semantics is different. In policy rules, the effects have to be delivered in one step by state transitions, while in sketch rules, they can be delivered by longer state sequences.
A policy sketch $R_\Phi$ is a collection of sketch rules over the features Φ, and the sketch is well-formed if it is built from features that distinguish the goals in Q and is terminating (to be made precise below). A well-formed sketch for a class of problems Q defines a serialization over Q; namely, a "preference" ordering '≺' over the feature valuations that is irreflexive and transitive, and which is given by the smallest strict partial order that satisfies f' ≺ f if f is not a goal valuation and the pair of feature valuations (f, f') satisfies a sketch rule C ↦ E. This happens when: 1) C is true in f, 2) the Boolean effects p (¬p) in E are true in f', 3) the numerical effects are satisfied by the pair (f, f'); i.e., if n↓ in E (resp. n↑), then the value of n in f' is smaller (resp. larger) than in f, i.e., $f'_n < f_n$ (resp. $f'_n > f_n$), and 4) features that do not occur in E have the same value in f and f'. Effects p? and n? do not constrain the value of the features p and n in any way, and by including them in E, we say that they can change in any way, as opposed to features that are not mentioned in E, whose values in f and f' must be the same. Following Bonet and Geffner, we do not use the serializations determined by sketches but their associated problem decompositions.
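The four conditions under which a pair of feature valuations satisfies a sketch rule can be checked mechanically. The following Python sketch assumes a hypothetical encoding in which a rule is a pair of dictionaries, conditions use the strings "T", "F", "=0" and ">0", and effects use "T", "F", "dec", "inc" and "?". The requirement that f is not a goal valuation belongs to the ordering and is not part of this check.

```python
def condition_holds(cond, value):
    """Conditions: p ("T"), not p ("F"), n = 0 ("=0"), n > 0 (">0")."""
    if cond == "T":
        return value is True
    if cond == "F":
        return value is False
    if cond == "=0":
        return value == 0
    if cond == ">0":
        return value > 0
    raise ValueError(f"unknown condition {cond!r}")


def effect_holds(effect, old, new):
    """Effects: p ("T"), not p ("F"), n decreases ("dec"),
    n increases ("inc"), unconstrained ("?")."""
    if effect == "T":
        return new is True
    if effect == "F":
        return new is False
    if effect == "dec":
        return new < old
    if effect == "inc":
        return new > old
    if effect == "?":
        return True
    raise ValueError(f"unknown effect {effect!r}")


def satisfies(rule, f, f_next):
    """Check whether the pair (f, f_next) satisfies the rule C -> E."""
    conditions, effects = rule
    # 1) all conditions in C are true in f
    if not all(condition_holds(c, f[x]) for x, c in conditions.items()):
        return False
    for x in f:
        if x in effects:
            # 2) and 3) Boolean and numerical effects hold for (f, f_next)
            if not effect_holds(effects[x], f[x], f_next[x]):
                return False
        elif f[x] != f_next[x]:
            # 4) features not mentioned in E keep their value
            return False
    return True


# Example rule {#g > 0} -> {#g decreases, p?} over the features #g, p and n.
rule = ({"#g": ">0"}, {"#g": "dec", "p": "?"})
f = {"#g": 2, "p": True, "n": 3}
```

With this rule, the pair (f, f') with f' = {#g: 1, p: false, n: 3} is accepted, whereas changing n to 4 in f' violates condition 4 because n is not mentioned in E.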
The set of subgoal states $G_r(s)$ associated with a sketch rule r : C ↦ E in $R_\Phi$ and a state s for a problem P ∈ Q is the set of states s' that are either goal states of P or those with feature valuations f(s') such that (f(s), f(s')) satisfies the sketch rule r. Intuitively, when in a state s, the subgoal states s' in $G_r(s)$ provide a stepping stone in the search for plans connecting s to the goal of P. Furthermore, $G^*_r(s)$ denotes the set of subgoal states in $G_r(s)$ that are closest to s, $G_R(s)$ denotes the set of subgoal states associated with a sketch R, i.e., $G_R(s) = \bigcup_{r \in R} G_r(s)$, and $G^*_R(s)$ denotes the set of subgoal states in $G_R(s)$ that are closest to s.

Serialized Iterated Width with Sketches
The SIW$_R$ algorithm is a variant of the SIW algorithm that uses a given sketch R = $R_\Phi$ for solving problems P ∈ Q. SIW$_R$ starts at the state s := $s_0$, where $s_0$ is the initial state of P, and then performs an IW search to find a state s' that is closest to s such that s' is a goal state of P or a subgoal state in $G_r(s)$ for some sketch rule r in R. If s' is not a goal state, then s is set to s', s := s', and the loop repeats until a goal state is reached. The features define subgoal states through the sketch rules but otherwise play no role in the IW searches.
The only difference between SIW and SIW$_R$ is that in SIW each IW search finishes when the goal counter #g is decremented, while in SIW$_R$ it finishes when a goal or subgoal state is reached. The behavior of plain SIW can be emulated in SIW$_R$ using the single sketch rule {#g > 0} ↦ {#g↓} in R when the goal counter #g is the only feature, and the rule {#g > 0} ↦ {#g↓, p?, n?} when p and n are the other features. This last rule says that it is always "good" to decrease the goal counter independently of the effects on other features, or alternatively, that decreasing the goal counter is a subgoal from any state s where #g(s) is positive.
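The SIW$_R$ loop and the emulation of plain SIW can be sketched in Python as follows. This is an illustration under several assumptions rather than a reference implementation: states are frozensets of atom names, each sketch rule is given as a Python predicate over a pair of feature valuations, the IW(k) searches are tried for k = 1, ..., max_k, the sketch is assumed to be terminating, and the small line-world domain with lights is made up for the example.

```python
from collections import deque
from itertools import combinations


def iw(k, start, reached, successors):
    """IW(k) from start until reached(state) holds; returns (state, plan)."""
    seen = set()

    def novel(state):
        new = {t for size in range(1, k + 1)
               for t in combinations(sorted(state), size)} - seen
        seen.update(new)
        return bool(new)

    novel(start)
    queue = deque([(start, [])])
    while queue:
        state, plan = queue.popleft()
        for action, nxt in successors(state):
            if reached(nxt):
                return nxt, plan + [action]
            if novel(nxt):
                queue.append((nxt, plan + [action]))
    return None


def siw_r(s0, is_goal, successors, features, rules, max_k=2):
    """SIW_R sketch: chain IW searches between subgoal states."""
    state, plan = s0, []
    while not is_goal(state):
        f = features(state)

        def reached(nxt, f=f):
            # goal state, or subgoal state of some sketch rule
            f_next = features(nxt)
            return is_goal(nxt) or any(rule(f, f_next) for rule in rules)

        for k in range(1, max_k + 1):
            result = iw(k, state, reached, successors)
            if result is not None:
                break
        else:
            return None  # no subgoal found within the width bound
        state, segment = result
        plan += segment
    return plan


# Hypothetical domain: a robot on four cells must switch on two lights.
CELLS, LIGHTS = 4, (1, 3)


def successors(state):
    pos = next(int(a[3:]) for a in state if a.startswith("at-"))
    rest = state - {f"at-{pos}"}
    if pos > 0:
        yield "left", rest | {f"at-{pos - 1}"}
    if pos < CELLS - 1:
        yield "right", rest | {f"at-{pos + 1}"}
    if pos in LIGHTS and f"on-{pos}" not in state:
        yield "switch", state | {f"on-{pos}"}


def is_goal(state):
    return all(f"on-{i}" in state for i in LIGHTS)


def features(state):
    # "g" plays the role of the goal counter #g
    return {"g": sum(f"on-{i}" not in state for i in LIGHTS)}


# Emulation of plain SIW: the single rule {#g > 0} -> {#g decreases}.
siw_rule = lambda f, f_next: f["g"] > 0 and f_next["g"] < f["g"]
plan = siw_r(frozenset({"at-0"}), is_goal, successors, features, [siw_rule])
```

Passing the rules as predicates keeps the loop independent of how features and rules are represented; a richer sketch only changes the list of rules, not the search.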
The complexity of SIW$_R$ over a class of problems Q can be bounded in terms of the width of the sketch $R_\Phi$, which is given by the width of the possible subproblems that can be encountered during the execution of SIW$_R$ when solving a problem P in Q. For this, let us define the set $S_R(P)$ of reachable states in P when following the sketch R = $R_\Phi$ recursively as follows: 1) the initial state s of P is in $S_R(P)$, 2) the subgoal states s' ∈ $G^*_R(s)$ are in $S_R(P)$ if s ∈ $S_R(P)$. The states in $S_R(P)$ are called the R-reachable states in P. The width of sketch R is then defined as follows (Bonet & Geffner, 2021).
Definition 2 (Sketch width). Let R = $R_\Phi$ be a well-formed sketch for a class of problems Q. The width of the sketch R at state s of problem P ∈ Q, denoted $w_R(P[s, G^*_R(s)])$, is the width k of the subproblem $P[s, G^*_R(s)]$ that is like P but with initial state s and goal states $G^*_R(s)$. The width of the sketch R over Q is $w_R(Q) = \max_{P, s} w_R(P[s, G^*_R(s)])$ for P ∈ Q and s ∈ $S_R(P)$.
The time complexity of SIW$_R$ can then be expressed as follows, under the assumption that the features are all linear (Bonet & Geffner, 2021).
A feature is linear if it can be computed in linear time and can take a linear number of values at most. In both cases, the linearity is in the number of atoms N in the problem P in Q. If the features are not linear but polynomial in P, the bounds on SIW$_R$ remain polynomial as well (both k and Φ are constants).
Bonet and Geffner introduce and study the language of sketches as a variation of the language of general policies and their relation to the width and serialized width of planning domains. They illustrate the use of sketches in a simple example (Delivery) but focus mainly on the theoretical aspects. Here we focus instead on their use for modeling domain-specific knowledge in the standard planning benchmarks as an alternative to languages like HTNs.

Sketches for Classical Planning Domains
In this section, we present policy sketches for seven classical planning domains from the International Planning Competition (IPC). All of the chosen domains are solvable suboptimally in polynomial time, but plain SIW fails to solve them. There are two main reasons why SIW fails. First, SIW fails if achieving a single goal atom has too large a width. Second, SIW fails if greedy goal serialization generates avoidable subproblems of too large a width, which includes reaching unsolvable states.
We provide a handcrafted sketch for each of the domains and show that it is well-formed and has small sketch width. These sketches allow SIW$_R$ to solve all instances of the domain in low polynomial time and space by Theorem 3. Furthermore, we impose a low polynomial complexity bound on the runtime of the evaluation of each feature, i.e., at most cubic in the number of grounded atoms. Limiting feature complexity is required since, without any limit, one could use a numerical feature that encodes the optimal value function $V^*(s)$, i.e., the perfect goal distance of all states s. Using this feature in combination with the sketch rule {$V^*$ > 0} ↦ {$V^*$↓} would make all problems trivially solvable. Even with linear and quadratic features, we can capture complex state properties such as distances between objects.
Aside from limiting their computational complexity, we impose no further assumptions on the features. However, we found that all features required for the sketches below can be defined with a description logic grammar (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003; Francès et al., 2021) over primitive PDDL predicates. Using a common representation for the features simplifies evaluating them for a given state and makes it easy to check that they can indeed be evaluated in low polynomial time. We present the description logic grammar and the feature representations in the appendix. For the presentation and theoretical analysis of the sketches, the feature representation is irrelevant, so we do not discuss it further in the main part of the article.

Proving Termination and Sketch Width
For each sketch introduced below, we show that it uses goal-separating features, is terminating and has bounded and small sketch width. Showing that the features are goal separating is usually direct.
Proving termination is required to ensure that by iteratively moving from a state s to a subgoal state s' ∈ $G^*_R(s)$ we cannot get trapped in a cycle. The conditions under which a sketch $R_\Phi$ is terminating are similar to those that ensure that a general policy $\pi_\Phi$ is terminating (Srivastava, Zilberstein, Immerman, & Geffner, 2011; Bonet & Geffner, 2020b, 2021), and can be determined in polynomial time in the size of the policy graph $G(R_\Phi)$ using the SIEVE procedure where $R_\Phi$ is interpreted as a general policy (Srivastava et al., 2011; Bonet & Geffner, 2020b).
The policy graph $G(R_\Phi)$, or simply G(R), for sketch R over features Φ has a node for each of the $2^{|\Phi|}$ Boolean feature valuations over Φ, and edges b → b' labeled with E if b is not a goal valuation and (b, b') is compatible with a rule C ↦ E in the sketch.
The SIEVE procedure was originally introduced for numerical features only. In the following, we extend it to also work for Boolean features. We say that a rule C ↦ E changes a Boolean feature p if p ∈ C and ¬p ∈ E or vice versa. SIEVE then iteratively checks whether the feature changes on the edges b → b' in a strongly connected component (SCC) of G(R) do not allow for infinite execution, i.e., at least some numerical feature n decreases (n↓) in the SCC and never increases (n↑), or at least one Boolean feature p changes from true to false (resp. false to true) in the SCC and never in the opposite direction. The procedure then iteratively eliminates such edges, effectively breaking SCCs into smaller SCCs. If the largest SCC has size one, then the sketch R over features Φ is terminating; otherwise, the sketch is not terminating.
Often, however, a simple syntactic procedure suffices that eliminates sketch rules, one after the other, until none is left. This syntactic procedure is sound but not complete in general. The procedure iteratively applies one of the following cases until no rule is left (the sketch terminates) or until no further cases apply (there may be an infinite loop in the sketch):
1. Eliminate a rule r if it decreases a numerical feature n (n↓) that no other remaining rule can increase (n↑ or n?), and mark n to indicate that n changes finitely often,
2. Eliminate a rule r if it changes the value of a Boolean feature p that no other remaining rule changes in the opposite direction, and mark p to indicate that p changes finitely often,
3. Eliminate a rule r = C ↦ E that decreases a numerical feature n or that changes a Boolean feature p to true (false) such that for all other remaining rules r' = C' ↦ E' it holds that if E' changes the feature in the opposite direction, i.e., n↑, n? or changes p to false (true), there must be a condition on a numerical feature m or Boolean feature q in C that is marked and is complementary to the one in C', e.g., m > 0 ∈ C and m = 0 ∈ C' or q ∈ C and ¬q ∈ C'.
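The first two cases of this procedure can be sketched in Python as follows. The code is a deliberately restricted illustration: case 3 and the marking of features are omitted, so the test below can fail on sketches that the full procedure accepts. It also treats an effect p? of another rule as a possible change in the opposite direction, which is an assumption that is more conservative than case 2 as stated. Rules are encoded as pairs of dictionaries with the hypothetical symbols "T", "F", "=0", ">0" for conditions and "T", "F", "dec", "inc", "?" for effects.

```python
def opposite(value):
    """Return the complementary Boolean symbol."""
    return "F" if value == "T" else "T"


def can_eliminate(rule, others):
    """Check cases 1 and 2 for a rule against the other remaining rules."""
    conditions, effects = rule
    for x, e in effects.items():
        if e == "dec":
            # case 1: no other remaining rule can increase x
            if all(o[1].get(x) not in ("inc", "?") for o in others):
                return True
        elif e in ("T", "F") and conditions.get(x) == opposite(e):
            # case 2: the rule flips x and no other rule can flip it back
            if all(o[1].get(x) not in (opposite(e), "?") for o in others):
                return True
    return False


def passes_syntactic_test(rules):
    """Return True if all rules can be eliminated one after the other.

    True means the sketch is terminating; False is inconclusive.
    """
    remaining = list(rules)
    while remaining:
        for i, rule in enumerate(remaining):
            others = remaining[:i] + remaining[i + 1:]
            if can_eliminate(rule, others):
                del remaining[i]
                break
        else:
            return False  # no case applies to any remaining rule
    return True


# {n > 0} -> {n decreases, m?} and {m > 0} -> {m decreases}
nested = [({"n": ">0"}, {"n": "dec", "m": "?"}),
          ({"m": ">0"}, {"m": "dec"})]
# Two rules that undo each other's numerical effects.
cyclic = [({}, {"n": "dec", "m": "inc"}),
          ({}, {"m": "dec", "n": "inc"})]
```

In the first example, the rule that decreases n is eliminated first, after which nothing can increase m and the second rule is eliminated as well. In the second example, no rule can be eliminated, so the test is inconclusive.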
Theorem 4. The incomplete termination test procedure is sound in the sense that if it returns that a sketch R is terminating then R is indeed terminating.
Proof. We must show that if the incomplete procedure removes a rule r ∈ R, then the SIEVE algorithm removes all edges with label r in the policy graph G(R). The SIEVE algorithm iteratively removes edges from the strongly connected components (SCCs) of the policy graph G(R). It removes an edge with label r from an SCC if the rule r decreases a numerical feature n, i.e., n↓, that is never incremented in the SCC, i.e., by n? or n↑. Originally, SIEVE does not consider Boolean features in the elimination. We allow SIEVE to remove an edge with label r from an SCC if the rule r changes a Boolean feature p that is never changed in the opposite direction in the SCC. Since r changes a Boolean feature in the SCC that no other rule changes in the opposite direction, the rule r cannot be part of a cycle. Next, consider the cases 1-3 from above:
1. The eliminated rule r decreases n and no other rule r' increases n. Hence, in all SCCs of G(R), the rule r decreases n and it is never increased in the SCC. Thus, SIEVE removes all edges labeled r.
2. The eliminated rule r changes p, which no other rule r' changes in the opposite direction. Hence, in all SCCs of G(R), the rule r changes p, which is never changed in the opposite direction in the SCC. Thus, SIEVE removes all edges labeled r.
Showing that a sketch for problem class Q has sketch width k requires proving that for all R-reachable states s in all problem instances P ∈ Q, the width of P[s] is bounded by k. Remember that P[s] is like P but with initial state s, and goal states G of P combined with goal states $G_r(s)$ of all r ∈ R. We need to show that the first subproblem $P_i$ = P[s] with initial state s of P and i = 0 has width at most k. Then for any closest subgoal state s' of $P_i$, we need to show that the next subproblem $P_{i+1}$ = P[s'] has width at most k until all closest subgoal states are goals of P. We prove the width of each subproblem $P_i$ by providing an admissible chain $t_1, \ldots, t_m$ of size at most k where all optimal plans for $t_m$ are also optimal plans for $P_i$. We overapproximate the set of R-reachable states where necessary to make the proofs more compact. This implies that our results provide an upper bound on the actual sketch width, which is small and tight. Furthermore, regardless of which rule r defines the closest subgoal $G_r(s)$ for an R-reachable state s, we show that P[s] with subgoal $G_r(s)$ in the R-reachable state s has width k. This suffices because the rule that defines the closest subgoals satisfies the optimality in the third part of the definition of width.

Floortile
In the Floortile domain (Linares López, Celorrio, & Olaya, 2015), a set of robots have to paint a set of tiles in a grid. As is done in the IPC tasks, we consider a simpler version of the domain where the robots have to paint a rectangular portion of a rectangular grid. There can be at most one robot a on each tile t at any time and the predicate robot-at(a, t) is true iff a is on tile t. If there is no robot on tile t then t is marked as clear, i.e., clear(t) holds. Robots can move left, right, up or down, if the target tile is clear. Each robot a is equipped with a brush that is configured to either paint in black or white, e.g., robot-has(a, black) is true iff the brush of robot a is configured to paint in black. It is possible to change the color infinitely often. The goal is to paint a rectangular subset of the grid in chessboard style. If a tile t has color c then the predicate painted(t, c) holds and additionally the tile is marked as not clear, i.e., clear(t) does not hold. A robot a can only paint tile t if a is on a tile t' that is below or above t and t is clear, i.e., robot-at(a, t') holds, up(t', t) or down(t', t) holds, and clear(t) holds.
Consider the set of features Φ = {g, v} where g counts the number of unpainted tiles that need to be painted and v represents that the following condition is satisfied: for each tile $t_1$ that remains to be painted there exists a sequence of tiles $t_1, \ldots, t_n$ such that each $t_i$ with i = 1, ..., n − 1 remains to be painted, $t_n$ does not need to be painted, and for all pairs $t_{i-1}, t_i$ with i = 2, ..., n it holds that $t_i$ is above … which says that painting a tile such that the invariant v remains satisfied is good.
Theorem 5. The sketch $R^F_\Phi$ for the Floortile domain is well-formed and has width 2.
Proof. Recall that a sketch is well-formed if it uses goal-separating features and is terminating. The features Φ are goal separating because the feature valuation g = 0 holds in state s iff s is a goal state. The sketch $R^F_\Phi$ is terminating because r decreases the numerical feature g and no other rule increases g.
It remains to show that the sketch width is 2. Consider a Floortile instance P with states S. If the initial state s is a solvable non-goal state, then the feature conditions of r are true, and the subgoal $G_r(s)$ is nonempty. If we reach such a subgoal state, then either the feature conditions of r remain true because the invariant remains satisfied or the overall goal was reached. Next, we show that P[s] with subgoal $G_r(s)$ in R-reachable state s has width 2. Consider states $S_1 \subseteq S$ where the feature conditions of rule r are true, i.e., solvable states where a tile t must be painted in a color c. We do a three-way case distinction over states $S_1$.
First, consider states $S^1_1 \subseteq S_1$ where some robot a on tile $t_1$ that is configured to color c can move to tile $t_n$ above or below t to paint it. The singleton tuple painted(t, c) implies $G_r(s)$ in s ∈ $S^1_1$ in the admissible chain that consists of moving a from $t_1$ to $t_n$, while decreasing the distance to $t_n$ in each step, and painting t, i.e., (robot-at(a, $t_1$), ..., robot-at(a, $t_n$), painted(t, c)).
Second, consider states $S^2_1 \subseteq S_1$ where the robot a must reconfigure its color from c' to c before painting. The tuple (robot-at(a, …), …) implies $G_r(s)$ in s ∈ $S^2_1$ in the admissible chain that consists of reconfiguring the color, and then moving closer and painting as before, i.e., ((robot-at(a, $t_1$), robot-has(a, c')), (robot-at(a, $t_1$), robot-has(a, c)), …). We observe that reconfiguring requires an admissible chain of size 2 because of serializing the reconfiguring and the moving part. Therefore, in the following case, we assume that the robot must reconfigure its color.
Third, consider states $S^3_1 \subseteq S_1$ where robot a is standing on t and there is a sequence of robots $a_1, \ldots, a_n$ such that a can only paint t if each of $a_1, \ldots, a_n$ moves in such a way that tile t' above or below t becomes clear. Using the fact that a rectangular portion inside a rectangular grid has to be painted, it follows that the set of tiles that must not be painted is pairwise connected. Therefore, we can move each robot $a_i$ from its current tile $t_i$ to $t'_i$ in such a way that after moving each robot, tile t' becomes clear. The tuple (robot-at(a, t'), painted(t, c)) implies $G_r(s)$ in s ∈ $S^3_1$ in the admissible chain that consists of moving each robot $a_i$ from $t_i$ to $t'_i$ in such a way that moving all of them clears tile t', followed by moving a to t', and painting t, i.e., ((robot-at(a, t), robot-has(a, c')), (robot-at(a, t), robot-has(a, c)), (robot-at($a_1$, $t'_1$), robot-has(a, c)), …). We obtain sketch width 2 because all tuples in admissible chains have size of at most 2.

TPP
In the Traveling Purchaser Problem (TPP) domain, there is a set of places that can either be markets or depots, a set of trucks, and a set of goods (Gerevini, Haslum, Long, Saetti, & Dimopoulos, 2009). The places are connected via roads, allowing trucks to drive between them. If a truck t is at place p, then atom at(t, p) holds. Each market p sells specific quantities of goods, e.g., atom on-sale(g, p, 2) represents that market p sells two quantities of good g. If there is a truck t available at market p, it can buy a fraction of the available quantity of good g, making g available to be loaded into t, while the quantity available at p decreases accordingly, i.e., atoms on-sale(g, p, 1) and ready-to-load(g, p, 1) hold afterwards. The trucks can unload the goods at any depot, effectively increasing the number of stored goods, e.g., atom stored(g, 1) becomes false, and stored(g, 2) becomes true, indicating that two quantities of good g are stored. The goal is to store specific quantities of specific goods. SIW fails in TPP because loading sufficiently many quantities of a single good can require buying and loading them from different markets. Making the goods available optimally requires taking the direct route to each market followed by buying the quantity of goods. Thus, the problem width is bounded by the number of quantities needed.
Consider the set of features Φ = {b, l, n} where b is the number of ready-to-be-loaded goods that are not bought at a market and of which some quantity remains to be stored, l is the number of goods of which some quantity remains to be stored and that are not loaded in a truck, and n is the sum of quantities of goods that remain to be stored. The sketch R^T_Φ consists of three rules r_1, r_2, and r_3, where rule r_1 says that buying any quantity of a good that remains to be stored is good, rule r_2 says that loading any quantity of a good that remains to be stored is good, and rule r_3 says that storing any quantity of a good of which not enough has been stored yet is good.²
Theorem 6. The sketch R^T_Φ for the TPP domain is well-formed and has width 1.
Proof. The features are goal separating because n = 0 holds in state s iff s is a goal state. We show that the sketch R^T_Φ is terminating by iteratively eliminating rules: r_3 decreases the numerical feature n which no other rule increments, so we eliminate r_3 and mark n. Next, r_2 decreases l which no other rule increments, so we eliminate r_2 and mark l. Now only r_1 remains, and we can eliminate it since it decreases b, which is never incremented. It remains to show that the sketch width is 1. Consider any TPP instance P. In the initial state s, the feature conditions of at least one rule r are true and the corresponding subgoal G_r(s) is nonempty. In the subgoal states s' ∈ G_{r_1}(s) of some state s, the feature conditions of r_2 must be true, the set of subgoal states G_{r_2}(s') is nonempty, and the feature conditions of r_1 can remain true and the set of subgoal states remains nonempty. Similarly, in the subgoal states s' ∈ G_{r_2}(s) of some state s, the feature conditions of r_3 must be true, the set of subgoal states G_{r_3}(s') is nonempty, and the feature conditions of r_1 and r_2 can remain true and the set of subgoal states remains nonempty. At some point, the subgoal of r_3 is the overall goal of the problem. Next, we show that the subproblems induced by each rule have width 1.
First, we consider rule r_1. Intuitively, we show that buying a good that is not yet ready to be loaded but of which some quantity remains to be stored in a depot has width at most 1. Consider states S_1 ⊆ S where the feature conditions of r_1 are true, i.e., states where there is no good g bought and therefore ready to be loaded in a truck but of which some quantity remains to be stored in a depot. With G_{r_1}(s) we denote the subgoal states of r_1 in s ∈ S_1, i.e., states where some quantity q_b of g is bought and therefore ready to be loaded. The tuple ready-to-load(g, p_n, q_b) implies G_{r_1}(s) in s ∈ S_1 in the admissible chain that consists of moving a truck t from its current place p_1 to the closest market p_n where a nonzero quantity q_a of g is available, with the visited places ordered by decreasing distance to p_n, and buying q_b ≤ q_a quantities of g, i.e., (at(t, p_1), . . ., at(t, p_n), ready-to-load(g, p_n, q_b)).
Second, we consider rule r_2. Intuitively, we show that loading a good that is not yet loaded but of which some quantity remains to be stored in a depot has width at most 1. Consider states S_2 ⊆ S where the feature conditions of r_2 are true, and some quantity of a good g that remains to be stored is ready to be loaded (see above), i.e., states where there is no truck that has g loaded, some quantity of g remains to be stored in a depot, and there is some nonzero quantity q_b of g at place p_n ready to be loaded. With G_{r_2}(s) we denote the subgoal states of r_2 in s ∈ S_2, i.e., states where some quantity q_l of g is loaded into a truck t. The tuple loaded(g, t, q_l) implies G_{r_2}(s) in s ∈ S_2 in the admissible chain that consists of moving t from its current place p_1 to the closest market p_n, with the visited places ordered by decreasing distance to p_n, and loading q_l ≤ q_b quantities of g, i.e., (at(t, p_1), . . ., at(t, p_n), loaded(g, t, q_l)).
Last, we consider rule r_3. Intuitively, we show that storing a good of which some quantity remains to be stored in a depot has width at most 1. Consider states S_3 ⊆ S where the feature conditions of r_3 are true and some quantity of a good that remains to be stored is loaded (see above), i.e., states where some quantity of a good g remains to be stored in a depot, and some nonzero quantity q_l of g is loaded into a truck t, which can be achieved with width 1 (see above). With G_{r_3}(s) we denote the subgoal states of r_3 in s ∈ S_3, i.e., states where the quantity of g that remains to be stored has decreased. The tuple stored(g, q_s) implies G_{r_3}(s) in s ∈ S_3 in the admissible chain that consists of moving t from its current place p_1 to the closest depot at place p_n, with the visited places ordered by decreasing distance to p_n, and storing q_s ≤ q_l quantities of g, i.e., (at(t, p_1), . . ., at(t, p_n), stored(g, q_s)).
We obtain sketch width 1 because all tuples in admissible chains have a size of at most 1.
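The role of size-1 tuples in the argument above can be illustrated with a small novelty-pruned breadth-first search. The following Python snippet is a minimal sketch of an IW(k)-style search, not the authors' implementation. The toy instance (places p1, p2, p3 on a line, a single truck t, and one unit of good g on sale at p3) and the names `iw`, `ROADS`, and `successors` are assumptions made for illustration; quantities are omitted from the atoms for brevity.

```python
from collections import deque
from itertools import combinations


def iw(initial, successors, is_goal, k=1):
    """Breadth-first search that prunes states without a new atom tuple of size <= k."""
    seen = set()

    def is_novel(state):
        # A state is novel if it makes some tuple of at most k atoms true for the first time.
        novel = False
        atoms = sorted(state)
        for size in range(1, k + 1):
            for tup in combinations(atoms, size):
                if tup not in seen:
                    seen.add(tup)
                    novel = True
        return novel

    is_novel(initial)
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if is_novel(nxt):
                queue.append((nxt, plan + [action]))
    return None


# Hypothetical toy instance: places p1 - p2 - p3 on a line, market p3 sells good g.
ROADS = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2"]}


def successors(state):
    place = next(atom[2] for atom in state if atom[0] == "at")
    for nxt in ROADS[place]:
        yield ("drive", place, nxt), (state - {("at", "t", place)}) | {("at", "t", nxt)}
    if ("on-sale", "g", place) in state:
        bought = (state - {("on-sale", "g", place)}) | {("ready-to-load", "g", place)}
        yield ("buy", "g", place), bought


initial = frozenset({("at", "t", "p1"), ("on-sale", "g", "p3")})
plan = iw(initial, successors, lambda s: ("ready-to-load", "g", "p3") in s, k=1)
print(plan)
```

On this toy instance the states kept by the search follow the chain (at(t, p1), at(t, p2), at(t, p3), ready-to-load(g, p3)), mirroring the admissible chain used for rule r_1: every kept state introduces one new atom, so tuples of size 1 suffice.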

Barman
In the Barman domain (Linares López et al., 2015), there is a set of shakers, a set of shots, and a set of dispensers where each dispenses a different ingredient. There are recipes of cocktails, each consisting of two ingredients, e.g., the recipe for cocktail c consists of ingredients i_1, i_2. The goal is to serve beverages, i.e., ingredients and/or cocktails. A beverage b is served in shot g if g contains b. An ingredient i can be filled into shot g using one of the dispensers if g is clean. Producing a cocktail c with a shaker t requires both ingredients i_1, i_2 of c to be in t. In such a situation, shaking t produces c. Pouring a cocktail from t into shot g requires g to be clean. The barman robot has two hands, which limits the number of shots and shakers it can hold at the same time. Therefore, the barman often has to put down an object before it can grasp a different object. For example, assume that the barman holds the shaker t and some shot g' and assume that ingredient i must be filled into shot g. Then the barman has to put down either t or g' so that it can pick up g with hand h. As in the Barman tasks from previous IPCs, we assume that there is only a single shaker and that it is initially empty.
Consider the set of features Φ = {g, u, c_1, c_2} where g is the number of unserved beverages, u is the number of used shots, i.e., shots with a beverage different from the one mentioned in the goal, c_1 is true iff the first recipe ingredient of an unserved cocktail is in the shaker, and c_2 is true iff both recipe ingredients of an unserved cocktail are in the shaker. The sketch R^B_Φ consists of four rules r_1, . . ., r_4. Rule r_1 says that filling an ingredient into the shaker is good if this ingredient is mentioned in the first part of the recipe of an unserved cocktail. Rule r_2 says the same for the second ingredient, after the first ingredient has been added. Requiring the ingredients to be filled into the shaker in a fixed order ensures bounded width, even for arbitrary-sized recipes. Rule r_3 says that cleaning shots is good and rule r_4 says that serving an ingredient or cocktail is good.
Theorem 7. The sketch R^B_Φ for the Barman domain is well-formed and has width 2.
Proof. The features Φ are goal separating because g = 0 holds in state s iff s is a goal state. We show that the sketch is terminating by iteratively eliminating rules: first, we eliminate r_4 because it decreases the numerical feature g that no rule increases. Next, rules r_1 and r_2 can be eliminated because both change a Boolean feature that no remaining rule changes in the opposite direction. Last, we eliminate r_3 because it decrements the numerical feature u.
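The elimination argument can be mimicked mechanically. The following Python snippet is a simplified illustration, not the termination procedure used by the authors: it implements only the basic elimination step (a rule is removed if it decreases a numerical feature, or sets a Boolean feature, that no other remaining rule can change in the opposite direction) and ignores the refinement based on marked features. The effect encoding `BARMAN_EFFECTS` is hypothetical; it is chosen only to be consistent with the proof text and should not be read as the actual rule definitions.

```python
# Simplified rule-elimination check (illustrative sketch only).
# Effects per feature: "dec", "inc", "true", "false", or "any" (written f? in sketch rules).
OPPOSITE = {
    "dec": {"inc", "any"},
    "true": {"false", "any"},
    "false": {"true", "any"},
}


def elimination_order(rules):
    """Return an order in which all rules can be eliminated, or None if the check gets stuck."""
    remaining = dict(rules)
    order = []
    while remaining:
        for name, effects in list(remaining.items()):
            others = [e for n, e in remaining.items() if n != name]
            # A rule is removable if one of its changes cannot be undone by any other rule.
            removable = any(
                change in OPPOSITE
                and all(other.get(feature) not in OPPOSITE[change] for other in others)
                for feature, change in effects.items()
            )
            if removable:
                order.append(name)
                del remaining[name]
                break
        else:
            return None  # no rule could be eliminated: termination is not established
    return order


# Hypothetical effect encoding, assumed for illustration only.
BARMAN_EFFECTS = {
    "r1": {"c1": "true"},
    "r2": {"c2": "true"},
    "r3": {"u": "dec"},
    "r4": {"g": "dec", "u": "any", "c1": "any", "c2": "any"},
}

print(elimination_order(BARMAN_EFFECTS))
```

Under this assumed encoding the check removes r4 first, then r1 and r2, and finally r3, which is the order used in the proof. A pair of rules that increment and decrement the same feature would make the function return `None`.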
It remains to show that the sketch width is 2. Consider any Barman instance P with states S. In the initial state s the feature conditions of r_4 are true, and the subgoal G_{r_4}(s) is nonempty. Note that using r_4 to reach a subgoal decreases the number of unserved beverages until the overall goal is reached. Hence, r_4 can be seen as the goal counter. If the beverage to be served is a cocktail or if the shots are dirty, then this subproblem can be further decomposed into smaller subproblems using rules r_1, r_2, r_3 as follows. Producing a cocktail requires filling the shaker with the correct ingredients and can be achieved by successively reaching the subgoals defined by rules r_1 and r_2. Next, if the shot required for serving was made dirty during this process, then r_3 defines the subgoal of cleaning it again. Finally, r_4 defines the subgoal of serving the cocktail.
We first consider rule r_3. Intuitively, we show that shots are cleaned with width at most 1. Consider all states S_1 ⊆ S where the feature conditions of r_3 are true, i.e., states where there is a used shot g such that used(g, b) holds for some beverage b that is not supposed to be in g according to the goal description. With G_{r_3}(s) we denote the subgoal states of r_3 in s ∈ S_1, i.e., states where g is clean. We do a case distinction over states S_1. First, consider states S_1^1 ⊆ S_1 where the barman is holding g in hand h. The tuple clean(g) implies G_{r_3}(s) for all s ∈ S_1^1 in the admissible chain that consists of cleaning g, i.e., (holding(h, g), clean(g)). Second, consider states S_1^2 ⊆ S_1 where the barman must grasp g with empty hand h first. The same tuple clean(g) implies G_{r_3}(s) for all s ∈ S_1^2 in the admissible chain that consists of picking up g and cleaning g, i.e., (ontable(g), holding(h, g), clean(g)). Last, consider states S_1^3 ⊆ S_1 where the barman must exchange g' with g in hand h first. The same tuple clean(g) implies G_{r_3}(s) for all s ∈ S_1^3 in the admissible chain that consists of putting down g', picking up g, and cleaning g, i.e., (holding(h, g'), ontable(g'), holding(h, g), clean(g)). It also follows that we can reduce the set of R-reachable states in our analysis to those where the container is already grasped if only a single container is affected.
Next, we consider rule r_1. Intuitively, we show that filling the first ingredient into the shaker for producing a required cocktail has width at most 2. Consider states S_2 ⊆ S where the feature conditions of r_1 are true and required shots are clean, i.e., states where no ingredient i_1 consistent with the first part of some unserved cocktail c's recipe is in the shaker t. We do not need to consider states where required shots are not clean because a shot can be cleaned with width 1 (see above). With G_{r_1}(s) we denote the subgoal states of r_1 in s, i.e., states where an ingredient i_1 consistent with the first recipe part of some unserved cocktail c is inside t. The tuple (contains(t, i_1), shaker-level(t, l1)) implies G_{r_1}(s) in the admissible chain that consists of cleaning t, putting down t, picking up a clean shot g, filling i_1 into g using the corresponding dispenser, and pouring g into t, i.e., ((holding(h, t), shaker-level(t, l2)), (holding(h, t), empty(t)), (holding(h, t), clean(t)), (ontable(t), clean(t)), (holding(h, g), clean(t)), (contains(g, i_1), clean(t)), (contains(t, i_1), shaker-level(t, l1))).
Note that the feature conditions of rule r_2 are true in states s' in subgoal G_{r_1}(s) with nonempty subgoal G_{r_2}(s').
Next, we consider rule r_2. Intuitively, we show that filling the second ingredient into the shaker for producing a required cocktail has width at most 2. Consider states S_3 ⊆ S where the feature conditions of r_2 are true and required shots are clean, i.e., states where the first ingredient consistent with the recipe of an unserved cocktail c is in the shaker t. We can again assume that required shots are clean because a shot can be cleaned with width 1 (see above). With G_{r_2}(s) we denote the subgoal states of r_2 in s, i.e., states where an ingredient i_2 is inside t such that both ingredients in t are consistent with the recipe of an unserved cocktail c. The tuple (contains(t, i_2), shaker-level(t, l2)) implies G_{r_2}(s) in the admissible chain that consists of putting down t, grasping g, filling i_2 into g using the corresponding dispenser, and pouring g into t, i.e., ((holding(h, t), shaker-level(t, l1)), (ontable(t), shaker-level(t, l1)), (holding(h, g), ontable(t)), (contains(g, i_2), ontable(t)), (contains(t, i_2), shaker-level(t, l2))).
Finally, we consider rule r_4. Intuitively, we show that serving a beverage has width at most 1. We do a case distinction over all states S_4 where the feature conditions of r_4 are true, i.e., states where there is an unserved ingredient or an unserved cocktail. First, consider states S_4^1 ⊆ S_4 where there is an unserved ingredient i. With G^1_{r_4}(s) we denote the set of subgoal states of r_4 in s ∈ S_4^1 where i is served. The tuple contains(g, i) implies G^1_{r_4}(s) in the admissible chain that consists of filling i into g using the corresponding dispenser, i.e., (clean(g), contains(g, i)). Second, consider states S_4^2 ⊆ S_4 where there is an unserved cocktail c, the respective ingredients are in the shaker using the results for rules r_1 and r_2, and required shots are clean using the results for rule r_3. With G^2_{r_4}(s) we denote the subgoal states of r_4 in s ∈ S_4^2 where c is served. The tuple contains(g, c) implies G^2_{r_4}(s) in the admissible chain that consists of putting down g (or any other shot) because shaking requires only the shaker t to be held, shaking t, and pouring t into g, i.e., (holding(h, g), ontable(g), contains(t, c), contains(g, c)).
We obtain sketch width 2 because all tuples in admissible chains have a size of at most 2.
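The final step of the width argument, taking the largest tuple size over all admissible chains, is simple bookkeeping. The snippet below is a small Python sketch of that step only; it does not check that the chains are admissible. The atoms are copied from three of the chains in the proof and kept as plain strings, and the names `CHAINS` and `max_tuple_size` are introduced here for illustration.

```python
# Three admissible chains from the Barman proof; each chain is a list of atom tuples.
CHAINS = {
    "r3 (shot in hand)": [("holding(h,g)",), ("clean(g)",)],
    "r1": [
        ("holding(h,t)", "shaker-level(t,l2)"),
        ("holding(h,t)", "empty(t)"),
        ("holding(h,t)", "clean(t)"),
        ("ontable(t)", "clean(t)"),
        ("holding(h,g)", "clean(t)"),
        ("contains(g,i1)", "clean(t)"),
        ("contains(t,i1)", "shaker-level(t,l1)"),
    ],
    "r4 (cocktail)": [
        ("holding(h,g)",),
        ("ontable(g)",),
        ("contains(t,c)",),
        ("contains(g,c)",),
    ],
}


def max_tuple_size(chain):
    """Size of the largest atom tuple in a chain; this bounds the width witnessed by the chain."""
    return max(len(atoms) for atoms in chain)


# The bound on the sketch width is the maximum over all chains considered.
width_bound = max(max_tuple_size(chain) for chain in CHAINS.values())
print(width_bound)
```

Only the chain for r_1 (and, analogously, the one for r_2) uses pairs of atoms, which is why the bound is 2 rather than 1.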

Grid
In the Grid domain (McDermott, 2000), a single robot operates in a grid-structured world. There are keys and locks distributed over the grid cells. The robot can move to a cell c above, below, left or right of its current cell if c does not contain a closed lock or another robot. The robot can drop, pick or exchange keys at its current cell and can only hold a single key e at any time. Keys and locks have different shapes and the robot, holding a matching key, can open a lock when standing on a neighboring cell. The goal is to move keys to specific target locations that can be locked initially. Initially, it is possible to reach every lock for the unlock operation. SIW fails in this domain when goals need to be undone, i.e., a key has to be picked up from its target location to open a lock that is necessary for picking or moving a different key.
Consider the set of features Φ = {l, k, o, t} where l is the number of locked grid cells, k is the number of misplaced keys, o is true iff the robot holds a key for which there is a closed lock, and t is true iff the robot holds a key that must be placed at some target grid cell. The sketch R^G_Φ consists of four rules r_1, . . ., r_4. Rule r_1 says that unlocking grid cells is good. Rule r_2 says that placing a key at its target cell is good after opening all locks. Rule r_3 says that picking up a key that can be used to open a locked grid cell is good if there are locked grid cells. Rule r_4 says that picking up a misplaced key is good after opening all locks.
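To make the four features concrete, the following Python sketch evaluates them on a hand-built state. The state representation (`GridState` and its fields) is an assumption made for illustration and is not taken from the domain encoding. A held key is treated as misplaced whenever it has a target cell.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class GridState:
    """Hypothetical, simplified Grid state (assumed representation)."""
    closed_locks: Dict[Cell, str]  # locked cell -> shape of its lock
    key_at: Dict[str, Cell]        # key lying on the grid -> its cell
    key_shape: Dict[str, str]      # key -> shape
    held: Optional[str]            # key held by the robot, if any
    goal: Dict[str, Cell]          # key -> target cell


def features(s: GridState):
    l = len(s.closed_locks)
    # A key with a target cell is misplaced unless it lies on that cell.
    k = sum(1 for key, target in s.goal.items() if s.key_at.get(key) != target)
    o = s.held is not None and s.key_shape[s.held] in s.closed_locks.values()
    t = s.held is not None and s.held in s.goal
    return {"l": l, "k": k, "o": o, "t": t}


state = GridState(
    closed_locks={(2, 0): "square"},
    key_at={"k2": (0, 1)},
    key_shape={"k1": "square", "k2": "round"},
    held="k1",
    goal={"k2": (3, 0)},
)
print(features(state))
```

In this state one cell is locked, key k2 is misplaced, and the robot holds the square key k1 that matches the closed lock, so the conditions described for rule r_1 (unlocking while holding a matching key) are the relevant ones.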
Theorem 8. The sketch R^G_Φ for the Grid domain is well-formed and has width 2.³
Proof. The features Φ are goal separating because the feature valuation k = 0 holds in state s iff s is a goal state. We show that the sketch is terminating by iteratively eliminating rules: r_1 decreases l which no other rule increases, so we eliminate r_1 and mark l. Now r_2 can be eliminated because it decreases k which no remaining rule increases. We can now eliminate r_3 = C → E because it changes the Boolean feature o and the only other remaining rule r_4 = C' → E' may restore the value of o, but this can only happen finitely often, since l is marked and l > 0 ∈ C and l = 0 ∈ C'. Now only r_4 remains and we can eliminate it since it changes t, which is never changed back.
It remains to show that the sketch width is 2. Consider any Grid instance P with states S. Note that depending on the initial state s the feature conditions of at least one rule r are true in s and its subgoal G_r(s) is nonempty. We first consider rule r_3. Intuitively, we show that picking up a key that can be used to open some closed lock has width 1. Consider states S_1 ⊆ S where the feature conditions of r_3 are true, i.e., states where there is a closed lock and the robot does not hold a key e that can be used to open a closed lock. With G_{r_3}(s) we denote the subgoal states of r_3 in s ∈ S_1, i.e., states where the robot holds e. The tuple holding(e) implies G_{r_3}(s) in s ∈ S_1 in the admissible chain that consists of changing the position of the robot from the current position c_1 to the position c_n of e, ordered by the distance to c_n, followed by exchanging or picking e, i.e., (at-robot(c_1), . . ., at-robot(c_n), holding(e)). Note that the feature conditions of r_1 are true in states s' in subgoal G_{r_3}(s) with nonempty subgoal states G_{r_1}(s') because the number of closed locks remains greater than 0.
3. The proof for the Grid sketch in our previous work (Drexler et al., 2021) contains an error: we claimed that the subproblems of moving a key to its target cell have width 1. This is wrong because placing a key at a target location by exchanging it with another key that is at its goal location results in a state that is not a subgoal state. Hence, the correct atom tuple in the admissible chain of rule r_2, which also captures that the arm must be empty, is (at(e, c_n), arm-empty()) of size 2. A sketch of width 1 can be obtained by splitting the problem into moving to the target location of the key with width 1 and then dropping the key with width 0. Since the exchange action can be simulated with a pick and drop action, the above sketch still has a width of 1 for the domain variant where the exchange action is removed entirely.
Next, we consider rule r_1. Intuitively, we show that opening a closed lock has width 1. Consider states S_2 ⊆ S where the feature conditions of r_1 are true and the robot holds a key e that can be used to open a closed lock d. We can transform states where the robot holds no key into a state from S_2 by letting it pick a key with width 1 (see above). With G_{r_1}(s) we denote the subgoal states of r_1 in s ∈ S_2, i.e., states where d is open. The tuple open(d) implies G_{r_1}(s) in s ∈ S_2 in the admissible chain that consists of changing the position of the robot from its current position c_1 to a position c_n next to lock d, ordered by the distance to c_n, followed by opening d, i.e., (at-robot(c_1), . . ., at-robot(c_n), open(d)). Note that either there are still closed locks that can be opened by repeated usage of rules r_1 and r_3 or the feature conditions of r_2 or r_4 are true in states s' in the subgoal G_{r_1}(s) with nonempty subgoal states G_{r_2}(s') or G_{r_4}(s') respectively because there are misplaced keys. Hence, it remains to show that if all locks are open then well-placing keys has width 1.
Next, we consider rule r_4. Intuitively, we show that picking up a key that is not at its target cell has width 1. Consider states S_3 ⊆ S where the feature conditions of r_4 are true, i.e., states where all locks are open and the robot does not hold a misplaced key. With G_{r_4}(s) we denote the subgoal states of r_4 in s ∈ S_3, i.e., states where the robot holds a misplaced key e. The tuple holding(e) implies G_{r_4}(s) in s ∈ S_3 in the admissible chain that consists of changing the position of the robot from the current position c_1 to the position c_n of e, ordered by the distance to c_n, followed by exchanging or picking e, i.e., (at-robot(c_1), . . ., at-robot(c_n), holding(e)). Note that the feature conditions of r_2 are true in states s' in subgoal G_{r_4}(s) with nonempty subgoal states G_{r_2}(s') because the number of misplaced keys remains greater than 0.
Finally, we consider rule r_2. Intuitively, we show that moving a key to its target cell has width 2. Consider states S_4 ⊆ S where the feature conditions of r_2 are true and the robot holds a misplaced key e. As before, we can transform states s ∉ S_4 into such a state s' by picking up e with width 1. With G_{r_2}(s) we denote the subgoal states of r_2 in s ∈ S_4, i.e., states where e is at its target cell. The tuple (at(e, c_n), arm-empty()) implies G_{r_2}(s) in s ∈ S_4 in the admissible chain that consists of changing the position of the robot from its current position c_1 to the key's target cell c_n, ordered by the distance to c_n, followed by dropping e at c_n, i.e., (at-robot(c_1), . . ., at-robot(c_n), (at(e, c_n), arm-empty())).
We obtain sketch width 2 because all tuples in admissible chains have size at most 2.

Childsnack
In the Childsnack domain (Vallati, Chrpa, & McCluskey, 2018), there is a set of contents, a set of trays, a set of gluten-free breads, a set of regular breads that contain gluten, a set of gluten-allergic children, a set of children without gluten allergy, and a set of tables where the children sit. The goal is to serve the gluten-allergic children with sandwiches made of gluten-free bread and the non-allergic children with either type of sandwich.
The Childsnack domain has large bounded width because moving an empty tray is possible at any given time. The goal serialization fails because it gets trapped in dead ends when serving non-allergic children with gluten-free sandwiches while leaving insufficiently many gluten-free sandwiches for the allergic children.
Consider the set of features Φ = {c_g, c_r, s^k_g, s^k, s^t_g, s^t} where c_g is the number of unserved gluten-allergic children, c_r is the number of unserved non-allergic children, s^k_g holds iff there is a gluten-free sandwich in the kitchen, s^k holds iff there is any sandwich in the kitchen, s^t_g holds iff there is a gluten-free sandwich on a tray, and s^t holds iff there is any sandwich on a tray. We define the following sketch rules R^C_Φ:
r_3 = {c_g > 0, s^k_g, ¬s^t_g} → {s^k_g?, s^k?, s^t_g, s^t}
r_4 = {c_g = 0, c_r > 0, s^k, ¬s^t} → {s^k_g?, s^k?, s^t_g?, s^t}
r_5 = {c_g > 0, s^t_g} → {c_g↓, s^t_g?, s^t?}
r_6 = {c_g = 0, c_r > 0, s^t} → {c_r↓, s^t_g?, s^t?}
Rule r_1 says that making a gluten-free sandwich is good if there is an unserved gluten-allergic child and if there is no other gluten-free sandwich currently being served. Rule r_2 says the same thing for non-allergic children after all gluten-allergic children have been served and the sandwich to be made is not required to be gluten free. Rules r_3 and r_4 say that putting a gluten-free (resp. regular) sandwich from the kitchen onto a tray is good if there is none on a tray yet. Rule r_5 says that serving gluten-allergic children before non-allergic children is good if there is a gluten-free sandwich available on a tray. Rule r_6 says that serving non-allergic children afterwards is good.
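How a sketch rule constrains a pair of feature valuations can be spelled out in a few lines. The Python snippet below is a minimal sketch that encodes only rules r_5 and r_6 as stated above. It assumes the usual reading of sketch rules: `↓` means the value decreases, `?` means the value may change arbitrarily, and features not mentioned in the effects must keep their value. The names `RULES`, `holds`, `change_ok`, and `satisfies`, and the short feature keys (`cg`, `cr`, `skg`, `sk`, `stg`, `st`), are introduced here for illustration.

```python
# Each rule is a pair (conditions, effects) over feature names.
# Conditions: "pos" (> 0), "zero" (= 0), "true", "false".
# Effects: "dec", "inc", "true", "false", "any" (written f? in the rules).
RULES = {
    "r5": ({"cg": "pos", "stg": "true"}, {"cg": "dec", "stg": "any", "st": "any"}),
    "r6": ({"cg": "zero", "cr": "pos", "st": "true"}, {"cr": "dec", "stg": "any", "st": "any"}),
}


def holds(condition, value):
    if condition == "pos":
        return value > 0
    if condition == "zero":
        return value == 0
    if condition == "true":
        return value is True
    if condition == "false":
        return value is False
    raise ValueError(condition)


def change_ok(effect, before, after):
    if effect == "any":
        return True
    if effect == "dec":
        return after < before
    if effect == "inc":
        return after > before
    if effect == "true":
        return after is True
    if effect == "false":
        return after is False
    raise ValueError(effect)


def satisfies(rule, f, f_next):
    """Check whether the pair of feature valuations (f, f_next) satisfies the rule."""
    conditions, effects = rule
    if not all(holds(cond, f[name]) for name, cond in conditions.items()):
        return False
    for name in f:
        if name in effects:
            if not change_ok(effects[name], f[name], f_next[name]):
                return False
        elif f[name] != f_next[name]:
            return False  # assumed semantics: unmentioned features must not change
    return True


before = {"cg": 1, "cr": 2, "skg": False, "sk": False, "stg": True, "st": True}
after = {"cg": 0, "cr": 2, "skg": False, "sk": False, "stg": False, "st": False}
print(satisfies(RULES["r5"], before, after), satisfies(RULES["r6"], before, after))
```

Here the transition from `before` to `after` (the last gluten-allergic child is served with the only sandwich on a tray) satisfies r_5 but not r_6, because the condition c_g = 0 of r_6 does not hold in `before`.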
Theorem 9. The sketch R^C_Φ for the Childsnack domain is well-formed and has width 1.
Proof. The features are goal separating because the feature valuations c_g = 0 and c_r = 0 hold in state s iff s is a goal state. We show that the sketch is terminating by iteratively eliminating rules: r_5 decreases the numerical feature c_g which no other rule increments, so we eliminate r_5 and mark c_g. Similarly, r_6 decreases the numerical feature c_r which no other rule increments, so we eliminate r_6 and mark c_r. Then rule r_4 changes s^t and no remaining rule changes s^t in the opposite direction, so we eliminate r_4. Likewise, we eliminate r_3 because it changes s^t_g, which no remaining rule can change back. Last, we eliminate rules r_1 and r_2 because they change s^k_g resp. s^k, and no remaining rule can change the values in the opposite direction.
It remains to show that the sketch width is 1. Consider any Childsnack instance with states S. Note that if there is an unserved gluten-allergic child in the initial state then rules r_1, r_3, r_5 define subgoals for serving a gluten-allergic child. If there is no unserved gluten-allergic child but there is an unserved non-allergic child then rules r_2, r_4, r_6 define subgoals for serving a non-allergic child.
In the following, we first show that serving a gluten-free sandwich to a gluten-allergic child has width 1 and deduce the case of serving a non-allergic child from it.
We first consider rule r_1. Intuitively, we show that producing a gluten-free sandwich has width 1. Consider states S_3 ⊆ S where the feature conditions of r_1 are true, i.e., states where there is an unserved gluten-allergic child c and there is no gluten-free sandwich available, neither in the kitchen nor on a tray. With G_{r_1}(s) we denote the subgoal states of r_1 in s ∈ S_3, i.e., states where a gluten-free sandwich s is available in the kitchen. The tuple no-gluten-sandwich(s) implies G_{r_1}(s) in s ∈ S_3 in the admissible chain that consists of producing s, i.e., (notexists(s), no-gluten-sandwich(s)). Note that the feature conditions of r_3 are true in states s' in the subgoal G_{r_1}(s) with nonempty subgoal G_{r_3}(s').
Next, we consider rule r_3. Intuitively, we show that moving a gluten-free sandwich from the kitchen onto a tray has width 1. Consider states S_2 ⊆ S where the feature conditions of r_3 are true, i.e., states where there is an unserved gluten-allergic child c and there is a gluten-free sandwich s available in the kitchen. With G_{r_3}(s) we denote the subgoal states of r_3 in s ∈ S_2, i.e., states where s is on a tray p. The tuple ontray(s, p) implies G_{r_3}(s) in s ∈ S_2 in the admissible chain that consists of moving p from its current place t to the kitchen and putting s onto p, i.e., (at(p, t), at(p, kitchen), ontray(s, p)). Note that the feature conditions of r_5 are true in states s' in the subgoal G_{r_3}(s) with nonempty subgoal G_{r_5}(s').
Next, we consider rule r_5. Intuitively, we show that serving a gluten-allergic child if a gluten-free sandwich is available on a tray has width 1. Consider states S_1 ⊆ S where the feature conditions of r_5 are true, i.e., states where there is an unserved gluten-allergic child c and there is a gluten-free sandwich s on a tray p. With G_{r_5}(s) we denote the subgoal states of r_5 in s ∈ S_1, i.e., states where c is served. The tuple served(c) implies G_{r_5}(s) in s ∈ S_1 in the admissible chain that consists of moving p from the kitchen to t and serving c with s, i.e., (ontray(s, p), at(p, t), served(c)). Note that if all gluten-allergic children are served in this way by using rules r_1, r_3, r_5 then either the goal G was reached or there are unserved non-allergic children. In the latter case, the problem is very similar to the one we considered above and rules r_2, r_4, r_6 define the corresponding subgoals to serve a non-allergic child. We omit the details but provide the admissible chains that are necessary to conclude the proof: the tuple served(c) implies G_{r_6}(s) in the admissible chain (ontray(s, p), at(p, t), served(c)). The tuple ontray(s, p) implies G_{r_4}(s) in the admissible chain (at(p, t), at(p, kitchen), ontray(s, p)). The tuple at-kitchen-sandwich(s) implies G_{r_2}(s) in the admissible chain (notexists(s), at-kitchen-sandwich(s)).
We obtain sketch width 1 because all tuples in admissible chains have a size of at most 1.

Driverlog
In the Driverlog domain (Long & Fox, 2003), there is a set of drivers, trucks, packages, road locations and path locations. The two types of locations form two strongly connected graphs and the two sets of vertices overlap. The road graph is only traversable by trucks, while the path graph is only traversable by drivers. A package can be delivered by loading it into a truck, driving the truck to the target location of the package followed by unloading the package. Driving the truck requires a driver to be in the truck. Not only packages, but also trucks and drivers can have goal locations. SIW fails because it can be necessary to undo previously achieved goals, like moving a truck away from its destination to transport a package. The following sketch induces a goal ordering such that an increasing subset of goal atoms never needs to be undone.
Consider the set of features Φ = {p, t, d_g, d_t, b, l} where p is the number of misplaced packages, t is the number of misplaced trucks, d_g is the sum of all distances of drivers to their respective goal locations, d_t is the minimum distance of any driver to a misplaced truck, b is true iff there is a driver inside of a truck, and l is true iff there is a misplaced package in a truck. The sketch R^D_Φ consists of six rules r_1, . . ., r_6. Rule r_1 says that letting a driver board any truck is good if there are undelivered packages and there is no driver boarded yet. Rule r_2 says that loading an undelivered package is good. Rule r_3 says that delivering a package is good. Rule r_4 says that moving any driver closer to being in a misplaced truck is good after having delivered all packages. Rule r_5 says that driving a misplaced truck to its target location is good once all packages are delivered. Rule r_6 says that moving a misplaced driver closer to its target location is good after having delivered all packages and trucks.
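As an illustration of how such features can be evaluated, the following Python sketch computes p, t, d_g, b, and l on a toy state, using breadth-first search on the path graph for the driver distances. The dictionary-based state representation and the function names are assumptions made for this example. The feature d_t is left out because its exact step counting (for example, whether boarding and unboarding count as steps) is not fixed by the description above, and a driver sitting in a truck is assumed to be located at the position of that truck.

```python
from collections import deque


def distance(graph, source, target):
    """Length of a shortest path between two locations (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("target not reachable")


def features(state, goal, paths):
    trucks = state["trucks"]
    # Packages and drivers are mapped either to a location or to the truck they are in.
    p = sum(1 for pkg, loc in goal["packages"].items() if state["packages"][pkg] != loc)
    t = sum(1 for trk, loc in goal["trucks"].items() if trucks[trk] != loc)
    d_g = sum(
        distance(paths, trucks.get(state["drivers"][d], state["drivers"][d]), loc)
        for d, loc in goal["drivers"].items()
    )
    b = any(pos in trucks for pos in state["drivers"].values())
    l = any(state["packages"][pkg] in trucks for pkg in goal["packages"])
    return {"p": p, "t": t, "d_g": d_g, "b": b, "l": l}


# Hypothetical toy instance: path graph a - b - c.
paths = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
state = {
    "packages": {"pk1": "tr1", "pk2": "c"},
    "trucks": {"tr1": "a"},
    "drivers": {"d1": "tr1", "d2": "c"},
}
goal = {"packages": {"pk1": "c", "pk2": "c"}, "trucks": {"tr1": "a"}, "drivers": {"d2": "a"}}
print(features(state, goal, paths))
```

In this toy state one package is still misplaced and already loaded, a driver is boarded, and driver d2 is two steps away from its goal location.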
Theorem 10. The sketch R^D_Φ for the Driverlog domain is well-formed and has width 1.
Proof. The features are goal separating because the feature valuations p = 0, t = 0, d_g = 0 hold in state s iff s is a goal state. We show that the sketch is terminating by iteratively eliminating rules: r_3 decreases the numerical feature p that no other remaining rule increments, so we eliminate r_3 and mark p. We can now eliminate r_5 = C → E because it decreases the numerical feature t and the only other remaining rule that changes t, r_2 = C' → E', changes it arbitrarily, but this can only happen finitely many times, since p is marked and p = 0 ∈ C and p > 0 ∈ C'. Next, we can eliminate r_2 because it sets the Boolean feature l and no other remaining rule changes l in the opposite direction. We can now eliminate r_4 = C → E because it decreases the numerical feature d_t and the only other remaining rule that changes d_t, r_1 = C' → E', changes it arbitrarily, but this can only happen finitely many times, since p is marked and p = 0 ∈ C and p > 0 ∈ C'. Next, we can eliminate r_6 = C → E because it decreases the numerical feature d_g and the only other remaining rule that changes d_g, r_1 = C' → E', changes it arbitrarily, but this can only happen finitely many times, since p is marked and p = 0 ∈ C and p > 0 ∈ C'. Last, we eliminate the remaining rule r_1 because it sets the Boolean feature b to true.
It remains to show that the sketch width is 1. Consider any Driverlog instance with states S. If there are misplaced packages in the initial state, then rule r_3 decrements the number of misplaced packages. Therefore, we show that moving packages to their target location has width 1. Consider states S_1 ⊆ S where there is a misplaced package p at location c_m with target location c_o. We do a three-way case distinction over all states S_1 and show that moving a package to its target location has width 1. First, consider rule r_1. Intuitively, we show that boarding some driver into a truck has width 1. Consider states S_1^1 ⊆ S_1 where the feature conditions of rule r_1 are true, i.e., states where there is no driver boarded into any truck. With G_{r_1}(s) we denote the subgoal states of r_1 in s ∈ S_1^1, i.e., states where a driver d is boarded into a truck t. The tuple driving(d, t) implies G_{r_1}(s) in s ∈ S_1^1 in the admissible chain that consists of moving d from its current location c_1 to the location c_n of t, each step decreasing the distance to c_n, and boarding d into t, i.e., (at(d, c_1), . . ., at(d, c_n), driving(d, t)).
Second, consider rule $r_2$. Intuitively, we show that loading a misplaced package into a truck has width 1. Consider states $S_1^2 \subseteq S_1$ where the feature conditions of rule $r_2$ are true and where $d$ is boarded into truck $t$ at location $c_n$, i.e., no misplaced package is loaded, and $d$ is boarded into $t$ at location $c_n$, which can be assumed because boarding $d$ into $t$ when there is a misplaced package has width 1 (see above). With $G_{r_2}(s)$ we denote the subgoal states of $r_2$ in $s \in S_1^2$, i.e., states where $p$ is loaded into $t$. The tuple $\mathit{in}(p, t)$ implies $G_{r_2}(s)$ in $s \in S_1^2$ in the admissible chain that consists of driving $t$ from $c_n$ to $c_m$, each step decreasing the distance to $c_m$, and then loading $p$ into $t$, i.e., $(\mathit{at}(t, c_n), \ldots, \mathit{at}(t, c_m), \mathit{in}(p, t))$.
Third, consider rule $r_3$. Intuitively, we show that moving a package to its target location has width 1. Consider states $S_1^3 \subseteq S_1$ where the feature conditions of rule $r_3$ are true and where $p$ and $d$ are in $t$ at $c_m$, which can be assumed because boarding a driver and loading a misplaced package have width 1 (see above). With $G_{r_3}(s)$ we denote the subgoal states of $r_3$ in $s \in S_1^3$, i.e., states where $p$ is at location $c_o$. The tuple $\mathit{at}(p, c_o)$ implies $G_{r_3}(s)$ in $s \in S_1^3$ in the admissible chain that consists of driving $t$ from $c_m$ to $c_o$, each step decreasing the distance to $c_o$, and then unloading $p$, i.e., $(\mathit{at}(t, c_m), \ldots, \mathit{at}(t, c_o), \mathit{at}(p, c_o))$.
Now, consider states $S_2$ where all packages are at their respective target location and there is a misplaced truck $t$ at location $c_n$ with target location $c_m$. This can either be the case in the initial state or after moving the packages, because moving packages requires using trucks. We do a two-way case distinction over all states $S_2$ and show that moving a truck to its target location has width 1.
First, consider rule $r_4$. Intuitively, we show that boarding a driver into a misplaced truck without using any truck has width 1. Consider states $S_2^1 \subseteq S_2$ where the feature conditions of rule $r_4$ are true, i.e., where there is a driver $d$ at location $c_1$ with nonzero distance until being boarded into $t$. With $G_{r_4}(s)$ we denote the subgoal states of $r_4$ in $s \in S_2^1$, i.e., states where $d$ is one step closer to being boarded into $t$. There are three possible admissible chains that must be considered: (1) unboarding $d$ from some truck $t'$, i.e., tuple $\mathit{at}(d, c_1)$ implies $G_{r_4}(s)$ in $s \in S_2^1$ in the admissible chain $(\mathit{driving}(d, t'), \mathit{at}(d, c_1))$, (2) moving $d$ closer to $c_n$ from $c_{i-1}$ to $c_i$, i.e., tuple $\mathit{at}(d, c_i)$ implies $G_{r_4}(s)$ in $s \in S_2^1$ in the admissible chain $(\mathit{at}(d, c_{i-1}), \mathit{at}(d, c_i))$, and (3) boarding $d$ into $t$ at $c_n$, i.e., $\mathit{driving}(d, t)$ implies $G_{r_4}(s)$ in $s \in S_2^1$ in the admissible chain $(\mathit{at}(d, c_n), \mathit{driving}(d, t))$.
Second, consider rule $r_5$. Intuitively, we show that moving a misplaced truck to its target location has width 1. Consider states $S_2^2 \subseteq S_2$ where the feature conditions of rule $r_5$ are true and where some driver is boarded into $t$, i.e., states where $d$ is boarded into $t$ at $c_n$. With $G_{r_5}(s)$ we denote the subgoal states of $r_5$ in $s \in S_2^2$, i.e., states where $t$ is at its target location. The tuple $\mathit{at}(t, c_m)$ implies $G_{r_5}(s)$ in $s \in S_2^2$ in the admissible chain that consists of moving $t$ from $c_n$ to $c_m$, each step decreasing the distance to $c_m$, i.e., $(\mathit{at}(t, c_n), \ldots, \mathit{at}(t, c_m))$.
Now, consider states $S_3$ where all packages and trucks are at their respective target location and there is a misplaced driver $d$, boarded or unboarded, at location $c_1$ with target location $c_n$. This can either be the case in the initial state or after moving the packages and trucks. Consider rule $r_6$. Intuitively, we show that moving a driver to its target location without using any truck has width 1. With $G_{r_6}(s)$ we denote the subgoal states of $r_6$ in $s \in S_3$, i.e., states where $d$ is at its target location. There are two possible admissible chains that must be considered: (1) unboarding $d$ at location $c_1$, i.e., tuple $\mathit{at}(d, c_1)$ implies $G_{r_6}(s)$ in $s \in S_3$ in the admissible chain $(\mathit{driving}(d, t), \mathit{at}(d, c_1))$, and (2) moving $d$ closer from location $c_{i-1}$ to $c_i$, i.e., tuple $\mathit{at}(d, c_i)$ implies $G_{r_6}(s)$ in $s \in S_3$ in the admissible chain $(\mathit{at}(d, c_{i-1}), \mathit{at}(d, c_i))$.
We obtain sketch width 1 because all tuples in admissible chains have size 1. Note that when dropping rules $r_1$ and $r_2$, as well as features $l$ and $b$, the sketch width becomes 2 because we must merge the three admissible chains of the first subproblem. When merging, tuples of size 2 must be considered, each consisting of a location and whether the driver drives the truck or whether the package is loaded.
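To make the role of admissible chains of size-1 tuples more concrete, the following is a minimal sketch of IW(1), i.e., a breadth-first search that prunes every newly generated state that does not make some atom true for the first time. The toy instance (a single driver walking along a line of locations and then boarding a truck), the atom encoding, and all function names are illustrative assumptions and not part of the implementations evaluated in this paper.

```python
from collections import deque

def iw1(initial, successors, is_goal):
    """IW(1): breadth-first search that keeps only states making a new atom true."""
    seen_atoms = set(initial)
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, succ in successors(state):
            new_atoms = succ - seen_atoms
            if not new_atoms:
                continue  # pruned: no atom is true for the first time
            seen_atoms |= new_atoms
            queue.append((succ, plan + [action]))
    return None

N = 4  # hypothetical instance: locations c1..c4, truck t parked at c4

def successors(state):
    """Toy transition function: the driver d walks between neighbors or boards t at cN."""
    result = []
    for atom in state:
        if atom[0] == "at":
            i = atom[2]
            for j in (i - 1, i + 1):
                if 1 <= j <= N:
                    result.append((f"walk(c{i},c{j})", frozenset({("at", "d", j)})))
            if i == N:
                result.append(("board(d,t)", frozenset({("driving", "d", "t")})))
    return result

def is_goal(state):
    return ("driving", "d", "t") in state

plan = iw1(frozenset({("at", "d", 1)}), successors, is_goal)
print(plan)
```

In this toy instance the states kept by the search follow exactly a chain of the form $(\mathit{at}(d, c_1), \ldots, \mathit{at}(d, c_n), \mathit{driving}(d, t))$: walking back towards an already visited location adds no new atom and is pruned.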

Schedule
In the Schedule domain (Bacchus, 2001), there is a set of objects that can have different values for the following attributes: shape, color, surface condition, and temperature. Also, there is a set of machines where each is capable of changing an attribute with the side effect that other attributes change as well. For example, rolling an object changes its shape to cylindrical and has the side effect that the color changes to uncolored, any surface condition is removed, and the object becomes hot. Often, there are multiple different work steps for achieving a specific attribute of an object. For example, both rolling and lathing change an object's shape to cylindrical. But rolling makes the object hot, while lathing keeps its temperature cold. Some work steps are only possible if the object is cold. Multiple work steps can be scheduled to available machines, which sets the machine's status to occupied. All machines become available again after a single do-time-step action. The goal is to change the attributes of objects.
SIW fails in Schedule because it gets trapped in dead ends when an object's temperature becomes hot, possibly blocking other required attribute changes. The following sketch uses this observation and defines an ordering over achieved attributes where first, the desired shapes are achieved, second, the desired surface conditions are achieved, and third, the desired colors are achieved.
Consider the set of features $\Phi = \{p_1, p_2, p_3, h, o\}$ where $p_1$ is the number of objects with the wrong shape, $p_2$ is the number of objects with the wrong surface condition, $p_3$ is the number of objects with the wrong color, $h$ is the number of hot objects, and $o$ is true iff there is an object scheduled or a machine occupied. We define the following sketch rules $R^S_\Phi$: Rule $r_1$ says that achieving an object's goal shape is good. Rule $r_2$ says that achieving an object's goal surface condition is good after achieving all goal shapes. Rule $r_3$ says that achieving an object's goal color is good after achieving all goal shapes and surface conditions. Rule $r_4$ says that making objects and machines available is good. Note that $r_4$ does not decrease the sketch width but it decreases the search time by decreasing the search depth. Note also that $h$ never occurs in any rule because we want its value to remain constant.
Theorem 11. The sketch $R^S_\Phi$ for the Schedule domain is well-formed and has width 0.$^4$
Proof. The features are goal separating because the feature valuations $p_1 = 0$, $p_2 = 0$, $p_3 = 0$ hold in state $s$ iff $s$ is a goal state. We show that the sketch is terminating by iteratively eliminating rules: Rule $r_1$ decreases the numerical feature $p_1$ that no other remaining rule increments, so we eliminate $r_1$ and mark $p_1$. Rule $r_2$ decreases the numerical feature $p_2$ that no other remaining rule increments, so we eliminate $r_2$ and mark $p_2$. Rule $r_3$ decreases the numerical feature $p_3$ that no other remaining rule increments, so we eliminate $r_3$ and mark $p_3$. Last, we eliminate the only remaining rule $r_4$ because it sets the Boolean feature $o$ to false.
It remains to show that the sketch width is 0. Consider any Schedule instance with states $S$. First, consider states $S_1 \subseteq S$ where the feature conditions of $r_4$ are true, i.e., there is either a scheduled object or a machine occupied. This can be the case in the initial state or if an object is scheduled to be processed by a machine. With $G_{r_4}(s)$ we denote the subgoal states of $r_4$ in $s \in S_1$, i.e., states where no object is scheduled, no machine is occupied, and all objects have the same shape, surface condition, color, and temperature as in $s$. The action that performs a time step always reaches a subgoal state in $G_{r_4}(s)$ in a single step.
Now, consider states $S_2 \subseteq S$ where the feature conditions of $r_1$ are true and there is no object scheduled and no occupied machine, i.e., states where there is an object $a$ that has shape $x$ that is not the shape $y$ mentioned in the goal. We can assume that no object is scheduled and no machine is occupied because this can be achieved with width 0 (see above). With $G_{r_1}(s)$ we denote the set of subgoal states of $r_1$ in $s \in S_2$, i.e., states where $a$ has shape $y$ and all objects have the same temperature as in $s$. The action that changes the shape of an object to its goal shape while not changing the temperature of any object reaches a subgoal state in $G_{r_1}(s)$ in a single step.
Now, consider states $S_3 \subseteq S$ where the feature conditions of $r_2$ are true, there is no object scheduled and no machine occupied, and all objects have their correct shape, i.e., states where there is an object $a$ that has surface $x$ that is not the surface $y$ mentioned in the goal. We can assume that no object is scheduled and no machine is occupied because this can be achieved with width 0 (see above), and that all objects have their correct shape because changing the shape has width 0 (see above). With $G_{r_2}(s)$ we denote the set of subgoal states of $r_2$ in $s \in S_3$, i.e., states where $a$ has surface $y$ and all objects have the same shape and temperature as in $s$. The action that changes the surface condition of an object to the goal surface condition while not changing a correct shape or the temperature of any object reaches a subgoal state in $G_{r_2}(s)$ in a single step.
Now, consider states $S_4 \subseteq S$ where the feature conditions of $r_3$ are true, there is no object scheduled, no machine occupied, and all objects have their correct shape and surface, i.e., states where there is an object $a$ that has color $x$ that is not the color $y$ mentioned in the goal. We can assume that no object is scheduled and no machine is occupied because this can be achieved with width 0 (see above), that all objects have their correct shape because changing the shape has width 0 (see above), and that all objects have their desired surface because changing the surface has width 0 (see above). With $G_{r_3}(s)$ we denote the set of subgoal states of $r_3$ in $s \in S_4$, i.e., states where $a$ has color $y$ and all objects have the same shape, surface condition, and temperature as in $s$. The action that changes the color of an object to the goal color while not changing the shape, surface condition, or temperature of any object reaches a subgoal state in $G_{r_3}(s)$ in a single step. Note that $r_3$ achieves the goal when the color of the last object changes to the color mentioned in the goal.
We obtain sketch width 0 because each subproblem is solvable in a single step.
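The elimination argument used in the termination part of the proof can be sketched as a small procedure. The version below is a deliberate simplification: it eliminates a rule as soon as the rule decreases a numerical feature that no other remaining rule increases or changes arbitrarily, or sets a Boolean feature that no other remaining rule can set to the opposite value, and it ignores the refinement based on marked features that the Driverlog proof relies on. The effect encoding of the Schedule rules (in particular the arbitrary changes of $p_2$ and $p_3$) is an assumption reconstructed from the prose description, and all names are hypothetical.

```python
# Effects per rule: feature -> "dec", "inc", "any" (numerical) or True/False (Boolean).
# Assumed encoding of the Schedule sketch; rule conditions are not needed here.
SCHEDULE_RULES = {
    "r1": {"p1": "dec", "p2": "any", "p3": "any"},
    "r2": {"p2": "dec", "p3": "any"},
    "r3": {"p3": "dec"},
    "r4": {"o": False},
}

def blocks(effect, other_effect):
    """True if other_effect can undo the progress that effect makes on a feature."""
    if effect == "dec":
        return other_effect in ("inc", "any")
    if isinstance(effect, bool):
        if other_effect == "any":
            return True
        return isinstance(other_effect, bool) and other_effect != effect
    return False

def elimination_order(rules):
    """Return an order in which all rules can be eliminated, or None if stuck."""
    remaining = dict(rules)
    order = []
    while remaining:
        for name, effects in list(remaining.items()):
            others = [e for n, e in remaining.items() if n != name]
            progress = [
                (f, eff) for f, eff in effects.items()
                if eff == "dec" or isinstance(eff, bool)
            ]
            if any(
                not any(f in o and blocks(eff, o[f]) for o in others)
                for f, eff in progress
            ):
                order.append(name)
                del remaining[name]
                break
        else:
            return None  # no rule can be eliminated: termination not established
    return order

print(elimination_order(SCHEDULE_RULES))
```

Under the assumed encoding, the procedure eliminates the rules in the same order as the proof. Two rules that flip the same Boolean feature in opposite directions are an example where it gets stuck and returns `None`.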

Experiments
Even though the focus of our work is on proving polynomial runtime bounds for planning domains theoretically, we evaluate in this section how these runtime guarantees translate into practice. We implemented two versions of $\mathrm{SIW}_R$: one version, denoted by $\mathrm{SIW}_R^G$, is based on the LAPKT planning system (Ramirez, Lipovetzky, & Muise, 2015) and grounds the input task to a propositional representation before the search; the other version, denoted by $\mathrm{SIW}_R^L$, is implemented in the Mimir planning system (Ståhlberg, 2023) and searches on the lifted task representation directly. Both versions use the DLPlan library (Drexler, Francès, & Seipp, 2022) to represent and evaluate features (see the appendix for details). We use the Lab toolkit (Seipp, Pommerening, Sievers, & Helmert, 2017) for running experiments on Intel Xeon Gold 6130 CPU cores. For each planner run, we limit time and memory to 30 minutes and 3 GiB.
We collected benchmark tasks for the domains analyzed above from two different sources. The first source is the satisficing track of previous IPCs. Since many of these tasks are trivial for state-of-the-art planners, we also consider tasks from the new Autoscale benchmark set (Torralba, Seipp, & Sievers, 2021). The Autoscale benchmark set is optimized to contain tasks where current state-of-the-art planners show differences in coverage.$^5$ The $\text{Grid}^*$ tasks in the Autoscale benchmark set are different from the Grid tasks in the IPC benchmark set. The difference is that in $\text{Grid}^*$ tasks, it is not always possible to reach every lock for the unlock operation. Hence, rule $r_3$ can define picking up a wrong key as a subgoal, which increases the width of these subproblems to 2. However, a sketch of width 1 can be obtained by picking only keys for which there exists a locked door that is reachable, which can be computed by using transitive closure on the connectivity relation. As discussed in the section about the Grid domain, dropping a key at its target location by exchanging it with a key at the current location increases the width of a subproblem from 1 to 2. Therefore, for the domain Grid (resp. $\text{Grid}^*$), we also include a simplified domain $\text{Grid}_S$ (resp. $\text{Grid}_S^*$) where we removed the action to exchange the key that is being held with a key at the current location.
The main question we want to answer empirically is how much an SIW search benefits from using policy sketches. To this end, we compare SIW(2) to $\mathrm{SIW}_R(2)$, which uses the sketches presented above. We use a width bound of $k = 2$ because SIW($k$) and $\mathrm{SIW}_R(k)$ are too slow to compute in practice for larger values of $k$. We also include two well-known, state-of-the-art planners, LAMA (Richter & Westphal, 2010) and Dual-BFWS (Lipovetzky & Geffner, 2017a), to show that the considered planning tasks are hard to solve for the strongest domain-independent planners. However, since $\mathrm{SIW}_R(2)$ is a domain-dependent planner, we cannot directly compare it to the domain-independent approaches. The code and data are available online (Drexler, Seipp, & Geffner, 2023a).
Table 1 shows results for the five planners. In addition to the number of solved tasks and planner runtimes, which we discuss below, for the planners based on SIW, the table also reports the effective width. The effective width for a problem $P$ and one of SIW($k$), $\mathrm{SIW}_R^G(k)$, or $\mathrm{SIW}_R^L(k)$ is the smallest natural number $k$ the algorithm needs to solve $P$. The effective width can be smaller than the actual width of the problem and depends on the order in which a specific implementation of the SIW-based algorithms generates successor states. For more robust and comparable results, we always randomly shuffle the applicable actions of a state before generating its successor states. Since an $\mathrm{SIW}_R(k)$ search splits a problem into subproblems, we further distinguish between the maximum effective width (M) among all subproblems and the average effective width (A) over all subproblems. We see that the maximum effective width for $\mathrm{SIW}_R(2)$ equals the theoretical upper bounds established in the previous section, suggesting that the bounds are indeed tight. We can see that the maximum effective width of $\mathrm{SIW}_R^L(2)$ in Grid (2) is larger than the maximum effective width of the same algorithm in $\text{Grid}_S$ (1). In one subproblem in Grid, the exchange key action is applied before the drop key action, resulting in the pruning of the actual subgoal state where the key is at its target location. Inspecting the average effective width for $\mathrm{SIW}_R(2)$, we see that the value is always closer to 1 than to 2 for the domains with sketch width 2. The sketch for the Schedule domain is a general policy where every subproblem is solved in a single step.
Table 1: The table shows the number of solved tasks (S), the maximum runtime (T) in seconds for tasks commonly solved by LAMA, Dual-BFWS, $\mathrm{SIW}_R^G(2)$, and $\mathrm{SIW}_R^L(2)$, and the average (A) and maximum (M) effective width over all encountered subtasks. The top and bottom parts show results for the IPC and Autoscale benchmark sets, respectively. We highlight the maximum number of solved tasks (S) per domain in boldface.
5. The Autoscale benchmark set does not contain tasks from the Schedule domain.
The original SIW(2) planner (without sketches) solves very few tasks across the board. In both the IPC and the Autoscale set, there are four domains where SIW(2) solves at most a single task. These results confirm that in many domains, the problem width is too large for plain SIW. In contrast, the sketches allow $\mathrm{SIW}_R(2)$ to solve all IPC tasks with an exception in Schedule where $\mathrm{SIW}_R^L(2)$ does not support quantified preconditions that are used in the domain description.
On the harder Autoscale benchmark set, we observe that $\mathrm{SIW}_R^G(2)$ solves three domains completely (Barman, Childsnack, Driverlog), has a coverage similar to $\mathrm{SIW}_R^L$ in Floortile, and a lower coverage than LAMA in $\text{Grid}^*$. There are two reasons why $\mathrm{SIW}_R^G(2)$ solves fewer $\text{Grid}^*$ tasks than LAMA, both related to memory usage. First, $\mathrm{SIW}_R^G(2)$ runs out of memory while trying to initialize the novelty table for width 2 in six tasks. Second, $\mathrm{SIW}_R^G(2)$, which is implemented in LAPKT, fails to ground all other remaining tasks because it uses more memory for representing the ground propositional task compared to LAMA. In TPP, $\mathrm{SIW}_R^G(2)$ solves all tasks where there is sufficient memory to compute the ground tasks and fails to ground all other remaining tasks. In Floortile, $\mathrm{SIW}_R^G(2)$ fails because the search requires too much time.
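A short calculation illustrates why initializing a novelty table for width 2 can be memory-intensive. Assuming the table reserves one entry for every tuple of at most two ground atoms (an assumption about the data layout made for illustration, not a description of LAPKT's internals), a task with $n$ ground atoms requires

```latex
\[
  \underbrace{n}_{\text{single atoms}} \;+\; \underbrace{\binom{n}{2}}_{\text{atom pairs}}
  \;=\; n + \frac{n(n-1)}{2}
  \;=\; \frac{n(n+1)}{2}
\]
```

entries. The table size thus grows quadratically in the number of ground atoms; for a hypothetical task with $n = 10^5$ ground atoms this amounts to roughly $5 \cdot 10^9$ entries.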
Our lifted planner $\mathrm{SIW}_R^L$ runs out of memory in only the four most difficult $\text{Grid}^*$ and $\text{Grid}_S^*$ tasks and runs out of time in all other unsolved tasks. It significantly outperforms all other planners in TPP, solving the most difficult task in only 53 seconds. In $\text{Grid}^*$, the successor generation is much slower compared to the grounded version $\mathrm{SIW}_R^G$. $\mathrm{SIW}_R^L$ typically uses much less memory compared to $\mathrm{SIW}_R^G$ because the number of reached atoms grows dynamically during search and is usually much smaller compared to the number of ground atoms.
Overall, our results show that our sketch rules capture useful information and that adding this domain-specific knowledge to a width-based planner allows it to solve whole problem domains very efficiently.

Related Work
We first review other approaches for expressing domain control knowledge for planning and then discuss some related work on polynomial planning algorithms and subgoal decomposition for domain-independent classical planning.
The distinction between actions that are "good" or "bad" in a fixed tractable domain can often be characterized explicitly. Indeed, general policies, unlike sketches, can provide such a classification of all possible state transitions $(s, s')$ over the problems in $Q$. Doing so, they ensure that the goal can always be reached by following any good transition (Bonet & Geffner, 2018; Bonet et al., 2019; Francès et al., 2021). Francès, Corrêa, Geissmann, and Pommerening (2019) use the same type of description logic features (Baader et al., 2003) to define and learn general policies in terms of linear value functions. Sketch rules have the same syntax as policy rules, but instead of constraining state transitions, they define subgoals.
Logical approaches to domain control have been used to provide partial information about good and bad state transitions in terms of suitable formulas (Bacchus & Kabanza, 2000; Kvarnström & Doherty, 2000). For example, for the Schedule domain, one may have a formula in linear temporal logic (LTL) expressing that objects that need to be lathed and painted should not be painted in the next time step, since lathing removes the paint. This partial information about good and bad transitions can then be used by a forward-state search planner to heavily prune the state space. A key difference between these formulas and sketches is that sketch rules are not about state transitions but about subgoals, and hence they structure the search for plans in a different way, in certain cases ensuring a polynomial search.
Baier, Fritz, Bienvenu, and McIlraith (2008) combine control knowledge and preference formulas to improve search behavior and obtain plans of high quality, according to user preferences. The control knowledge is given in the Golog language (Levesque, Reiter, Lespérance, Lin, & Scherl, 1997) and defines subgoals such that a planner has to fill in the missing parts. Since the control knowledge is compiled directly to PDDL, they are able to leverage off-the-shelf planners. The user preferences are encoded in an LTL-like language. Like our policy sketches, their approach can be applied to any domain. However, policy sketches aim at ensuring polynomial searches in tractable domains.
Hierarchical task networks or HTNs are used mainly for expressing general top-down strategies for solving classes of planning problems (Erol, Hendler, & Nau, 1994; Nau, Au, Ilghami, Kuter, Murdock, Wu, & Yaman, 2003; Georgievski & Aiello, 2015). The domain knowledge is normally expressed in terms of a hierarchy of methods that have to be decomposed into primitive methods that cannot be decomposed any further. While the solution strategy expressed in HTNs does not have to be complete, it is often close to complete, as otherwise the search for decompositions easily becomes intractable. For this reason, crafting good and effective HTN encodings is not easy. For example, the HTN formulation of the Barman domain in the 2020 Hierarchical Planning Competition (Höller, Behnke, Bercher, Biundo, Fiorino, Pellier, & Alford, 2019) contains 10 high-level tasks (like AchieveContainsShakerIngredient), 11 primitive tasks (like clean-shot) and 22 methods (like MakeAndPourCocktail). In contrast, the PDDL version of Barman has only 12 action schemas, and the sketch above has 4 rules over 4 linear features. Note, however, that comparing different forms of control knowledge in terms of their compactness is not well-defined.
Hierarchical Goal Networks, HGNs, are a hybrid approach between classical planning and HTNs (Shivashankar, Kuter, Nau, & Alford, 2012). So-called HGN methods are similar to actions in classical planning, but with an additional set of subgoals and a goal network that encodes a partially ordered sequence of goals. HGN methods are an alternative way to define PDDL actions, while sketches work directly on top of the PDDL planning formalism. HGNs, unlike HTNs, use a planning mechanism where ground methods are selected based on the current goals and state of the system, similar to sketches.
The need to represent the common subgoal structure of dynamic domains arises also in reinforcement learning (RL), where knowledge gained in the solution of some domain instances can be applied to speed up the learning of solutions to new instances of the same family of tasks (Finn, Abbeel, & Levine, 2017). In recent work in deep RL (DRL) these representations, in the form of general intrinsic reward functions (Singh, Lewis, Barto, & Sorg, 2010), are expected to be learned from suitable DRL architectures (Zheng, Oh, Hessel, Xu, Kroiss, van Hasselt, Silver, & Singh, 2020). Sketches provide a convenient high-level alternative to describe common subgoal structures, but as opposed to the related work in DRL, the policy sketches above are not learned but are written by hand. We describe the challenge of automatically learning sketches briefly below.
Temporal abstraction is another method from reinforcement learning that addresses the problem of reward sparsity by decomposing tasks (Sutton, Precup, & Singh, 1999). Temporal abstractions consider a set of high-level macro actions in the form of options. Each option consists of a dedicated policy, reward function and termination criterion. In the options framework, an RL agent chooses one of its options based on its current state and executes the option's policy until it terminates. The policy learning happens at two levels: each option policy is learned individually on the low level and the high-level controller learns which option to select in which state. Recently, there have been several works on defining symbolic options, allowing the RL agent to use reasoning instead of learning for finding (partially-ordered) plans over the set of options (Illanes, Yan, Icarte, & McIlraith, 2020; Lee, Katz, Agravante, Liu, Klinger, Campbell, Sohrabi, & Tesauro, 2021; Jin, Ma, Jin, Zhuo, Chen, & Yu, 2022). These approaches are very similar in spirit to policy sketches and future research could even define options based on sketch rules.
Approaches based on temporal abstraction, such as the options framework or angelic hierarchical planning (Marthi, Russell, & Wolfe, 2008), use high-level actions to abstract away primitive actions, thereby reducing the size of the state space. Another way to simplify the state space is to use state abstraction, where multiple states are grouped into a single abstract state (e.g., Holte, Perez, Zimmer, & MacDonald, 1996). Policy sketches combine both types of abstraction: they use state abstraction by considering the feature valuation space and they use temporal abstraction since the subgoals are usually several steps away. In contrast to sketches, general policies only use state abstraction, but not temporal abstraction, because they operate directly on primitive actions.
Another approach for decomposing reinforcement learning tasks are reward machines (Icarte, Klassen, Valenzano, & McIlraith, 2022). A reward machine is a finite state machine that represents a coarse version of the underlying RL task. Each transition is labeled with a conjunction over a set of propositions. To illustrate the concept, assume that we have a reward machine with two states $s$ and $s'$ and a transition from $s$ to $s'$ labeled with conjunction $c$. When the reward machine is in state $s$ and the RL agent observes a situation where $c$ holds, the reward machine transitions into state $s'$ and yields a reward function that is deemed useful for the agent in the new subproblem captured by $s'$. Even with policy sketches that only use Boolean features it is straightforward to capture the task decomposition of any reward machine. The main difference between the two approaches is that reward machines are defined solely via the transition labels and they consider their states as black boxes, whereas the rule conditions and effects for policy sketches are observable, i.e., amenable for reasoning by the planning algorithm that uses the sketch. Policy sketches are more general than reward machines since they can also use numerical features, allowing them to reason about quantitative change between states in addition to qualitative differences.
Hoffmann (2005) analyzes the local search topology of the optimal delete-relaxed heuristic $h^+$ and shows that enforced hill climbing using $h^+$ runs in polynomial time for many IPC domains. Since the FF heuristic $h^{\mathrm{FF}}$ (Hoffmann & Nebel, 2001) often closely approximates $h^+$, his findings explain the strong performance of the FF planner, which uses enforced hill climbing with $h^{\mathrm{FF}}$ (followed by a greedy best-first search using $h^{\mathrm{FF}}$). Enforced hill climbing repeatedly runs a breadth-first search to find the next state with a lower heuristic value, which is similar to the breadth-first searches done by SIW and $\mathrm{SIW}_R$. A difference is that in the former case the breadth-first search is exponential in the search depth, while in the latter case it is exponential only in the width. Usually, the width is much smaller than the search depth required to escape a heuristic plateau or local minimum.
Seipp, Pommerening, Röger, and Helmert (2016) point to shortcomings of the notion of width in planning domains with conjunctive goals, and introduce the correlation complexity measure that is given by the maximum size of the Boolean features needed in linear heuristic functions, called potential heuristics, to lead greedily to the goal. The Boolean features in that setting are conjunctions of facts in the planning problem. The authors show that many domains have a bounded and small correlation complexity, which however, unlike the notion of width, does not bound the complexity of the instances.
Subgoals have also been studied in the context of domain-independent planning. Porteous, Sebastia, and Hoffmann (2001) introduce landmarks as a method for decomposing problems into subproblems and use them within an incomplete hierarchical search algorithm. A way to use landmarks efficiently within a complete search algorithm was developed in the LAMA planner (Richter & Westphal, 2010) that runs a greedy best-first search with multiple queues, some ordered by goal-distance estimation heuristics like $h^{\mathrm{FF}}$ and others by a landmark counting heuristic.

Conclusions and Future Work
We have shown that the language of policy sketches as introduced by Bonet and Geffner provides a simple, elegant, and powerful way for expressing the common subgoal structure of many planning domains. The $\mathrm{SIW}_R$ algorithm can then solve these domains effectively, in provable polynomial time, where SIW fails either because the problems are not easily serializable in terms of the top goals or because some of the resulting subproblems have a high width. A big advantage of pure width-based algorithms like SIW and $\mathrm{SIW}_R$ is that unlike other planning-based methods they can be used to plan with simulators in which the structure of states is available but the structure of actions is not.$^6$
While all sketches in this paper are designed by hand, we have shown in follow-up work to the original conference paper that it is possible to learn sketches automatically (Drexler et al., 2022). Our method for learning policy sketches is related to the method for learning general policies by Francès et al. (2021) which uses the state language (primitive PDDL predicates) to define a large pool of Boolean and numerical features via a description logic grammar (Baader et al., 2003). As shown in the appendix, all features used in the sketches above can be generated in this way. A longer-term challenge is to learn the sketches automatically when using the same inputs as DRL algorithms, where there is no state representation language. Recent works that learn first-order symbolic languages from black-box states or from states represented by images (Asai, 2019; Bonet & Geffner, 2020a; Rodriguez, Bonet, Romero, & Geffner, 2021) are important steps in that direction.
Furthermore, for each primitive concept $C$ and primitive role $R$ we allow for corresponding goal versions denoted by $C_g$ and $R_g$ that are evaluated in the goal of the planning instance instead of the state $s$, as described by Francès et al. (2021).

A.2 From Concepts and Roles to Features
We define Boolean and numerical features with an additional level of composition as follows. Consider concepts $C, D$, roles $R, S, T$, and $X$ being either a role or a concept. The possible Boolean and numerical features for each state $s$ are defined as:
• Empty($X$) is a Boolean feature that evaluates to true iff $|X^s| = 0$.
• Count($X$) is a numerical feature that evaluates to $|X^s|$.
• ConceptDist($C, R, D$) is a numerical feature that evaluates to the smallest $n \in \mathbb{N}_0$ s.t. there are objects $x_0, \ldots, x_n$ with $x_0 \in C^s$, $x_n \in D^s$, and $(x_i, x_{i+1}) \in R^s$ for $i = 0, \ldots, n-1$. If no such $n$ exists, the feature evaluates to $\infty$.
• RoleDist($R, S, T$) is a numerical feature that evaluates to the smallest $n \in \mathbb{N}_0$ s.t. there are objects $a, x_0, \ldots, x_n$ with $(a, x_0) \in R^s$, $(a, x_n) \in T^s$, and $(x_i, x_{i+1}) \in S^s$ for $i = 0, \ldots, n-1$. If no such $n$ exists, the feature evaluates to $\infty$.
• SumRoleDist($R, S, T$) is a numerical feature that evaluates to $\sum_{r \in R^s} \text{RoleDist}(\{r\}, S, T)$, where the sum evaluates to $\infty$ if any term is $\infty$.
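The feature definitions above can be illustrated with a minimal sketch that evaluates them on explicit denotations. It assumes that concepts are given as Python sets of objects and roles as sets of pairs, already evaluated in the state $s$; the function names and the small Driverlog-like example are hypothetical, and the code is independent of how the DLPlan library implements these features.

```python
import math
from collections import deque

def empty(X):
    """Empty(X): true iff the denotation of X is empty."""
    return len(X) == 0

def count(X):
    """Count(X): number of elements in the denotation of X."""
    return len(X)

def concept_dist(C, R, D):
    """Smallest n with x0 in C, xn in D, and (xi, xi+1) in R; math.inf if none exists."""
    visited = set(C)
    frontier = deque((x, 0) for x in C)
    while frontier:
        x, n = frontier.popleft()
        if x in D:
            return n
        for a, b in R:
            if a == x and b not in visited:
                visited.add(b)
                frontier.append((b, n + 1))
    return math.inf

def role_dist(R, S, T):
    """Smallest n such that some a has (a, x0) in R, (a, xn) in T, and an S-path x0..xn."""
    best = math.inf
    for a in {a for a, _ in R}:
        starts = {x for b, x in R if b == a}
        targets = {x for b, x in T if b == a}
        best = min(best, concept_dist(starts, S, targets))
    return best

def sum_role_dist(R, S, T):
    """Sum of RoleDist({r}, S, T) over all pairs r in R; infinite if any term is infinite."""
    return sum(role_dist({r}, S, T) for r in R)

# Hypothetical example: two drivers, three linked locations, goal locations in at_g.
at = {("d1", "c1"), ("d2", "c3")}
link = {("c1", "c2"), ("c2", "c3")}
at_g = {("d1", "c3"), ("d2", "c3")}
print(role_dist({("d1", "c1")}, link, at_g), sum_role_dist(at, link, at_g))
```

In the example, driver d1 is two links away from its goal location and d2 is already there, so the summed distance is 2. A driver without any goal location would contribute an infinite term.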
Consider the following concepts and roles:
$x_1 \equiv (\mathit{painted}_g[0, 1] \setminus \mathit{painted}[0, 1])[0]$
$x_2 \equiv (\mathit{left}[0] \sqcup \mathit{left}[1]) \setminus \mathit{painted}_g[0]$
Concept $x_1$ is the set of all unpainted tiles. Concept $x_2$ is the set of all normal tiles that must not be painted. Role $x_3$ is the set of pairs of tiles $(t, t')$ where $t$ is above or below $t'$, together with the identity $t = t'$. Role $x_4$ is the set of pairs of tiles $(t, t')$ where $t$ is above or below $t'$ and both are unpainted. Role $x_5$ is the set of pairs of tiles $(t, t')$ where $t$ is unpainted and above or below the normal tile $t'$.
Consider the following concepts and roles:
$x_1 \equiv (\mathit{stored}_g[0, 1] \setminus \mathit{stored}[0, 1])[0]$
Concept $x_1$ is the set of goods of which some quantity remains to be stored.

Consider the following concepts and roles:
$x_1 \equiv (\mathit{contains}_g[0, 1] \setminus \mathit{contains}[0, 1])$
$x_2 \equiv (\mathit{contains}_g[0, 1]\ \mathit{contains}[0, 1])$
$x_3 \equiv \exists\, \mathit{cocktail\text{-}part1}[0, 1].\exists\, \mathit{contains}[1, 0].\mathit{shaker\text{-}level}[0]$
$x_4 \equiv \exists\, \mathit{cocktail\text{-}part2}[0, 1].\exists\, \mathit{contains}[1, 0].\mathit{shaker\text{-}level}[0]$
Role $x_1$ is the set of unserved beverages paired with the corresponding shot that must be used. Role $x_2$ is the set of served beverages paired with the corresponding shot that was used. Concept $x_3$ is the set of cocktails where the first ingredient mentioned in its recipe is in the shaker. Concept $x_4$ is the set of cocktails where the second ingredient mentioned in its recipe is in the shaker. The features $\Phi = \{g, u, c_1, c_2\}$ used in the sketch for Barman are constructed as follows:
$c_2 \equiv \neg \mathit{Empty}(x_3\ x_4\ x_1[1])$
Consider the following concepts and roles:
$x_2 \equiv \mathit{no\text{-}gluten\text{-}sandwich}[0]$
Concept $x_1$ is the set of unserved children. Concept $x_2$ is the set of gluten-free sandwiches. The features $\Phi = \{c_g, c_r, s^k_g, s^k, s^t_g, s^t\}$ used in the sketch for Childsnack are constructed as follows:
The features $\Phi = \{p_1, p_2, p_3, h, o\}$ used in the sketch for Schedule are constructed as follows:

3. Consider marked numerical feature $n$ and marked Boolean feature $p$. Then there does not exist an SCC that contains Boolean feature valuations $b, b'$ where $n = 0 \in b$, $n > 0 \in b'$, or $p \in b$, $\neg p \in b'$, or $\neg p \in b$, $p \in b'$, because those change only finitely many times. The eliminated rule $r = C \mapsto E$ has condition either $n = 0$, $n > 0$, $\neg p$, or $p$, and has only edges in one SCC $C_1$ that decrease a numerical feature $m$ or change a Boolean feature $q$. The rules $r'$ that increase $m$ or change $q$ in the opposite direction have complementary conditions and therefore have edges in other SCCs $C_2$. Thus, SIEVE removes all edges with label $r$ because $r$ decreases $m$ and no other rule $r'$ increases $m$ in $C_1$, or $r$ changes $p$ and no other rule $r'$ changes $p$ in the opposite direction in $C_1$.

$x_2 \equiv \exists\, \mathit{key\text{-}shape}[0, 1].\exists\, \mathit{lock\text{-}shape}[1, 0].\mathit{locked}[0]$
Role $x_1$ is the set of misplaced keys paired with their respective target location. Concept $x_2$ is the set of keys for which a closed lock with the same shape as the key exists. The features $\Phi = \{l, k, o, t\}$ used in the sketch for Grid are constructed as follows:
$l \equiv \mathit{Count}(\mathit{locked}[0])$
$k \equiv \mathit{Count}(x_1)$
$o \equiv \neg \mathit{Empty}(\mathit{holding}[0]\ x_2)$
$t \equiv \neg \mathit{Empty}(\mathit{holding}[0]\ x_1[0])$