Pac-Learning Recursive Logic Programs: Efficient Algorithms

We present algorithms that learn certain classes of function-free recursive logic programs in polynomial time from equivalence queries. In particular, we show that a single k-ary recursive constant-depth determinate clause is learnable. Two-clause programs consisting of one learnable recursive clause and one constant-depth determinate non-recursive clause are also learnable, if an additional "basecase" oracle is assumed. These results immediately imply the pac-learnability of these classes. Although these classes of learnable recursive programs are very constrained, it is shown in a companion paper that they are maximally general, in that generalizing either class in any natural way leads to a computationally difficult learning problem. Thus, taken together with its companion paper, this paper establishes a boundary of efficient learnability for recursive logic programs.


Introduction
One active area of research in machine learning is learning concepts expressed in first-order logic. Since most researchers have used some variant of Prolog to represent learned concepts, this subarea is sometimes called inductive logic programming (ILP) (Muggleton, 1992; Muggleton & De Raedt, 1994).
Within ILP, researchers have considered two broad classes of learning problems. The first class of problems, which we will call here logic-based relational learning problems, are first-order variants of the sorts of classification problems typically considered within the AI machine learning community: prototypical examples include Muggleton et al.'s (1992) formulation of α-helix prediction, King et al.'s (1992) formulation of predicting drug activity, and Zelle and Mooney's (1994) use of ILP techniques to learn control heuristics for deterministic parsers. Logic-based relational learning often involves noisy examples that reflect a relatively complex underlying relationship; it is a natural extension of propositional machine learning, and has already enjoyed a number of experimental successes.
In the second class of problems studied by ILP researchers, the target concept is a Prolog program that implements some common list-processing or arithmetic function; prototypical problems from this class might be learning to append two lists, or to multiply two numbers. These learning problems are similar in character to those studied in the area of automatic programming from examples (Summers, 1977; Biermann, 1978), and hence might be appropriately called automatic logic programming problems. Automatic logic programming problems are characterized by noise-free training data and recursive target concepts. Thus a problem that is central to the enterprise of automatic logic programming (but not, perhaps, to logic-based relational learning) is the problem of learning recursive logic programs.
The goal of this paper is to formally analyze the learnability of recursive logic programs in Valiant's (1984) model of pac-learnability, thus hopefully shedding some light on the task of automatic logic programming. To summarize our results, we will show that some simple recursive programs are pac-learnable from examples alone, or from examples plus a small number of additional "hints". The largest learnable class we identify in a standard learning model is the class of one-clause constant-depth determinate programs with at most a constant number of "closed" recursive literals. The largest learnable class we identify that requires extra "hints" is the class of constant-depth determinate programs consisting of a single nonrecursive base clause and a single recursive clause from the class described above. All of our results are proved in the model of identification from equivalence queries (Angluin, 1988, 1989), which is somewhat stronger than pac-learnability. Identification from equivalence queries requires that the target concept be exactly identified, in polynomial time, and using only a polynomial number of equivalence queries. An equivalence query asks if a hypothesis program H is equivalent to the target program C; the answer to a query is either "yes" or an adversarially chosen example on which H and C differ. This model of learnability is arguably more appropriate for automatic logic programming tasks than the weaker model of pac-learnability, as it is unclear how often an approximately correct recursive program will be useful.
Interestingly, the learning algorithms analyzed are different from most existing ILP learning methods; they all employ an unusual method of generalizing examples called forced simulation. Forced simulation is a simple and analytically tractable alternative to other methods for generalizing recursive programs against examples, such as n-th root finding (Muggleton, 1994), sub-unification (Aha, Lapointe, Ling, & Matwin, 1994) and recursive anti-unification (Idestam-Almquist, 1993), but it has been only rarely used in experimental ILP systems (Ling, 1991).
The paper is organized as follows. After presenting some preliminary definitions, we begin by presenting (primarily for pedagogical reasons) a procedure for identifying from equivalence queries a single non-recursive constant-depth determinate clause. Then, in Section 4, we extend this learning algorithm, and the corresponding proof of correctness, to a simple class of recursive clauses: the class of "closed" linear recursive constant-depth determinate clauses. In Section 5, we relax some assumptions made to make the analysis easier, and present several extensions to this algorithm: we extend the algorithm from linear recursion to k-ary recursion, and also show how a k-ary recursive clause and a non-recursive clause can be learned simultaneously given an additional "basecase" oracle. We then discuss related work and conclude.
Although the learnable class of programs is large enough to include some well-known automatic logic programming benchmarks, it is extremely restricted. In a companion paper (Cohen, 1995), we provide a number of negative results, showing that relaxing any of these restrictions leads to difficult learning problems: in particular, learning problems that are either as hard as learning DNF (an open problem in computational learning theory), or as hard as cracking certain presumably secure cryptographic schemes. Thus, taken together with the results of the companion paper, our results delineate a boundary of learnability for recursive logic programs.
Although the two papers are independent, we suggest that readers wishing to read both this paper and the companion paper read this paper first.

Background
In this section we will present the technical background necessary to state our results. We will assume, however, that the reader is familiar with the basic elements of logic programming; readers without this background are referred to one of the standard texts, for example (Lloyd, 1987).

Logic Programs
Our treatment of logic programs is standard, except that we will usually consider the body of a clause to be an ordered set of literals.
For most of this paper, we will consider logic programs without function symbols, i.e., programs written in Datalog. The purpose of such a logic program is to answer certain questions relative to a database, $DB$, which is a set of ground atomic facts. (When convenient, we will also think of $DB$ as a conjunction of ground unit clauses.) The simplest use of a Datalog program is to check the status of a simple instance. A simple instance (for a program $P$ and a database $DB$) is a fact $f$. The pair $(P, DB)$ is said to cover $f$ iff $DB \wedge P \vdash f$. The set of simple instances covered by $(P, DB)$ is precisely the minimal model of the logic program $P \wedge DB$.
In this paper, we will primarily consider extended instances, which consist of two parts: an instance fact $f$, which is simply a ground fact, and a description $D$, which is a finite set of ground unit clauses. An extended instance $e = (f, D)$ is covered by $(P, DB)$ iff $DB \wedge D \wedge P \vdash f$. If extended instances are allowed, then function-free programs are expressive enough to encode surprisingly interesting programs. In particular, many programs that are usually written with function symbols can be re-written as function-free programs, as the example below illustrates.
We note that using extended instances as examples is closely related to using ground clauses entailed by the target clause as examples: specifically, the instance $e = (f, D)$ is covered by $(P, DB)$ iff $P \wedge DB \vdash (f \leftarrow D)$. As the example above shows, there is also a close relationship between extended instances and literals with function symbols that have been removed by "flattening" (Rouveirol, 1994; De Raedt & Džeroski, 1994). We have elected to use Datalog programs and the model of extended instances in this paper for several reasons. Datalog is relatively easy to analyze. There is a close connection between Datalog and the restrictions imposed by certain practical learning systems, such as FOIL (Quinlan, 1990; Quinlan & Cameron-Jones, 1993), FOCL (Pazzani & Kibler, 1992), and GOLEM (Muggleton & Feng, 1992).
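As an illustrative sketch (our own, not the paper's example), the Python code below re-expresses a list-valued fact as an extended instance: each list becomes a constant, and the description D records its structure using `components(List, Head, Tail)` and `null(List)` facts, the predicates that also appear in the mode examples of this section. The helper names and the constant-naming scheme are hypothetical.

```python
from typing import List, Set, Tuple

Fact = Tuple[str, ...]


def list_constant(xs: List[str]) -> str:
    """Hypothetical naming scheme: one constant per distinct list value.

    Assumes list elements are atomic names that contain no underscores.
    """
    return "nil" if not xs else "list_" + "_".join(xs)


def describe_list(xs: List[str], description: Set[Fact]) -> str:
    """Add components/null facts for xs and all of its suffixes; return xs's constant."""
    name = list_constant(xs)
    if not xs:
        description.add(("null", name))
    else:
        tail = describe_list(xs[1:], description)
        description.add(("components", name, xs[0], tail))
    return name


def flatten(pred: str, *lists: List[str]) -> Tuple[Fact, Set[Fact]]:
    """Re-express the fact pred(list_1, ..., list_k) as an extended instance (f, D)."""
    description: Set[Fact] = set()
    args = tuple(describe_list(xs, description) for xs in lists)
    return (pred,) + args, description


f, D = flatten("append", ["a", "b"], ["c"], ["a", "b", "c"])
# f is ("append", "list_a_b", "list_c", "list_a_b_c"); D holds six ground facts.
```

A function-free clause can then reach the elements of these lists only through the `components` facts in D, which is how the description stands in for the list constructor.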
Finally, using extended instances addresses the following technical problem.The learning problems considered in this paper involve restricted classes of logic programs.Often, the restrictions imply that the number of simple instances is polynomial; we note that with only a polynomial-size domain, questions about pac-learnability are usually trivial.Requiring learning algorithms to work over the domain of extended instances precludes trivial learning techniques, however, as the number of extended instances of size n is exponential in n even for highly restricted programs.

Restrictions on Logic Programs
In this paper, we will consider the learnability of various restricted classes of logic programs. Below we will define some of these restrictions; however, we will first introduce some terminology.
If $A \leftarrow B_1 \wedge \dots \wedge B_r$ is an (ordered) definite clause, then the input variables of the literal $B_i$ are those variables appearing in $B_i$ which also appear in the clause $A \leftarrow B_1 \wedge \dots \wedge B_{i-1}$; all other variables appearing in $B_i$ are called output variables. Also, if $A \leftarrow B_1 \wedge \dots \wedge B_r$ is a definite clause, then $B_i$ is said to be a recursive literal if it has the same predicate symbol and arity as $A$, the head of the clause.

Types of Recursion
The first set of restrictions concerns the type of recursion that is allowed in a program. If every clause in a program has at most one recursive literal, then the program is linear recursive. If every clause in a program has at most k recursive literals, then the program is k-ary recursive. Finally, if every recursive literal in a program contains no output variables, then we will say that the program is closed recursive.
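The definitions of input and output variables, recursive literals, and the three recursion restrictions are mechanical enough to check in a few lines. The sketch below assumes a hypothetical representation (a literal is a tuple `(predicate, arg1, ..., argn)`, and variables are strings beginning with an uppercase letter); the flattened `append` program is likewise only an illustration.

```python
from typing import List, Set, Tuple

Literal = Tuple[str, ...]               # (predicate, arg1, ..., argn)
Clause = Tuple[Literal, List[Literal]]  # (head, ordered body)


def is_var(term: str) -> bool:
    # Assumed Prolog-style convention: variables start with an uppercase letter.
    return term[:1].isupper()


def split_vars(head: Literal, body: List[Literal], i: int) -> Tuple[Set[str], Set[str]]:
    """Input and output variables of body[i] (0-based) in the ordered clause head <- body."""
    seen = {t for t in head[1:] if is_var(t)}
    for lit in body[:i]:
        seen |= {t for t in lit[1:] if is_var(t)}
    here = {t for t in body[i][1:] if is_var(t)}
    return here & seen, here - seen


def is_recursive(head: Literal, lit: Literal) -> bool:
    # Same predicate symbol and same arity as the head.
    return lit[0] == head[0] and len(lit) == len(head)


def classify(program: List[Clause], k: int) -> dict:
    """Check the linear, k-ary and closed recursion restrictions for a program."""
    counts, closed = [], True
    for head, body in program:
        rec = [i for i, lit in enumerate(body) if is_recursive(head, lit)]
        counts.append(len(rec))
        # Closed: no recursive literal introduces an output variable.
        closed = closed and all(not split_vars(head, body, i)[1] for i in rec)
    return {"linear": max(counts, default=0) <= 1,
            "k_ary": max(counts, default=0) <= k,
            "closed": closed}


append = [
    (("append", "Xs", "Ys", "Ys"), [("null", "Xs")]),
    (("append", "Xs", "Ys", "Zs"), [("components", "Xs", "X", "Xs1"),
                                    ("components", "Zs", "X", "Zs1"),
                                    ("append", "Xs1", "Ys", "Zs1")]),
]
print(classify(append, k=1))  # {'linear': True, 'k_ary': True, 'closed': True}
```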

Determinacy and Depth
The second set of restrictions are variants of restrictions originally introduced by Muggleton and Feng (1992). If $A \leftarrow B_1 \wedge \dots \wedge B_r$ is an (ordered) definite clause, the literal $B_i$ is determinate iff for every possible substitution $\sigma$ that unifies $A$ with some fact $e$ such that $DB \vdash (B_1 \wedge \dots \wedge B_{i-1})\sigma$, there is at most one maximal substitution $\theta$ so that $DB \vdash B_i\sigma\theta$. A clause is determinate if all of its literals are determinate. Informally, determinate clauses are those that can be evaluated without backtracking by a Prolog interpreter.
We also define the depth of a variable appearing in a clause $A \leftarrow B_1 \wedge \dots \wedge B_r$ as follows.
Variables appearing in the head of a clause have depth zero. Otherwise, let $B_i$ be the first literal containing the variable $V$, and let $d$ be the maximal depth of the input variables of $B_i$; then the depth of $V$ is $d+1$. The depth of a clause is the maximal depth of any variable in the clause.
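Variable depth can be computed in one left-to-right pass over the ordered body. This is a minimal sketch under an assumed tuple representation (uppercase strings are variables); the example clause is hypothetical (X is Y's paternal grandfather, reading father(A,B) as "B is A's father"). For a literal with no input variables, the sketch takes d = 0, a convention chosen here for illustration.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, ...]


def is_var(term: str) -> bool:
    return term[:1].isupper()  # assumed naming convention for variables


def variable_depths(head: Literal, body: List[Literal]) -> Dict[str, int]:
    """Depth of every variable in the ordered clause head <- body."""
    depth = {t: 0 for t in head[1:] if is_var(t)}  # head variables: depth zero
    for lit in body:
        vars_here = [t for t in lit[1:] if is_var(t)]
        inputs = [v for v in vars_here if v in depth]
        # Convention for this sketch: a literal with no input variables counts as d = 0.
        d = max((depth[v] for v in inputs), default=0)
        for v in vars_here:
            if v not in depth:  # lit is the first literal containing v
                depth[v] = d + 1
    return depth


def clause_depth(head: Literal, body: List[Literal]) -> int:
    return max(variable_depths(head, body).values(), default=0)


head = ("p", "X", "Y")
body = [("father", "Y", "F"), ("father", "F", "G"), ("equal", "G", "X")]
print(variable_depths(head, body))  # {'X': 0, 'Y': 0, 'F': 1, 'G': 2}
print(clause_depth(head, body))     # 2
```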
Muggleton and Feng define a logic program to be ij-determinate if it is determinate, of constant depth i, and contains literals of arity j or less. In this paper we use the phrase "constant-depth determinate" instead to denote this class of programs. Below are some examples of constant-depth determinate programs, taken from Džeroski, Muggleton and Russell (1992).
Example. Assuming successor is functional, the following program is determinate. The maximum depth of a variable is one, for the variable C in the second clause, and hence the program is of depth one.
The program GOLEM (Muggleton & Feng, 1992) learns constant-depth determinate programs, and related restrictions have been adopted by several other practical learning systems (Quinlan, 1991; Lavrač & Džeroski, 1992; Cohen, 1993c). The learnability of constant-depth determinate clauses has also received some formal study, which we will review in Section 6.

Mode Constraints and Declarations
We define the mode of a literal $L$ appearing in a clause $C$ to be a string $s$ such that the initial character of $s$ is the predicate symbol of $L$, and for $j > 1$ the $j$-th character of $s$ is a "+" if the $(j-1)$-th argument of $L$ is an input variable and a "−" if the $(j-1)$-th argument of $L$ is an output variable. (This definition coincides with the usual definition of Prolog modes only when all arguments to the head of a clause are inputs. This simplification is justified, however, as we are considering only how clauses behave in classifying extended instances, which are ground.) A mode constraint is simply a set of mode strings $R = \{s_1, \dots, s_k\}$, and a clause $C$ is said to satisfy a mode constraint $R$ for $p$ if for every literal $L$ in the body of $C$, the mode of $L$ is in $R$.
Example. In the following append program, every literal has been annotated with its mode.
append(Xs,Ys,Ys) ← null(Xs).   % mode: null+
Mode constraints are commonly used in analyzing Prolog code; for instance, they are used in many Prolog compilers. We will sometimes use an alternative syntax for mode constraints that parallels the syntax used in most Prolog systems: for instance, we may write the mode constraint "components+−−" as "components(+,−,−)".
We define a declaration to be a tuple $(p, a_0, R)$ where $p$ is a predicate symbol, $a_0$ is an integer, and $R$ is a mode constraint. We will say that a clause $C$ satisfies a declaration if the head of $C$ has arity $a_0$ and predicate symbol $p$, and if for every literal $L$ in the body of $C$ the mode of $L$ appears in $R$.
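The mode of each body literal, and whether a clause satisfies a declaration, can be computed directly from these definitions. The sketch below writes modes as plain strings such as `components+--` (ASCII `-` for the output mark) and assumes every argument is a variable; the declaration and the `append` clause are hypothetical examples. The last call shows that deleting one literal can change the modes of later literals that used its outputs.

```python
from typing import List, Set, Tuple

Literal = Tuple[str, ...]
Declaration = Tuple[str, int, Set[str]]  # (p, a0, R)


def is_var(term: str) -> bool:
    return term[:1].isupper()  # assumed naming convention for variables


def literal_modes(head: Literal, body: List[Literal]) -> List[str]:
    """Mode string of each body literal, e.g. 'components+--'.

    ASCII '-' stands for the output mark; all arguments are assumed to be variables.
    """
    seen = {t for t in head[1:] if is_var(t)}
    modes = []
    for lit in body:
        modes.append(lit[0] + "".join("+" if t in seen else "-" for t in lit[1:]))
        seen |= {t for t in lit[1:] if is_var(t)}
    return modes


def satisfies(head: Literal, body: List[Literal], dec: Declaration) -> bool:
    """Head has predicate p and arity a0, and every body mode is in R."""
    p, a0, r = dec
    if head[0] != p or len(head) - 1 != a0:
        return False
    return all(mode in r for mode in literal_modes(head, body))


dec = ("append", 3, {"null+", "components+--", "equal++", "append+++"})
head = ("append", "Xs", "Ys", "Zs")
body = [("components", "Xs", "X", "Xs1"), ("components", "Zs", "Z", "Zs1"),
        ("equal", "X", "Z"), ("append", "Xs1", "Ys", "Zs1")]
print(literal_modes(head, body))       # ['components+--', 'components+--', 'equal++', 'append+++']
print(satisfies(head, body, dec))      # True
print(satisfies(head, body[1:], dec))  # False: equal and append now have output arguments
```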

A Model of Learnability
In this section, we will present our model of learnability. We will first review the necessary definitions for a standard learning model, the model of learning from equivalence queries (Angluin, 1988, 1989), and discuss its relationship to other learning models. We will then introduce an extension to this model which is necessary for analyzing ILP problems.

Identification From Equivalence Queries
Let $X$ be a set. We will call $X$ the domain, and call the elements of $X$ instances. Define a concept $C$ over $X$ to be a representation of some subset of $X$, and define a language $Lang$ to be a set of concepts. In this paper, we will be rather casual about the distinction between a concept and the set it represents; when there is a risk of confusion we will refer to the set represented by a concept $C$ as the extension of $C$. Two concepts $C_1$ and $C_2$ with the same extension are said to be (semantically) equivalent.
Associated with $X$ and $Lang$ are two size complexity measures, for which we will use the following notation: the size complexity of a concept $C \in Lang$ is written $\|C\|$, and the size complexity of an instance $e \in X$ is written $\|e\|$. If $S$ is a set, $S_n$ stands for the set of all elements of $S$ of size complexity no greater than $n$. For instance, $X_n = \{e \in X : \|e\| \le n\}$ and $Lang_n = \{C \in Lang : \|C\| \le n\}$.
We will assume that all size measures are polynomially related to the number of bits needed to represent C or e.
The first learning model that we consider is the model of identification with equivalence queries. The goal of the learner is to identify some unknown target concept $C \in Lang$, that is, to construct some hypothesis $H \in Lang$ such that $H \equiv C$. Information about the target concept is gathered only through equivalence queries. The input to an equivalence query for $C$ is some hypothesis $H \in Lang$. If $H \equiv C$, then the response to the query is "yes". Otherwise, the response to the query is an arbitrarily chosen counterexample: an instance $e$ that is in the symmetric difference of $C$ and $H$.
A deterministic algorithm Identify identifies $Lang$ from equivalence queries iff for every $C \in Lang$, whenever Identify is run (with an oracle answering equivalence queries for $C$) it eventually halts and outputs some $H \in Lang$ such that $H \equiv C$. Identify polynomially identifies $Lang$ from equivalence queries iff there is a polynomial $poly(n_t, n_e)$ such that at any point in the execution of Identify the total running time is bounded by $poly(n_t, n_e)$, where $n_t = \|C\|$ and $n_e$ is the size of the largest counterexample seen so far, or 0 if no equivalence queries have been made.
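To make the protocol concrete, here is the textbook elimination algorithm for monotone conjunctions over n boolean attributes, written in Python. It is a standard illustration of identification from equivalence queries, not an algorithm from this paper, though it follows the same strategy of starting from the most specific hypothesis and generalizing minimally on each positive counterexample. The toy oracle searches all of {0,1}^n, so it is only usable for small n; the learner itself makes at most n + 1 queries. All names are hypothetical.

```python
from itertools import product
from typing import Callable, FrozenSet, Optional, Tuple

Instance = Tuple[int, ...]
Conjunction = FrozenSet[int]  # indices of the boolean attributes that must be 1


def holds(conj: Conjunction, x: Instance) -> bool:
    return all(x[i] == 1 for i in conj)


def make_oracle(target: Conjunction, n: int) -> Callable[[Conjunction], Optional[Instance]]:
    """Toy equivalence-query oracle: None means "yes", otherwise a counterexample."""
    def query(h: Conjunction) -> Optional[Instance]:
        for x in product((0, 1), repeat=n):  # exhaustive, so only for small n
            if holds(h, x) != holds(target, x):
                return x  # any point of the symmetric difference will do
        return None
    return query


def identify(n: int, query) -> Tuple[Conjunction, int]:
    """Elimination algorithm for monotone conjunctions; uses at most n + 1 queries."""
    h: Conjunction = frozenset(range(n))  # most specific hypothesis
    queries = 0
    while True:
        queries += 1
        x = query(h)
        if x is None:
            return h, queries
        if holds(h, x):
            # h is always at least as specific as a monotone conjunctive target.
            raise ValueError("negative counterexample: target is not a monotone conjunction")
        h = frozenset(i for i in h if x[i] == 1)  # minimal generalization covering x


print(identify(5, make_oracle(frozenset({1, 3}), 5)))  # (frozenset({1, 3}), 2)
```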

Relation to Pac-Learnability
The model of identification from equivalence queries has been well studied (Angluin, 1988, 1989). It is known that if a language is learnable in this model, then it is also learnable in Valiant's (1984) model of pac-learnability. (The basic idea behind this result is that an equivalence query for the hypothesis $H$ can be emulated by drawing a set of random examples of a certain size. If any of them is a counterexample to $H$, then one returns the found counterexample as the answer to the equivalence query. If no counterexamples are found, one can assume with high confidence that $H$ is approximately equivalent to the target concept.) Thus identification from equivalence queries is a strictly stronger model than pac-learnability.
Most existing positive results on the pac-learnability of logic programs rely on showing that every concept in the target language can be emulated by a boolean concept from some pac-learnable class (Džeroski et al., 1992; Cohen, 1994). While such results can be illuminating, they are also disappointing, since one of the motivations for considering first-order representations in the first place is that they allow one to express concepts that cannot be easily expressed in boolean logic. One advantage of studying the exact identification model and considering recursive programs is that it essentially precludes use of this sort of proof technique: while many recursive programs can be approximated by boolean functions over a fixed set of attributes, few can be exactly emulated by boolean functions.
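The emulation of equivalence queries by random sampling described above can be sketched as follows. The schedule m_i = ceil((ln(1/δ) + i·ln 2)/ε) for the i-th simulated query is the usual choice for this conversion and is an assumption of the sketch, as are all names; the docstring gives the union-bound argument.

```python
import math
import random
from typing import Callable, Optional, TypeVar

X = TypeVar("X")


def sample_size(i: int, epsilon: float, delta: float) -> int:
    """Random examples drawn for the i-th simulated query (i = 1, 2, ...).

    If H has error above epsilon, all m_i draws agree with the target with probability
    at most (1 - epsilon)**m_i <= exp(-epsilon * m_i) <= delta / 2**i, so the
    probability of ever accepting such an H is at most sum_i delta / 2**i <= delta.
    """
    return math.ceil((math.log(1 / delta) + i * math.log(2)) / epsilon)


def sampling_oracle(draw: Callable[[], X], target: Callable[[X], bool],
                    epsilon: float, delta: float):
    """Emulate equivalence queries with labeled random examples."""
    state = {"i": 0}

    def query(h: Callable[[X], bool]) -> Optional[X]:
        state["i"] += 1
        for _ in range(sample_size(state["i"], epsilon, delta)):
            x = draw()
            if h(x) != target(x):
                return x  # a genuine counterexample
        return None  # "yes": H is probably approximately correct
    return query


rng = random.Random(0)
query = sampling_oracle(rng.random, lambda x: x < 0.5, epsilon=0.1, delta=0.05)
print(sample_size(1, 0.1, 0.05))  # 37
```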

Background Knowledge in Learning
The framework described above is standard, and is one possible formalization of the usual situation in inductive concept learning, in which a user provides a set of examples (in this case counterexamples to queries) and the learning system attempts to find a useful hypothesis. However, in a typical ILP system, the setting is slightly different, as usually the user provides clues about the target concept in addition to the examples. In most ILP systems the user provides a database $DB$ of "background knowledge" in addition to a set of examples; in this paper, we will assume that the user also provides a declaration. To account for these additional inputs it is necessary to extend the framework described above to a setting where the learner accepts inputs other than training examples.
To formalize this, we introduce the following notion of a "language family". If $Lang$ is a set of clauses, $DB$ is a database and $Dec$ is a declaration, we will define $Lang[DB, Dec]$ to be the set of all pairs $(C, DB)$ such that $C \in Lang$ and $C$ satisfies $Dec$. Semantically, such a pair will denote the set of all extended instances $(f, D)$ covered by $(C, DB)$. Next, if $\mathcal{DB}$ is a set of databases and $\mathcal{DEC}$ is a set of declarations, then define
$$Lang[\mathcal{DB}, \mathcal{DEC}] = \{Lang[DB, Dec] : DB \in \mathcal{DB} \text{ and } Dec \in \mathcal{DEC}\}$$
This set of languages is called a language family.
We will now extend the definition of identification from equivalence queries to language families as follows. A language family $Lang[\mathcal{DB}, \mathcal{DEC}]$ is identifiable from equivalence queries iff every language in the set is identifiable from equivalence queries. A language family $Lang[\mathcal{DB}, \mathcal{DEC}]$ is uniformly identifiable from equivalence queries iff there is a single algorithm $Identify(DB, Dec)$ that identifies any language $Lang[DB, Dec]$ in the family given $DB$ and $Dec$. Uniform polynomial identifiability of a language family is defined analogously: $Lang[\mathcal{DB}, \mathcal{DEC}]$ is uniformly polynomially identifiable from equivalence queries iff there is a polynomial time algorithm $Identify(DB, Dec)$ that identifies any language $Lang[DB, Dec]$ in the family given $DB$ and $Dec$. Note that Identify must run in time polynomial in the size of the inputs $Dec$ and $DB$ as well as the target concept.

Restricted Types of Background Knowledge
We will now describe a number of restricted classes of databases and declarations.
One restriction which we will make throughout this paper is to assume that all of the predicates of interest are of bounded arity. We will use the notation $a\text{-}\mathcal{DB}$ for the set of all databases that contain only facts of arity $a$ or less, and the notation $a\text{-}\mathcal{DEC}$ for the set of all declarations $(p, a_0, R)$ such that every string $s \in R$ is of length $a + 1$ or less.
For technical reasons, it will often be convenient to assume that a database contains an equality predicate, that is, a predicate symbol equal such that $equal(t_i, t_i) \in DB$ for every constant $t_i$ appearing in $DB$, and $equal(t_i, t_j) \notin DB$ for any $t_i \neq t_j$. Similarly, we will often wish to assume that a declaration allows literals of the form equal(X,Y), where X and Y are input variables. If $\mathcal{DB}$ (respectively $\mathcal{DEC}$) is any set of databases (declarations) we will use $\mathcal{DB}_{=}$ ($\mathcal{DEC}_{=}$) to denote the corresponding set, with the additional restriction that the database (declaration) must contain an equality predicate (respectively the mode equal(+,+)).
It will sometimes also be convenient to assume that a declaration $(p, a_0, R)$ allows only a single valid mode for each predicate: i.e., that for each predicate $q$ there is in $R$ only a single mode constraint of the form $q\alpha$. Such a declaration will be called a unique-mode declaration. If $\mathcal{DEC}$ is any set of declarations we will use $\mathcal{DEC}_1$ to denote the corresponding set of declarations with the additional restriction that the declaration is unique-mode.
Finally, we note that in a typical setting, the facts that appear in a database DB and descriptions D of extended instances are not arbitrary: instead, they are representative of some \real" predicate (e.g., the relationship of a list to its components in the example above).
One way of formalizing this is to assume that all facts will be drawn from some restricted set $F$; using this assumption one can define the notion of a determinate mode. If $f = p(t_1, \dots, t_k)$ is a fact with predicate symbol $p$ and $p\alpha$ is a mode, then define $inputs(f, p\alpha)$ to be the tuple $\langle t_{i_1}, \dots, t_{i_m} \rangle$, where $i_1, \dots, i_m$ are the indices of $\alpha$ containing a "+". Also define $outputs(f, p\alpha)$ to be the tuple $\langle t_{j_1}, \dots, t_{j_l} \rangle$, where $j_1, \dots, j_l$ are the indices of $\alpha$ containing a "−". A mode string $p\alpha$ for a predicate $p$ is determinate for $F$ iff the relation $\{\langle inputs(f, p\alpha), outputs(f, p\alpha) \rangle : f \in F\}$ is a function. Informally, a mode is determinate if the input positions of the facts in $F$ functionally determine the output positions.
The set of all declarations containing only modes determinate for $F$ will be denoted $\mathcal{DetDEC}_F$. However, in this paper, the set $F$ will be assumed to be fixed, and thus we will generally omit the subscript.
A program consistent with a determinate declaration $Dec \in \mathcal{DetDEC}$ must be determinate, as defined above; in other words, consistency with a determinate declaration is a sufficient condition for semantic determinacy. It is also a condition that can be verified with a simple syntactic test.
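Whether a mode is determinate for a finite set of facts F is a functional-dependency check on the projected input and output positions. A minimal sketch, assuming modes are written as strings like `father+-` (ASCII `-` for outputs) and facts as tuples; the facts are hypothetical (father(A,B) read as "B is A's father").

```python
from typing import Dict, Iterable, Tuple

Fact = Tuple[str, ...]


def split_mode(mode: str) -> Tuple[str, str]:
    """'father+-' -> ('father', '+-'); ASCII '-' marks an output position."""
    pred = mode.rstrip("+-")
    return pred, mode[len(pred):]


def is_determinate_mode(mode: str, facts: Iterable[Fact]) -> bool:
    """True iff inputs(f, mode) functionally determines outputs(f, mode) over facts."""
    pred, marks = split_mode(mode)
    table: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for f in facts:
        if f[0] != pred or len(f) - 1 != len(marks):
            continue
        ins = tuple(t for t, m in zip(f[1:], marks) if m == "+")
        outs = tuple(t for t, m in zip(f[1:], marks) if m == "-")
        if table.setdefault(ins, outs) != outs:
            return False  # same inputs, two different outputs
    return True


F = [("father", "ann", "bob"), ("father", "dave", "bob"), ("mother", "ann", "eve")]
print(is_determinate_mode("father+-", F))  # True: each person has one father
print(is_determinate_mode("father-+", F))  # False: bob is the father of ann and dave
```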

Size Measures for Logic Programs
Assuming that all predicates are of arity $a$ or less for some constant $a$ also allows very simple size measures to be used. In this paper, we will measure the size of a database $DB$ by its cardinality; the size of an extended instance $(f, D)$ by the cardinality of $D$; the size of a declaration $(p, a_0, R)$ by the cardinality of $R$; and the size of a clause $A \leftarrow B_1 \wedge \dots \wedge B_r$ by the number of literals in its body.

Learning a Nonrecursive Clause
The learning algorithms presented in this paper all use a generalization technique which we call forced simulation. By way of an introduction to this technique, we will consider a learning algorithm for non-recursive constant-depth clauses. While this result is presented primarily for pedagogical reasons, it may be of interest on its own: it is independent of previous proofs of the pac-learnability of this class (Džeroski et al., 1992), and it is also somewhat more rigorous than previous proofs.
Although the details and analysis of the algorithm for non-recursive clauses are somewhat involved, the basic idea behind the algorithm is quite simple. First, a highly specific "bottom clause" is constructed, using two operations that we call DEEPEN and CONSTRAIN. Second, this bottom clause is generalized by deleting literals so that it covers the positive examples: the algorithm for generalizing a clause to cover an example is (roughly) to simulate the clause on the example, and delete any literals that would cause the clause to fail. In the remainder of this section we will describe and analyze this learning algorithm in detail.

Constructing a "Bottom Clause"
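Let $Dec = (p, a_0, R)$ be a declaration and let $A \leftarrow B_1 \wedge \dots \wedge B_r$ be a definite clause. We define
$$DEEPEN_{Dec}(A \leftarrow B_1 \wedge \dots \wedge B_r) \equiv A \leftarrow B_1 \wedge \dots \wedge B_r \wedge \bigwedge_{L_i \in L_D} L_i$$
where $L_D$ is a maximal set of literals $L_i$ that satisfy the following conditions: (1) the clause $A \leftarrow B_1 \wedge \dots \wedge B_r \wedge L_i$ satisfies the mode constraints given in $R$; (2) if $L_i \in L_D$ has the same mode and predicate symbol as some other $L_j \in L_D$, then the input variables of $L_i$ are different from the input variables of $L_j$; (3) every $L_i$ has at least one output variable, and the output variables of $L_i$ are all different from each other, and are also different from the output variables of any other $L_j \in L_D$. As an extension of this notation, we define $DEEPEN^i_{Dec}(C)$ to be the result of applying the function $DEEPEN_{Dec}$ repeatedly $i$ times to $C$, i.e., $DEEPEN^0_{Dec}(C) \equiv C$ and $DEEPEN^{i+1}_{Dec}(C) \equiv DEEPEN_{Dec}(DEEPEN^i_{Dec}(C))$. We also define
$$CONSTRAIN_{Dec}(A \leftarrow B_1 \wedge \dots \wedge B_r) \equiv A \leftarrow B_1 \wedge \dots \wedge B_r \wedge \bigwedge_{L_i \in L_C} L_i$$
where $L_C$ is the set of all literals $L_i$ such that $A \leftarrow B_1 \wedge \dots \wedge B_r \wedge L_i$ satisfies the mode constraints given in $R$, and $L_i$ contains no output variables.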
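The DEEPEN, CONSTRAIN and BOTTOM_d constructions can be sketched in Python as follows, assuming a declaration is given as `(p, a0, modes)` with mode strings such as `mother+-` (ASCII `-` for outputs) and that fresh variables are named V1, V2, ... (both assumptions of this sketch). The usage lines apply it to the declaration D0 of the following example; they show how quickly CONSTRAIN fills the clause with all-input literals.

```python
from itertools import count, product
from typing import Iterator, List, Tuple

Literal = Tuple[str, ...]
Clause = Tuple[Literal, List[Literal]]  # (head, ordered body); all arguments are variables


def split_mode(mode: str) -> Tuple[str, str]:
    pred = mode.rstrip("+-")  # 'mother+-' -> ('mother', '+-')
    return pred, mode[len(pred):]


def clause_vars(clause: Clause) -> List[str]:
    """Variables of the clause in order of first occurrence."""
    head, body = clause
    seen: List[str] = []
    for lit in [head, *body]:
        for t in lit[1:]:
            if t not in seen:
                seen.append(t)
    return seen


def deepen(clause: Clause, modes: List[str], fresh: Iterator[str]) -> Clause:
    """One DEEPEN step: for every mode with an output, one literal per input tuple."""
    head, body = clause
    old_vars = clause_vars(clause)
    new: List[Literal] = []
    for mode in modes:
        pred, marks = split_mode(mode)
        if "-" not in marks:
            continue  # every literal in L_D needs an output variable
        for ins in product(old_vars, repeat=marks.count("+")):  # distinct inputs per mode
            it = iter(ins)
            args = [next(it) if m == "+" else next(fresh) for m in marks]
            new.append((pred, *args))  # outputs are fresh and pairwise distinct
    return head, body + new


def constrain(clause: Clause, modes: List[str]) -> Clause:
    """Add every mode-respecting literal whose arguments are all inputs."""
    head, body = clause
    all_vars = clause_vars(clause)
    new = [(pred, *args)
           for pred, marks in map(split_mode, modes) if "-" not in marks
           for args in product(all_vars, repeat=len(marks))]
    return head, body + new


def bottom(dec: Tuple[str, int, List[str]], d: int) -> Clause:
    """BOTTOM_d(Dec) = CONSTRAIN(DEEPEN^d(p(X1, ..., Xa0) <-))."""
    p, a0, modes = dec
    fresh = (f"V{i}" for i in count(1))
    clause: Clause = ((p, *[f"X{i}" for i in range(1, a0 + 1)]), [])
    for _ in range(d):
        clause = deepen(clause, modes, fresh)
    return constrain(clause, modes)


D0 = ("p", 2, ["mother+-", "father+-", "male+", "female+", "equal++"])
head, body = bottom(D0, 1)
print(body[:4])   # [('mother', 'X1', 'V1'), ('mother', 'X2', 'V2'), ('father', 'X1', 'V3'), ('father', 'X2', 'V4')]
print(len(body))  # 52 = 4 deepened + 6 male + 6 female + 36 equal literals
```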
Example. Let $D_0$ be the declaration $(p, 2, R)$ where $R$ contains the mode constraints mother(+,−), father(+,−), male(+), female(+), and equal(+,+).
Let us say that clause $C_1$ is a subclause of clause $C_2$ if the heads of $C_1$ and $C_2$ are identical, if every literal in the body of $C_1$ also appears in $C_2$, and if the literals in the body of $C_1$ appear in the same order as they do in $C_2$. The functions DEEPEN and CONSTRAIN allow one to easily describe a clause with an interesting property.
Theorem 1. Let $Dec = (p, a_0, R)$ be a declaration in $a\text{-}\mathcal{DetDEC}_{=}$, let $X_1, \dots, X_{a_0}$ be distinct variables, and define the clause $BOTTOM_d$ as follows:
$$BOTTOM_d(Dec) \equiv CONSTRAIN_{Dec}(DEEPEN^d_{Dec}(p(X_1, \dots, X_{a_0}) \leftarrow))$$
For any constants $d$ and $a$, the following are true: (1) the size of $BOTTOM_d(Dec)$ is polynomial in $\|Dec\|$; (2) every depth-$d$ clause that satisfies $Dec$ is equivalent to some subclause of $BOTTOM_d(Dec)$.
Proof: See Appendix A. A related result also appears in Muggleton and Feng (1992).
Example. Below $C_1$ and $D_1$ are equivalent, as are $C_2$ and $D_2$. Notice that $D_1$ and $D_2$ are subclauses of $BOTTOM_1(D_0)$. For $C_1$ and $D_1$, p(X,Y) is true when X is Y's brother. For $C_2$ and $D_2$, p(X,Y) is true when X is Y's daughter, and Y is X's father.

The Learning Algorithm
Theorem 1 suggests that it may be possible to learn non-recursive constant-depth determinate clauses by searching the space of subclauses of $BOTTOM_d$ in some efficient manner. Figures 1 and 2 present an algorithm called Force1_NR that does this when $Dec$ is a unique-mode declaration.
Figure 1 presents the top-level learning algorithm, Force1_NR. Force1_NR takes as input a database $DB$ and a declaration $Dec$, and begins by hypothesizing the clause $BOTTOM_d(Dec)$. After each positive counterexample $e^+$, the current hypothesis is generalized as little as possible in order to cover $e^+$. This strategy means that the hypothesis is always the least general hypothesis that covers the positive examples; hence, if a negative counterexample $e^-$ is ever seen, the algorithm will abort with a message that no consistent hypothesis exists.

    begin subroutine ForceSim_NR(H, f, Dec, DB):
      % "forcibly simulate" H on fact f
      if f ∈ DB then return H
      elseif the head of H and f cannot be unified then return FAILURE
      else
        let H' ← H
        let θ be the mgu of f and the head of H'
        for each literal L in the body of H' do
          if there is a substitution θ' such that Lθθ' ∈ DB then
            θ ← θθ', where θ' is the most general such substitution
          else
            delete L from the body of H', together with all literals L'
              supported (directly or indirectly) by L
          endif
        endfor
        return H'
      endif
    end

Figure 2: Forced simulation for nonrecursive depth-d determinate clauses

To minimally generalize a hypothesis $H$, the function ForceSim_NR is used. This subroutine is shown in Figure 2. In the figure, the following terminology is used. If some output variable of $L$ is an input variable of $L'$, then we say that $L$ directly supports $L'$. We will say that $L$ supports $L'$ iff $L$ directly supports $L'$, or if $L$ directly supports some literal $L''$ that supports $L'$. (Thus "supports" is the transitive closure of "directly supports".) ForceSim_NR deletes from $H$ the minimal number of literals necessary to let $H$ cover $e^+$. To do this, ForceSim_NR simulates the action of a Prolog interpreter in evaluating $H$, except that whenever a literal $L$ in the body of $H$ would fail, that literal is deleted, along with all literals $L'$ supported by $L$.
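Forced simulation, and the top-level loop around it, can be sketched in Python as below. This is an illustrative rendering, not the paper's figures verbatim: clauses are assumed to be tuples of strings with uppercase variables, the first matching database fact is taken as the match (which is sound only because the clause is determinate), and the names `force_sim_nr`, `force1_nr`, `covers`, the toy database, the oracle, and the starting clause `START` (a hand-written stand-in for BOTTOM_d) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Literal = Tuple[str, ...]
Fact = Tuple[str, ...]
Subst = Dict[str, str]


def is_var(t: str) -> bool:
    return t[:1].isupper()  # assumed naming convention for variables


def extend(lit: Literal, fact: Fact, theta: Subst) -> Optional[Subst]:
    """Most general extension of theta that maps lit onto the ground fact, or None."""
    if lit[0] != fact[0] or len(lit) != len(fact):
        return None
    ext = dict(theta)
    for t, c in zip(lit[1:], fact[1:]):
        if (ext.setdefault(t, c) if is_var(t) else t) != c:
            return None
    return ext


def force_sim_nr(head: Literal, body: List[Literal], fact: Fact,
                 db: Set[Fact]) -> Optional[List[Literal]]:
    """Forced simulation of a determinate nonrecursive clause on a ground fact."""
    if fact in db:
        return list(body)
    theta = extend(head, fact, {})
    if theta is None:
        return None  # FAILURE: head and fact do not unify
    kept: List[Literal] = []
    dead: Set[str] = set()  # variables introduced by deleted literals
    for lit in body:
        if not any(t in dead for t in lit[1:]):  # not supported by a deleted literal
            # Determinacy: at most one DB fact can match, so the first match is the match.
            match = next((e for e in (extend(lit, f, theta) for f in db)
                          if e is not None), None)
            if match is not None:
                theta = match
                kept.append(lit)
                continue
        dead |= {t for t in lit[1:] if is_var(t) and t not in theta}
    return kept


def covers(head, body, fact, db) -> bool:
    # For a determinate clause, coverage means forced simulation deletes nothing.
    return force_sim_nr(head, body, fact, db) == list(body)


def force1_nr(head, start_body, db, query: Callable) -> List[Literal]:
    """Start specific; minimally generalize on each positive counterexample."""
    body = list(start_body)
    while True:
        answer = query(head, body)
        if answer is None:
            return body
        fact, positive = answer
        if not positive:
            raise ValueError("negative counterexample: no consistent hypothesis")
        new_body = force_sim_nr(head, body, fact, db)
        if new_body is None:
            raise ValueError("positive counterexample does not unify with the head")
        body = new_body


# Hypothetical toy data; father(A, B) reads "B is A's father".
DB = {("father", "ann", "bob"), ("father", "dave", "bob"), ("mother", "ann", "eve"),
      ("female", "ann"), ("female", "eve"), ("male", "bob"), ("male", "dave")}
DB |= {("equal", c, c) for c in ("ann", "bob", "dave", "eve")}
HEAD = ("p", "X", "Y")
TARGET = [("father", "X", "F"), ("equal", "F", "Y")]  # p(X, Y): Y is X's father
START = TARGET + [("female", "X"), ("mother", "X", "M"), ("male", "M")]
CANDIDATES = [("p", "ann", "bob"), ("p", "dave", "bob"), ("p", "ann", "dave"), ("p", "bob", "ann")]


def toy_query(head, body):
    """Toy oracle: first candidate on which the hypothesis and TARGET disagree."""
    for f in CANDIDATES:
        label = covers(HEAD, TARGET, f, DB)
        if covers(head, body, f, DB) != label:
            return f, label
    return None


print(force1_nr(HEAD, START, DB, toy_query))  # [('father', 'X', 'F'), ('equal', 'F', 'Y')]
```

On this toy run the first counterexample, p(ann,bob), deletes only male(M); the second, p(dave,bob), deletes female(X) and mother(X,M), leaving exactly the target body.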
The idea of learning by repeated generalization is an old one; in particular, previous methods exist for learning a definite clause by generalizing a highly specific one. For example, CLINT (De Raedt & Bruynooghe, 1992) generalizes a "starting clause" guided by queries made to the user; PROGOL (Srinivasan, Muggleton, King, & Sternberg, 1994) guides a top-down generalization process with a known bottom clause; and Rouveirol (1994) describes a method for generalizing bottom clauses created by saturation. The Force1_NR algorithm is thus of interest not for its novelty, but because it is provably correct and efficient, as noted in the theorem below.
In particular, let $d$-DepthNonRec be the language of nonrecursive clauses of depth $d$ or less (and hence $i\text{-DepthNonRec}[\mathcal{DB}, j\text{-}\mathcal{DetDEC}]$ is the language of nonrecursive ij-determinate clauses). We have the following result:
Theorem 2. For any constants $a$ and $d$, the language family $d\text{-DepthNonRec}[\mathcal{DB}_{=}, a\text{-}\mathcal{DetDEC}_{=1}]$ is uniformly identifiable from equivalence queries.
Proof: We will show that Force1_NR uniformly identifies this language family with a polynomial number of queries. We begin with the following important lemma, which characterizes the behavior of ForceSim_NR.
Lemma 3. Let $Dec$ be a declaration in $\mathcal{DetDEC}_{=1}$, let $DB$ be a database, let $f$ be a fact, and let $H$ be a determinate nonrecursive clause that satisfies $Dec$. Then one of the following conditions must hold:
(1) ForceSim_NR(H, f, Dec, DB) returns FAILURE, and no subclause $H'$ of $H$ satisfies both $Dec$ and the constraint $H' \wedge DB \vdash f$; or,
(2) ForceSim_NR(H, f, Dec, DB) returns a clause $H'$, and $H'$ is the unique syntactically largest subclause of $H$ that satisfies both $Dec$ and the constraint $H' \wedge DB \vdash f$.
Proof of lemma: To avoid repetition, we will refer to the syntactically maximal subclauses $H'$ of $H$ that satisfy both $Dec$ and the constraint $H' \wedge DB \vdash f$ as "admissible subclauses" in the proof below.
Clearly the lemma is true if $H$ or FAILURE is returned by ForceSim_NR. In the remaining cases the for loop of the algorithm is executed, and we must establish these two claims (under the assumptions that $A$, the head of $H$, and $f$ unify, and that $f \notin DB$):
Claim 1. If $L$ is retained, then every admissible subclause contains $L$.
Claim 2. If $L$ is deleted, then no admissible subclause contains $L$.
First, however, observe that deleting a literal L may cause the mode of some other literals to violate the mode declarations of Dec.It is easy to see that if L is deleted from a clause C, then the mode of all literals L 0 directly supported by L will change.Thus if C satis es a unique-mode declaration prior to the deletion of L, then after the deletion of L all literals L 0 that are directly supported by L will have invalid modes.Now, to see that Claim 1 is true, suppose instead that it is false.Then there must be some maximal subclause C 0 of H that satis es Dec, covers the fact f, and does not contain L. By the argument above, if C 0 does not contain L but satis ed Dec, then C 0 contains no literals L 0 from H that are supported by L. Hence the output variables of L are disjoint from the variables appearing in C 0 .This means that if L were to be added to C 0 the resulting clause would still satisfy Dec and cover f, which leads to a contradiction since C 0 was assumed to be maximal.
To verify Claim 2, let us introduce the following terminology. If $C = (A \leftarrow B_1 \wedge \cdots \wedge B_r)$ is a clause and DB is a database, we will say that the substitution $\sigma$ is a (DB, f)-witness for C iff $\sigma$ is associated with a proof that $C \wedge DB \vdash f$ (or more precisely, iff $A\sigma = f$ and $\forall i: 1 \le i \le r,\ B_i\sigma \in DB$). We claim that the following condition is an invariant of the for loop of the ForceSim_NR algorithm.
Invariant 1. Let C be any admissible subclause that contains all the literals in H' preceding L (i.e., that contains all those literals of H that were retained on previous iterations of the algorithm). Then every (DB, f)-witness for C is a superset of $\theta$.
This can be easily established by induction on the number of iterations of the for loop. The condition is true when the loop is first entered, since $\theta$ is initially the most general unifier of A and f. The condition remains true after an iteration in which L is deleted, since $\theta$ is unchanged. Finally, the condition remains true after an iteration in which L is retained: because $\theta'$ is maximally general, it may only assign values to the output variables of L, and by determinacy only one assignment to the output variables of L can make L true. Hence every (DB, f)-witness for C must contain the bindings in $\theta$.
Next, with an inductive argument and Claim 1 one can show that every admissible subclause C must contain all the literals that have been retained in previous iterations of the loop, leading to the following strengthening of Invariant 1:
Invariant 1'. Let C be any admissible subclause. Then every (DB, f)-witness for C is a superset of $\theta$.
Now, notice that only two types of literals are deleted: (a) literals L such that no superset of $\theta$ can make L true, and (b) literals L' that are supported by a literal L of the preceding type. In case (a), clearly L cannot be part of any admissible subclause, since no superset of $\theta$ makes L succeed, and only such supersets can be witnesses of admissible clauses. In case (b), again L' cannot be part of any admissible subclause, since its mode violates Dec unless L is present in the clause, and by the argument above L cannot be in the clause.
This concludes the proof of the lemma.
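The literal-deletion policy that Lemma 3 analyzes can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's own listing: literals are (predicate, argument-tuple) pairs, variables follow the Prolog convention of starting with an uppercase letter, DB is a set of ground facts, and mode declarations are not modeled explicitly (a literal is treated as supported by a deleted literal when it mentions one of that literal's unbound output variables). The names force_sim_nr, match, apply and is_witness are hypothetical; is_witness checks the (DB, f)-witness condition used for Claim 2.

```python
from typing import Dict, List, Optional, Set, Tuple

Literal = Tuple[str, Tuple[str, ...]]   # (predicate, argument tuple)
Subst = Dict[str, str]                  # variable -> constant


def is_var(t: str) -> bool:
    # Prolog convention (assumed): variables start with an uppercase letter or "_"
    return t[:1].isupper() or t[:1] == "_"


def match(lit: Literal, fact: Literal, theta: Subst) -> Optional[Subst]:
    """Most general extension of theta mapping lit onto the ground fact, or None."""
    (pred, args), (fpred, fargs) = lit, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    new = dict(theta)
    for a, c in zip(args, fargs):
        if is_var(a):
            if new.setdefault(a, c) != c:   # variable already bound to another constant
                return None
        elif a != c:                        # constant mismatch
            return None
    return new


def apply(lit: Literal, sigma: Subst) -> Literal:
    pred, args = lit
    return (pred, tuple(sigma.get(a, a) for a in args))


def is_witness(head: Literal, body: List[Literal], sigma: Subst,
               db: Set[Literal], f: Literal) -> bool:
    """sigma is a (DB, f)-witness iff head*sigma == f and every body literal is in DB."""
    return apply(head, sigma) == f and all(apply(b, sigma) in db for b in body)


def force_sim_nr(head: Literal, body: List[Literal], f: Literal,
                 db: Set[Literal]) -> Optional[List[Literal]]:
    """Body of the largest subclause of (head <- body) covering f; None means FAILURE."""
    theta = match(head, f, {})
    if theta is None:
        return None                         # A and f do not unify
    if f in db:
        return list(body)                   # f is already a fact: return H unchanged
    kept: List[Literal] = []
    dead: Set[str] = set()                  # output variables of deleted literals
    for lit in body:
        lit_vars = {a for a in lit[1] if is_var(a)}
        if lit_vars & dead:
            # supported (directly or indirectly) by a deleted literal: delete it too
            dead |= lit_vars - set(theta)
            continue
        extended = None
        for fact in db:
            extended = match(lit, fact, theta)
            if extended is not None:
                break                       # determinacy: at most one output binding
        if extended is None:
            dead |= lit_vars - set(theta)   # no fact makes L true: delete L
        else:
            theta = extended                # retain L and record its output bindings
            kept.append(lit)
    return kept


db = {("succ", ("1", "2")), ("succ", ("2", "3")),
      ("odd", ("1",)), ("odd", ("3",)), ("even", ("2",))}
head = ("p", ("X", "Y"))
body = [("succ", ("X", "Z")), ("even", ("Z",)), ("double", ("Z", "W")),
        ("odd", ("W",)), ("succ", ("Z", "Y")), ("odd", ("Y",))]
print(force_sim_nr(head, body, ("p", ("1", "3")), db))
```

In this run the literal double(Z,W) has no matching fact and is deleted, and odd(W), which depends on the deleted literal only through W, is deleted with it; the other four literals are retained.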
To prove the theorem, we must now establish the following properties of the identification algorithm.
Correctness. By Theorem 1, if the target program is in $d\text{-DepthNonRec}[DB, Dec]$, then there is some clause $C_T$ that is equivalent to the target and is a subclause of $\mathrm{BOTTOM}_d(Dec)$. H is initially BOTTOM_d and hence a superclause of $C_T$. Now consider invoking ForceSim_NR on any positive counterexample $e^+$. By Lemma 3, if this invocation is successful, H will be replaced by H', the syntactically largest subclause of H that covers $e^+$. Since $C_T$ is a subclause of H that covers $e^+$, H' will again be a superclause of $C_T$. Inductively, then, the hypothesis is always a superclause of the target.
Further, since the counterexample $e^+$ is always an instance that is not covered by the current hypothesis H, every time the hypothesis is updated, the new hypothesis is a proper subclause of the old. This means that Force1_NR will eventually identify the target clause.
Efficiency. The number of queries made is polynomial in $\|Dec\|$ and $\|DB\|$, since H is initially of size polynomial in $\|Dec\|$ and is reduced in size each time a counterexample is provided. To see that each counterexample is processed in time polynomial in $n_r$, $n_e$, and $n_t$, notice that since the length of H is polynomial, the number of repetitions of the for loop of ForceSim_NR is also polynomial; further, since the arity of literals L is bounded by a, only $a n_b + a n_e$ constants exist in $DB \cup D$, and hence there are at most $(a n_b + a n_e)^a$ substitutions $\theta'$ to check inside the for loop, which is again polynomial. Thus each execution of ForceSim_NR requires only polynomial time.
This concludes the proof.

Learning a Linear Closed Recursive Clause
Recall that if a clause has only one recursive literal, then the clause is linear recursive, and that if no recursive literal contains output variables, then the clause is closed linear recursive. In this section, we will describe how the Force1 algorithm can be extended to learn a single linear closed recursive clause. Before presenting the extension, however, we would first like to discuss a reasonable-sounding approach that, on closer examination, turns out to be incorrect.

A Remark on Recursive Clauses
One plausible first step toward extending Force1 to recursive clauses is to allow recursive literals in hypotheses and treat them the same way as other literals, that is, to include recursive literals in the initial clause BOTTOM_d and delete these literals gradually as positive examples are received. A problem with this approach is that there is no simple way to check whether a recursive literal in a clause succeeds or fails on a particular example. This makes it impossible to simply run ForceSim_NR on clauses containing recursive literals.
A straightforward (apparent) solution to this problem is to assume that an oracle exists which can be queried as to the success or failure of any recursive literal. For closed recursive clauses, it is sufficient to assume that there is an oracle $\mathrm{MEMBER}_{C_T}(DB, f)$ that answers the question "Does $DB \wedge C_T \vdash f$?", where $C_T$ is the unknown target concept, f is a ground fact, and DB is a database. Given such an oracle, one can determine whether a closed recursive literal $L_r$ should be retained by checking whether $\mathrm{MEMBER}_{C_T}(DB, L_r\theta)$ is true. Such an oracle is very close to the notion of a membership query as used in computational learning theory. This is a natural extension of the Force1_NR learning algorithm to recursive clauses; in fact, an algorithm based on similar ideas has been previously conjectured to pac-learn closed recursive constant-depth determinate clauses (Džeroski et al., 1992). Unfortunately, this algorithm can fail to return a clause that is consistent with a positive counterexample. To illustrate this, consider the following example.
components(Xs,X,Xs1), components(Zs,Z,Zs1), X1=Z1, append(Xs1,Ys,Zs1).
We will also assume a database DB that defines the predicate null to be true for empty lists, and odd to be true for the constants 1 and 3.
To see how the forced simulation can fail, consider the following positive instance e = (f, D): This is simply a "flattened" form of append([1,2],[3],[1,2,3]), together with the appropriate base case append([],[3],[3]). Now consider beginning with the clause BOTTOM_1 and generalizing it using ForceSim_NR to cover this positive instance. This process is illustrated in Figure 3. The clause on the left in the figure is $\mathrm{BOTTOM}_d(Dec)$; the clause on the right is the output of forcibly simulating this clause on f with ForceSim_NR. (For clarity we have assumed that only the single correct recursive call remains after forced simulation.) The resulting clause is incorrect, in that it does not cover the given example e.
This can be easily seen by stepping through the actions of a Prolog interpreter with the generalized clause of Figure 3. The nonrecursive literals will all succeed, leading to the subgoal append(l2,l3,l23) (or, in the usual Prolog notation, append([2],[3],[2,3])). This subgoal will fail at the literal odd(X1), because X1 is bound to 2 for this subgoal, and the fact odd(2) is not true in $DB \cup D$.
This example illustrates a pitfall in the policy of treating recursive and nonrecursive literals in a uniform manner (for more discussion, see also Bergadano & Gunetti, 1993; De Raedt, Lavrač, & Džeroski, 1993). Unlike nonrecursive literals, the truth of the fact $L_r\theta$ (corresponding to the recursive literal $L_r$) does not imply that a clause containing $L_r$ will succeed; it may be that while the first subgoal $L_r\theta$ succeeds, deeper subgoals fail.

Forced Simulation for Recursive Clauses
A solution to this problem is to replace the calls to the membership oracle in the algorithm sketched above with a call to a routine that forcibly simulates the actions of a top-down theorem prover on a recursive clause. In particular, the following algorithm is suggested. First, build a nonrecursive "bottom clause", as was done in ForceSim_NR. Second, find some recursive literal $L_r$ such that appending $L_r$ to the bottom clause yields a recursive clause that can be generalized to cover the positive examples.
As in the nonrecursive case, a clause is generalized by deleting literals, using a straightforward generalization of the procedure for forced simulation of nonrecursive clauses. During forced simulation, any failing nonrecursive subgoals are simply deleted; however, when a recursive literal $L_r$ is encountered, one forcibly simulates the hypothesis clause recursively on the corresponding subgoal.

Figure 4 (partial listing):

        ...
        let A be the head of H'
        let θ be the mgu of A and e
        for each literal L in the body of H' do
            if there is a substitution θ' such that Lθθ' ∈ DB then
                θ ← θθ', where θ' is the most general such substitution
            else
                delete L from the body of H', together with all literals L'
                supported (directly or indirectly) by L
            endif
        endfor
        % 5. generalize H' on the recursive subgoal L_rθ
        if L_rθ is ground then
            return ForceSim(H' ∪ {L_r}, L_rθ, Dec, DB, h − 1)
        else
            return FAILURE
        endif
    endif
end

The extended algorithm is similar to ForceSim_NR, but differs in that when the recursive literal $L_r$ is reached in the simulation of H, the corresponding subgoal $L_r\theta$ is created, and the hypothesized clause is recursively forcibly simulated on this subgoal. This ensures that the generalized clause will also succeed on the subgoal. For reasons that will become clear shortly, we would like this algorithm to terminate even if the original clause H enters an infinite loop when used in a top-down interpreter. In order to ensure termination, an extra argument h is passed to ForceSim; it represents a depth bound for the forced simulation.
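A Python sketch of this depth-bounded forced simulation for a single closed linear recursive clause follows. It is hedged in several ways: literals are (predicate, argument-tuple) pairs over a ground-fact database; the clause is split into head, nonrecursive body and recursive literal L_r; and, as an assumption consistent with the base case of Theorem 4 below, a subgoal found in DB counts as a proof of depth zero, while any other subgoal reached with h = 0 yields FAILURE (None). The countdown predicate down/1 and its facts are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Literal = Tuple[str, Tuple[str, ...]]            # (predicate, argument tuple)
Subst = Dict[str, str]                           # variable -> constant
Clause = Tuple[Literal, List[Literal], Literal]  # (head, nonrecursive body, L_r)


def is_var(t: str) -> bool:
    return t[:1].isupper() or t[:1] == "_"       # Prolog-style variable (assumed)


def match(lit: Literal, fact: Literal, theta: Subst) -> Optional[Subst]:
    """Most general extension of theta mapping lit onto the ground fact, or None."""
    (pred, args), (fpred, fargs) = lit, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    new = dict(theta)
    for a, c in zip(args, fargs):
        if (is_var(a) and new.setdefault(a, c) != c) or (not is_var(a) and a != c):
            return None
    return new


def force_sim(clause: Clause, f: Literal, db: Set[Literal], h: int) -> Optional[Clause]:
    """Generalize clause so that it proves f within depth h; None means FAILURE."""
    head, body, rec = clause
    if f in db:
        return clause                      # looking f up in DB is a proof of depth zero
    if h == 0:
        return None                        # depth bound exhausted
    theta = match(head, f, {})
    if theta is None:
        return None
    kept: List[Literal] = []
    dead: Set[str] = set()                 # output variables of deleted literals
    for lit in body:                       # same deletion loop as ForceSim_NR
        lit_vars = {a for a in lit[1] if is_var(a)}
        ext = None
        if not lit_vars & dead:
            for fact in db:
                ext = match(lit, fact, theta)
                if ext is not None:
                    break                  # determinacy: unique output binding
        if ext is None:
            dead |= lit_vars - set(theta)  # delete L and mark its outputs as dead
        else:
            theta = ext
            kept.append(lit)
    pred, args = rec
    subgoal = (pred, tuple(theta.get(a, a) for a in args))   # L_r * theta
    if any(is_var(a) for a in subgoal[1]):
        return None                        # L_r * theta is not ground
    return force_sim((head, kept, rec), subgoal, db, h - 1)


db = {("pred", ("3", "2")), ("pred", ("2", "1")), ("pred", ("1", "0")),
      ("pos", ("3",)), ("pos", ("2",)), ("pos", ("1",)),
      ("odd", ("3",)), ("odd", ("1",)),
      ("down", ("0",))}                    # base fact, as in an extended instance's D
clause = (("down", ("X",)),
          [("pred", ("X", "Y")), ("pos", ("X",)), ("odd", ("X",))],
          ("down", ("Y",)))
print(force_sim(clause, ("down", ("3",)), db, 10))
```

Because odd(2) is false, odd(X) is deleted during the second recursive step, so the result keeps only pred(X,Y) and pos(X) before the recursive literal. With h = 2 the same call returns None instead: reaching the base fact down(0) from down(3) takes three recursive steps.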
To summarize, the basic idea behind the algorithm of Figure 4 is to simulate the hypothesized clause H on f, and to generalize H by deleting literals whenever H would fail on f or on any subgoal of f.
Example. Consider using ForceSim to forcibly simulate the following recursive clause. Here the recursive literal $L_r$ is append(Xs1,Ys,Zs1). We will also assume that f is taken from the extended query e = (f, D), which is again the flattened version of the instance append([1,2],[3],[1,2,3]) used in the previous example; that Dec is the set of declarations used in the previous example; and that the database DB contains the fact null(nil).
As in Section 3, we begin our analysis by showing the correctness of the forced simulation algorithm, i.e., by showing that forced simulation does indeed produce a unique maximally specific generalization of the input clause that covers the example.
This proof of correctness uses induction on the depth of a proof. Let us introduce some additional notation, and write $P \wedge DB \vdash_h f$ if the Prolog program (P, DB) can be used to prove the fact f in a proof of depth h or less. (The notion of depth of a proof is the usual one; we define looking up f in the database DB to be a proof of depth zero.) We have the following result concerning the ForceSim algorithm.
Theorem 4. Let Dec be a declaration in $\mathrm{DetDEC}_{=1}$, let DB be a database, let f be a fact, and let H be a determinate closed linear recursive clause that satisfies Dec. Then one of the following conditions must hold: either ForceSim(H, f, Dec, DB, h) returns FAILURE, and no recursive subclause H' of H satisfies both Dec and the constraint $H' \wedge DB \vdash_h f$; or ForceSim(H, f, Dec, DB, h) returns a clause H', and H' is the unique syntactically largest recursive subclause of H that satisfies both Dec and the constraint $H' \wedge DB \vdash_h f$.

Proof: Again, to avoid repetition, we will refer to the syntactically maximal recursive (respectively, nonrecursive) subclauses H' of H that satisfy both Dec and the constraint $H' \wedge DB \vdash_h f$ as "admissible recursive (respectively, nonrecursive) subclauses".
The proof largely parallels the proof of Lemma 3; in particular, similar arguments show that the clause returned by ForceSim satisfies the conditions of the theorem whenever FAILURE is returned and whenever H is returned. Note that the correctness of ForceSim when H is returned establishes the base case of the theorem for h = 0.
For the case of depth h > 0, let us assume the theorem holds for depth h − 1 and proceed by mathematical induction. The arguments of Lemma 3 show that the following condition is true after the for loop terminates.
Invariant 1'. H' is the unique maximal nonrecursive admissible subclause of H, and every (DB, f)-witness for H' is a superset of $\theta$.
Now, let us assume that there is some admissible recursive subclause $H^*$. Clearly $H^*$ must contain the recursive literal $L_r$ of H, since $L_r$ is the only recursive literal of H. Further, the nonrecursive clause $\hat{H} = H^* - \{L_r\}$ must certainly satisfy Dec and also $\hat{H} \wedge DB \vdash f$, so it must (by the maximality of H') be a subclause of H'. Hence $H^*$ must be a subclause of $H' \cup \{L_r\}$. Finally, if $L_r\theta$ is ground (i.e., if $L_r$ is closed in the clause $H' \cup \{L_r\}$), then by Invariant 1' the clause $H^*$ must also satisfy $H^* \wedge DB \vdash_{h-1} L_r\theta$. (This is simply equivalent to saying that the recursive subgoal $L_r\theta$ generated in the proof must succeed.) By the inductive hypothesis, then, the recursive call must return the unique maximal admissible recursive subclause of $H' \cup \{L_r\}$, which by the argument above must also be the unique maximal admissible recursive subclause of H.
Thus by induction the theorem holds.

A Learning Algorithm for Linear Recursive Clauses
Given this method for generalizing recursive clauses, one can construct a learning algorithm for recursive clauses as follows. First, guess a recursive literal $L_r$, and make $H = \mathrm{BOTTOM}_d \wedge L_r$ the initial hypothesis of the learner. Then, ask a series of equivalence queries. After a positive counterexample $e^+$, use forced simulation to minimally generalize H to cover $e^+$. After a negative counterexample, choose another recursive literal $L'_r$, and reset the hypothesis to $H = \mathrm{BOTTOM}_d \wedge L'_r$. Figure 5 presents an algorithm that operates along these lines. Let d-DepthLinRec denote the language of linear closed recursive clauses of depth d or less. We have the following result:
Theorem 5. For any constants a and d, the language family $d\text{-DepthLinRec}[DB_{=}, a\text{-DetDEC}_{=1}]$ is uniformly identifiable from equivalence queries.
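The outer loop of this learner can be sketched as follows. This is a hedged sketch of the control flow only, not Figure 5 itself: the equivalence-query oracle and forced simulation are passed in as hypothetical callables, a hypothesis is abstracted to a pair (set of body literals, recursive literal), and toy_eq_query and toy_force_sim are stand-ins that exist only to exercise the loop.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Hyp = Tuple[FrozenSet[str], str]          # (body literals, recursive literal L_r)
Answer = Optional[Tuple[str, object]]     # None = "yes", else ("pos" | "neg", example)


def force1(bottom: FrozenSet[str], candidates: Iterable[str],
           eq_query: Callable[[Hyp], Answer],
           force_sim: Callable[[Hyp, object], Optional[Hyp]]) -> Optional[Hyp]:
    for lr in candidates:                  # guess a recursive literal L_r
        hyp: Hyp = (bottom, lr)            # initial hypothesis: BOTTOM_d plus L_r
        while True:
            answer = eq_query(hyp)
            if answer is None:
                return hyp                 # equivalence query answered "yes"
            kind, example = answer
            if kind == "neg":
                break                      # negative counterexample: L_r is wrong
            generalized = force_sim(hyp, example)
            if generalized is None:
                break                      # FAILURE: L_r is wrong
            hyp = generalized              # minimal generalization covering e+
    return None                            # every candidate L_r was discarded


# Toy oracle and forced simulation, for illustration only.
target: Hyp = (frozenset({"a"}), "r2")


def toy_eq_query(hyp: Hyp) -> Answer:
    if hyp == target:
        return None
    return ("neg", "x") if hyp[1] != target[1] else ("pos", "e")


def toy_force_sim(hyp: Hyp, example: object) -> Optional[Hyp]:
    extra = sorted(hyp[0] - target[0])     # drop one literal absent from the target
    return (hyp[0] - {extra[0]}, hyp[1])


print(force1(frozenset({"a", "b", "c"}), ["r1", "r2"], toy_eq_query, toy_force_sim))
```

With the wrong guess r1 the very first query returns a negative counterexample, so the loop moves on; with r2 each positive counterexample removes one literal until the target is reached.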
Proof: We will show that Force1 uniformly identifies this language family with a polynomial number of queries.
Correctness and query efficiency. There are at most $a\|D\| + a\|DB\|$ constants in any set $DB \cup D$, at most $(a\|D\| + a\|DB\|)^{a_0}$ $a_0$-tuples of such constants, and hence at most $(a\|D\| + a\|DB\|)^{a_0}$ distinct recursive subgoals $L_r\theta$ that might be produced in proving that a linear recursive clause C covers an extended instance (f, D). Since the clause is determinate, a proof that revisited a subgoal would loop forever, so a terminating proof visits each such subgoal at most once. Thus every terminating proof of a fact f using a linear recursive clause C must be of depth $(a\|D\| + a\|DB\|)^{a_0}$ or less; i.e., for $h = (a\|D\| + a\|DB\|)^{a_0}$,
$$C \wedge DB \wedge D \vdash_h f \iff C \wedge DB \wedge D \vdash f.$$
Thus Theorem 4 can be strengthened: for the value of h used in Force1, the subroutine ForceSim returns the syntactically largest subclause of H that covers the example (f, D) whenever such a subclause exists, and returns FAILURE otherwise.
We now argue the correctness of the algorithm as follows. Assume that the hypothesized recursive literal is "correct", i.e., that the target clause $C_T$ is some subclause of $\mathrm{BOTTOM}_d \wedge L_r$. In this case it is easy to see that Force1 will identify $C_T$, using an argument that parallels the one made for Force1_NR. Again by analogy to Force1_NR, only a polynomial number of equivalence queries will be made involving the correct recursive literal.
Next assume that $L_r$ is not the correct recursive literal. Then $C_T$ need not be a subclause of $\mathrm{BOTTOM}_d \wedge L_r$, and the response to an equivalence query may be either a positive or a negative counterexample. If a positive counterexample $e^+$ is received and ForceSim is called, then the result may be FAILURE, or it may be a proper subclause of H that covers $e^+$. Thus the result of choosing an incorrect $L_r$ will be a (possibly empty) sequence of positive counterexamples followed by either a negative counterexample or FAILURE. Since all equivalence queries involving the correct recursive literal will be answered by either a positive counterexample or "yes", if a negative counterexample or FAILURE is obtained, it must be that $L_r$ is incorrect.
The number of variables in BOTTOM_d can be bounded by $a\|\mathrm{BOTTOM}_d(Dec)\|$, and as each closed recursive literal is completely defined by an $a_0$-tuple of variables, the number of possible closed recursive literals $L_r$ can be bounded by
$$p = (a\|\mathrm{BOTTOM}_d(Dec)\|)^{a_0}.$$
Since $\|\mathrm{BOTTOM}_d(Dec)\|$ is polynomial in $\|Dec\|$ and $a_0$ is bounded by a constant, p is also polynomial in $\|Dec\|$. This means that only a polynomial number of incorrect $L_r$'s need to be discarded. Further, since each successive hypothesis using a single incorrect $L_r$ is a proper subclause of the previous hypothesis, only a polynomial number of equivalence queries are needed to discard an incorrect $L_r$. Thus only a polynomial number of equivalence queries can be made involving incorrect recursive literals.
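The counting argument of the preceding paragraph can be condensed into a single bound. As a sketch, let $\ell$ be the number of literals in $\mathrm{BOTTOM}_d \wedge L_r$ (a symbol introduced here for convenience), so that $\ell \le \|\mathrm{BOTTOM}_d(Dec)\| + 1$. For a fixed $L_r$, each positive counterexample that does not lead to FAILURE removes at least one literal, so at most $\ell + 1$ equivalence queries are asked before $L_r$ is confirmed or discarded; summing over at most p candidate literals gives:

```latex
\begin{align*}
\#\text{equivalence queries}
  &\le \sum_{j=1}^{p} (\ell + 1) \;=\; p\,(\ell + 1) \\
  &\le (a\,\|\mathrm{BOTTOM}_d(Dec)\|)^{a_0}\,\bigl(\|\mathrm{BOTTOM}_d(Dec)\| + 2\bigr).
\end{align*}
```

This is polynomial in $\|Dec\|$ provided, as the argument above requires, that $a_0$ is bounded by a constant.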
Thus Force1 needs only a polynomial number of queries to identify $C_T$.
Efficiency. ForceSim runs in time polynomial in its arguments H, f, Dec, $DB \cup D$ and h. When ForceSim is called from Force1, h is always polynomial in $n_e$ and $\|DB\|$, and H is always no larger than $\|\mathrm{BOTTOM}_d(Dec)\| + 1$, which in turn is polynomial in the size of Dec. Hence every invocation of ForceSim requires time polynomial in $n_e$, $\|Dec\|$, and $\|DB\|$, and hence Force1 processes each query in polynomial time.
This completes the proof.
This result is somewhat surprising, as it shows that recursive clauses can be learned even given an adversarial choice of training examples. In contrast, most implemented ILP systems require well-chosen examples to learn recursive clauses.
This formal result can also be strengthened in a number of technical ways. One of the more interesting strengthenings is to consider a variant of Force1 that maintains a fixed set of positive and negative examples, and constructs the set of all least general clauses that are consistent with these examples. This could be done by taking each of the clauses $\mathrm{BOTTOM}_d \wedge L_{r_1}, \ldots, \mathrm{BOTTOM}_d \wedge L_{r_p}$, forcibly simulating them on each of the positive examples in turn, and then discarding those clauses that cover one or more negative examples. This set of clauses could then be used to tractably encode the version space of all consistent programs, using the [S, N] representation for version spaces (Hirsh, 1992).
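A minimal sketch of this fixed-sample variant is below, assuming forced simulation and a coverage test are available as functions (both passed in as hypothetical callables). The toy encoding at the end, where a clause is a (recursive literal, set of conditions) pair and an example is the set of conditions it satisfies, is invented for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

C = TypeVar("C")   # clause representation (left abstract)
E = TypeVar("E")   # example representation (left abstract)


def least_general_consistent(starts: Sequence[C], positives: Sequence[E],
                             negatives: Sequence[E],
                             force_sim: Callable[[C, E], Optional[C]],
                             covers: Callable[[C, E], bool]) -> List[C]:
    """Forcibly simulate each start clause on all positives; drop clauses covering a negative."""
    result: List[C] = []
    for start in starts:                   # BOTTOM_d with L_r1, ..., L_rp
        clause = start
        for e in positives:
            generalized = force_sim(clause, e)
            if generalized is None:
                break                      # FAILURE on e: discard this start clause
            clause = generalized
        else:                              # runs only if no FAILURE occurred
            if not any(covers(clause, n) for n in negatives):
                result.append(clause)
    return result


# Toy encoding: a clause is (recursive literal, frozenset of body conditions);
# an example is the frozenset of conditions (and recursive literals) it satisfies.
def toy_force_sim(clause, example):
    rec, body = clause
    return (rec, body & example) if rec in example else None


def toy_covers(clause, example):
    rec, body = clause
    return rec in example and body <= example


starts = [("r1", frozenset("abc")), ("r2", frozenset("abc"))]
positives = [frozenset({"r1", "a", "b"}), frozenset({"r1", "r2", "a", "c"})]
negatives = [frozenset({"r1", "b"})]
print(least_general_consistent(starts, positives, negatives, toy_force_sim, toy_covers))
```

Here the start clause for r2 fails on the first positive example and is dropped, while the one for r1 is generalized to the single condition a, which covers no negative example.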

Extending the Learning Algorithm
We will now consider a number of ways in which the result of Theorem 5 can be extended.

The Equality-Predicate and Unique-Mode Assumptions
Theorem 5 shows that the language family $d\text{-DepthLinRec}[DB_{=}, a\text{-DetDEC}_{=1}]$ is identifiable from equivalence queries. It is natural to ask whether this result can be extended by dropping the assumptions that an equality predicate is present and that the declaration contains a unique legal mode for each predicate, that is, whether the result can be extended to the language family $d\text{-DepthLinRec}[DB, a\text{-DetDEC}]$. This extension is in fact straightforward. Given a database DB and a declaration $Dec = (p, a_0, R)$ that do not satisfy the equality-predicate and unique-mode assumptions, one can modify them as follows.
1. For every constant c appearing in DB, add the fact equal(c, c) to DB.
2. For every predicate q that has k valid modes $qs_1, \ldots, qs_k$ in R:
   (a) remove the mode declarations for q, and replace them with k mode strings for the k new predicates $q_{s_1}, \ldots, q_{s_k}$, letting $q_{s_i}s_i$ be the unique legal mode for the predicate $q_{s_i}$;
   (b) remove every fact $q(t_1, \ldots, t_a)$ of the predicate q from DB, and replace it with the k facts $q_{s_1}(t_1, \ldots, t_a), \ldots, q_{s_k}(t_1, \ldots, t_a)$.

Note that if the arity of predicates is bounded by a constant a, then the number of modes k for any predicate q is bounded by the constant $2^a$, and hence these transformations can be performed in polynomial time, and with only a polynomial increase in the size of Dec and DB.
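Steps 1 and 2 can be sketched as a single transformation over a ground-fact database. The sketch assumes mode strings written with '+' for input and '-' for output, names each new predicate q_s by appending the mode string to q (a hypothetical naming scheme), leaves facts of undeclared predicates unchanged, and leaves the declaration for equal/2 to the caller.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, Tuple[str, ...]]


def normalize(db: Set[Fact], modes: Dict[str, List[str]]
              ) -> Tuple[Set[Fact], Dict[str, List[str]]]:
    """Apply steps 1 and 2: add equal/2 facts and split multi-mode predicates."""
    # Step 1: equal(c, c) for every constant c appearing in DB.
    new_db: Set[Fact] = {("equal", (c, c)) for _, args in db for c in args}
    # Step 2(a): one new predicate q_s, with the single mode s, per valid mode of q.
    new_modes: Dict[str, List[str]] = {
        f"{q}_{s}": [s] for q, strings in modes.items() for s in strings
    }
    # Step 2(b): replace each fact q(t1, ..., ta) by the k facts q_s(t1, ..., ta).
    for q, args in db:
        if q in modes:
            new_db |= {(f"{q}_{s}", args) for s in modes[q]}
        else:
            new_db.add((q, args))          # predicates without declarations: unchanged
    return new_db, new_modes


db = {("parent", ("ann", "bob"))}
modes = {"parent": ["+-", "-+"]}             # '+' input, '-' output (assumed notation)
print(sorted(normalize(db, modes)[0]))
```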
Clearly any target clause $C_t \in d\text{-DepthLinRec}[DB, Dec]$ is equivalent to some clause $C'_t \in d\text{-DepthLinRec}[DB', Dec']$, where DB' and Dec' are the modified versions of DB and Dec constructed above. Using Force1 it is possible to identify $C'_t$. (In learning $C'_t$, one must also perform steps 1 and 2b above on the description part D of every counterexample (f, D).) Finally, one can convert $C'_t$ to an equivalent clause in $d\text{-DepthLinRec}[DB, Dec]$ by repeatedly resolving against the clause equal(X,X) ←, and also replacing every predicate symbol $q_{s_i}$ with q.
This leads to the following strengthening of Theorem 5:
Proposition 6. For any constants a and d, the language family $d\text{-DepthLinRec}[DB, a\text{-DetDEC}]$ is uniformly identifiable from equivalence queries.

The Datalog Assumption
So far we have assumed that the target program contains no function symbols, and that the background knowledge provided by the user is a database of ground facts.While convenient for formal analysis, these assumptions can be relaxed.
Examination of the learning algorithm shows that the database DB is used in only two ways.
In forcibly simulating a hypothesis on an extended instance (f, D), it is necessary to find a substitution $\theta'$ that makes a literal L true in the database $DB \cup D$. While this can be done algorithmically if DB and D are sets of ground facts, it is also plausible to assume that the user has provided an oracle that answers in polynomial time any mode-correct query L to the database DB. Specifically, the answer of the oracle will be either
- the (unique) most general substitution $\theta'$ such that $DB \wedge D \vdash L\theta'$ and $L\theta'$ is ground; or
- "no" if no such $\theta'$ exists.
Such an oracle would presumably take the form of an efficient theorem prover for DB.
When calling ForceSim, the top-level learning algorithm uses DB and D to determine a bound on the depth of a proof made using the hypothesis program. Again, it is reasonable to assume that the user can provide this information directly, in the form of an oracle. Specifically, this oracle would provide for any fact f a polynomial upper bound on the depth of the proof for f in the target program.
Finally, we note that if efficient (but non-ground) background knowledge is allowed, then function symbols can always be removed via flattening (Rouveirol, 1994). This transformation also preserves determinacy, although it may increase depth; in general, the depth of a flattened clause also depends on the term depth in the original clause. Thus, the assumption that the target program is in Datalog can be replaced by the assumptions that term depth is bounded by a constant and that two oracles are available: an oracle that answers queries to the background knowledge, and a depth-bound oracle. Both types of oracles have been frequently assumed in the literature (Shapiro, 1982; Page & Frisch, 1992; Džeroski et al., 1992).
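Flattening can be illustrated on the list terms of the append example. The sketch below is an illustration under simplifying assumptions rather than the full procedure of Rouveirol (1994): the only compound terms are list cells written ("cons", Head, Tail), each cell is replaced by a fresh variable named V0, V1, ... (a hypothetical naming scheme), and a background literal components(V, Head, Tail), the predicate already used in the append example, is emitted for it.

```python
from itertools import count
from typing import Iterator, List, Tuple, Union

# A term is a variable/constant name or a list cell ("cons", head, tail).
Term = Union[str, Tuple[str, "Term", "Term"]]
Literal = Tuple[str, Tuple[str, ...]]


def flatten_term(t: Term, body: List[Literal], fresh: Iterator[int]) -> str:
    """Replace a (possibly nested) list cell by a fresh variable plus components/3 literals."""
    if isinstance(t, str):
        return t
    _, head, tail = t
    h = flatten_term(head, body, fresh)
    tl = flatten_term(tail, body, fresh)
    v = f"V{next(fresh)}"
    body.append(("components", (v, h, tl)))
    return v


def flatten_atom(pred: str, args: Tuple[Term, ...]) -> Tuple[Literal, List[Literal]]:
    """Flatten an atom into a function-free atom plus the body literals it needs."""
    fresh = count()
    body: List[Literal] = []
    flat_args = tuple(flatten_term(a, body, fresh) for a in args)
    return (pred, flat_args), body


# Head of the usual recursive append clause: append([X|Xs], Ys, [X|Zs])
print(flatten_atom("append", (("cons", "X", "Xs"), "Ys", ("cons", "X", "Zs"))))
```

The resulting atom append(V0, Ys, V1) is function-free, and the list structure has moved into two components/3 literals.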

Learning k-ary Recursive Clauses
It is also natural to ask if Theorem 5 can be extended to clauses that are not linear recursive.
One interesting case is the case of closed k-ary recursive clauses for constant k. It is straightforward to extend Force1 to guess a tuple of k recursive literals $L_{r_1}, \ldots, L_{r_k}$, and then to extend ForceSim to recursively generalize the hypothesis clause on each of the facts $L_{r_1}\theta, \ldots, L_{r_k}\theta$. The arguments of Theorems 4 and 5 can be modified to show that this extension will identify the target clause after a polynomial number of equivalence queries.
Unfortunately, however, it is no longer the case that ForceSim runs in polynomial time.This is easily seen if one considers a tree of all the recursive calls made by ForceSim; in general, this tree will have branching factor k and polynomial depth, and hence exponential size.This result is unsurprising, as the implementation of ForceSim described forcibly simulates a depth-bounded top-down interpreter, and a k-ary recursive program can take exponential time to interpret with such an interpreter.
There are at least two possible solutions to this problem. One possible solution is to retain the simple top-down forced simulation procedure, and require the user to provide a depth bound tighter than $(a\|D\| + a\|DB\|)^{a_0}$, the maximal possible depth of a tree. For example, in learning a 2-ary recursive sort such as quicksort, the user might specify a logarithmic depth bound, thus guaranteeing that ForceSim is polynomial-time (a binary tree of logarithmic depth has only polynomially many nodes). This requires additional input from the user, but would be easy to implement. It also has the advantage (not shared by the approach described below) that the hypothesized program can be executed using a simple depth-bounded Prolog interpreter, and will always have shallow proof trees. This seems to be a plausible bias to impose when learning k-ary recursive Prolog programs, as many of these tend to have shallow proof trees.
A second solution to the possible high cost of forced simulation for k-ary recursive programs is to forcibly simulate a "smarter" type of interpreter: one that can execute k-ary recursive programs in polynomial time. One sound and complete theorem prover for closed k-ary recursive programs can be implemented as follows.
Construct a top-down proof tree in the usual fashion, i.e., using a depth-first left-to-right strategy, but maintain a list of the ancestors of the current subgoal, and also a list VISITED that records, for each previously visited node in the tree, the subgoal associated with that node. Now, suppose that in the course of constructing the proof tree one generates a subgoal f that is on the VISITED list. Since the traversal of the tree is depth-first left-to-right, the node associated with f is either an ancestor of the current node, or is a descendant of some left sibling of an ancestor of the current node. In the former case, the proof tree contains a loop and cannot produce a successful proof; in this case the theorem prover should exit with failure. In the latter case, a proof must already exist for f (had it failed, the entire proof would already have failed, since every subgoal in the tree must succeed), and hence nodes below the current node in the tree need not be visited; instead the theorem prover can simply assume that f is true.
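A minimal sketch of this interpreter is below. It assumes, hypothetically, that goals are hashable values, that DB is a set of ground facts, and that expand(goal) returns the k recursive subgoals generated by the single closed recursive clause when its nonrecursive body succeeds on goal, or None when the body fails. Because every subgoal must succeed, a failure propagates directly to the root (all() short-circuits), which is why a revisited goal that is not an ancestor can be assumed proved. The 2-ary fib-style clause in the usage example is illustrative only.

```python
from typing import Any, Callable, FrozenSet, List, Optional, Set, Tuple

Goal = Any   # any hashable goal representation


def prove(goal: Goal, db: Set[Goal],
          expand: Callable[[Goal], Optional[List[Goal]]]) -> Tuple[bool, int]:
    """Depth-first left-to-right proof using ancestors and a VISITED set.
    Returns (success, number of expanded nodes)."""
    visited: Set[Goal] = set()
    expanded = 0

    def walk(g: Goal, ancestors: FrozenSet[Goal]) -> bool:
        nonlocal expanded
        if g in db:
            return True                 # looked up directly in DB
        if g in ancestors:
            return False                # the proof tree contains a loop: fail
        if g in visited:
            return True                 # explored earlier off this path, so proved
        visited.add(g)
        expanded += 1
        subgoals = expand(g)
        if subgoals is None:
            return False                # nonrecursive body fails on g
        return all(walk(s, ancestors | {g}) for s in subgoals)

    return walk(goal, frozenset()), expanded


def fib_expand(goal):
    # Illustrative 2-ary clause: fib(N) <- pred(N,M), pred(M,K), fib(M), fib(K),
    # with fib(0) and fib(1) supplied as facts in DB.
    _, n = goal
    return [("fib", n - 1), ("fib", n - 2)] if n >= 2 else None


print(prove(("fib", 25), {("fib", 0), ("fib", 1)}, fib_expand))
```

Only 24 nodes are expanded for fib(25), one per distinct subgoal from fib(25) down to fib(2); a plain depth-bounded top-down interpreter would expand a number of nodes that grows exponentially in n.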
This top-down interpreter can be easily extended into a forced simulation procedure: one simply traverses the tree in the same order, generalizing the current hypothesis H as needed to justify each inference step in the tree.The only additional point to note is that if one is performing forced simulation and revisits a previously proved subgoal f at a node n, the current clause H need not be further generalized in order to prove f, and hence it is again permissible to simply skip the portion of the tree below n.We thus have the following result.
Theorem 7 Let d-Depth-k-Rec be the set of k-ary closed recursive clauses of depth d.
For any constants a, d, and k, the language family $d\text{-Depth-}k\text{-Rec}[DB, a\text{-DetDEC}]$ is uniformly identifiable from equivalence queries.
Proof: Omitted, but following the informal argument made above.
Note that we give this result without the restrictions that the database contains an equality relation and that the declaration is unique-mode, since the tricks used to relax these restrictions in Proposition 6 are still applicable.

Learning Recursive and Base Cases Simultaneously
So far, we have analyzed the problem of learning single clauses: first a single nonrecursive clause, and then a single recursive clause. However, every useful recursive program contains at least two clauses: a recursive clause and a nonrecursive base case. It is natural to ask whether it is possible to learn a complete recursive program by simultaneously learning both a recursive clause and its associated nonrecursive base case.
In general, this is not possible, as is demonstrated elsewhere (Cohen, 1995). However, there are several cases in which the positive result can be extended to two-clause programs. In this section, we will first discuss learning a recursive clause and base clause simultaneously, assuming that any determinate base clause is possible, but also assuming that an additional "hint" is available, in the form of a special "basecase" oracle. We will then discuss various alternative types of "hints".
Let P be a target program with base clause $C_B$ and recursive clause $C_R$. A basecase oracle for P takes as input an extended instance (f, D) and returns "yes" if $C_B \wedge DB \wedge D \vdash f$, and "no" otherwise. In other words, the oracle determines whether f is covered by the nonrecursive base clause alone. As an example, for the append program, the basecase oracle should return "yes" for an instance append(Xs,Ys,Zs) when Xs is the empty list, and "no" otherwise.
Given the existence of a basecase oracle, the learning algorithm can be extended as [...] is uniformly identifiable from equivalence and basecase queries.
A companion paper (Cohen, 1995) shows that something like the basecase oracle is necessary: in particular, without any "hints" about the base clause, learning a two-clause linear recursive program is as hard as learning boolean DNF. However, there are several situations in which the basecase oracle can be dispensed with.
Case 1. The basecase oracle can be replaced by a polynomial-sized set of possible base clauses. The learning algorithm in this case is to enumerate pairs of base clauses $C_{B_i}$ and "starting clauses" $\mathrm{BOTTOM}_d \wedge L_{r_j}$, generalize the starting clause with forced simulation, and mark a pair as incorrect if overgeneralization is detected.
Case 2. The basecase oracle can be replaced by a fixed rule that determines when the base clause is applicable. For example, consider the rule that says that the base clause is applicable to any atom $p(X_1, \ldots, X_a)$ such that no $X_i$ is a non-null list. Adopting such a rule leads immediately to a learning procedure that pac-learns exactly those two-clause linear recursive programs for which the rule is correct.
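The fixed rule of Case 2 is simple enough to state as a predicate over atoms. In this sketch lists are represented as Python tuples (an assumption of the example), and base_rule is a hypothetical name.

```python
from typing import Tuple

Atom = Tuple[str, Tuple[object, ...]]   # lists are represented as Python tuples


def base_rule(atom: Atom) -> bool:
    """Fixed basecase rule: applicable iff no argument X_i is a non-null list."""
    _, args = atom
    return not any(isinstance(x, tuple) and len(x) > 0 for x in args)


print(base_rule(("p", ((), "a"))), base_rule(("p", ((1, 2), "a"))))
```

This particular rule is not correct for append: the base-case atom append([],[3],[3]) has non-null arguments, so the rule rejects it, and a procedure built on this rule would not learn append.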
Case 3. The basecase oracle can also be replaced by a polynomial-sized set of rules for determining when a base clause is applicable. The learning algorithm in this case is to pick an unmarked decision rule and run Force2 using that rule as a basecase oracle. If Force2 returns "no consistent hypothesis", then the decision rule is marked incorrect, and a new one is chosen. This algorithm will learn those two-clause linear recursive programs for which any of the given decision rules is correct. Even though the general problem of determining a basecase decision rule for an arbitrary Datalog program may be difficult, it may be that a small number of decision procedures apply to a large number of common Prolog programs. For example, the recursion for most list-manipulation programs halts when some argument is reduced to a null list or to a singleton list. Thus Case 3 seems likely to cover a large fraction of the programs of practical interest for automatic logic programming.
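Case 3 can be sketched as a loop over candidate rules. The rule family below, "argument i is the null list" and "argument i is a singleton list" for each argument position, follows the remark about list-manipulation programs and has 2a members for arity a. Force2 itself is not reproduced here, so toy_force2 is a hypothetical stand-in that accepts a rule only if it agrees with two labelled append instances; lists are represented as Python tuples.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Atom = Tuple[str, Tuple[object, ...]]   # lists are represented as Python tuples
Rule = Callable[[Atom], bool]
P = TypeVar("P")


def candidate_rules(arity: int) -> List[Rule]:
    """2 * arity rules: 'argument i is the null list' / 'argument i is a singleton list'."""
    rules: List[Rule] = []
    for i in range(arity):
        rules.append(lambda atom, i=i: atom[1][i] == ())
        rules.append(lambda atom, i=i: isinstance(atom[1][i], tuple) and len(atom[1][i]) == 1)
    return rules


def learn_with_rules(rules: List[Rule],
                     force2: Callable[[Rule], Optional[P]]) -> Optional[P]:
    """Use each unmarked rule as the basecase oracle until Force2 succeeds."""
    for rule in rules:
        program = force2(rule)
        if program is not None:
            return program
        # Force2 answered "no consistent hypothesis": this rule is marked incorrect.
    return None


# Stand-in for Force2: accept a rule only if it agrees with two labelled instances.
examples = [(("append", ((), (3,), (3,))), True),
            (("append", ((2,), (3,), (2, 3))), False)]


def toy_force2(rule: Rule) -> Optional[str]:
    return "append program" if all(rule(a) == base for a, base in examples) else None


print(learn_with_rules(candidate_rules(3), toy_force2))
```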
We also note that heuristics have been proposed for finding such basecase decision rules automatically using typing restrictions (Stahl, Tausend, & Wirth, 1993).

Combining the Results
Finally, we note that all of the extensions described above are compatible.This means that if we let kd-MaxRecLang be the language of two-clause programs consisting of one clause C R that is k-ary closed recursive and depth-d determinate, and one clause C B that is nonrecursive and depth-d determinate, then the following holds.
Proposition 9. For any constants a, k and d, the language family $kd\text{-MaxRecLang}[DB, a\text{-DetDEC}]$ is uniformly identifiable from equivalence and basecase queries.

Further Extensions
The notation kd-MaxRecLang may seem at this point to be unjustified; although it is the most expressive language of recursive clauses that we have proven to be learnable, there are numerous extensions that may be efficiently learnable. For example, one might generalize the language to allow an arbitrary number of recursive clauses, or to include clauses that are not determinate. These generalizations might very well be pac-learnable, given the results that we have presented so far. However, a companion paper (Cohen, 1995) presents a series of negative results showing that most natural generalizations of kd-MaxRecLang are not efficiently learnable, and further that kd-MaxRecLang itself is not efficiently learnable without the basecase oracle. Specifically, the companion paper shows that eliminating the basecase oracle leads to a problem that is as hard as learning boolean DNF, an open problem in computational learning theory. Similarly, learning two linear recursive clauses simultaneously is as hard as learning DNF, even if the base case is known. Finally, the following learning problems are all as hard as breaking certain (presumably) secure cryptographic codes: learning n linear recursive determinate clauses, learning one n-ary recursive determinate clause, or learning one linear recursive "k-local" clause. All of these negative results hold not only for the model of identification from equivalence queries, but also for the weaker models of pac-learnability and pac-predictability.

Related Work
In discussing related work we will concentrate on previous formal analyses that employ a learning model similar to that considered here: namely, models that (a) require all computation to be polynomial in natural parameters of the problem, and (b) assume either a neutral or an adversarial source of examples, such as equivalence queries or stochastically presented examples. We note, however, that much previous formal work relies on different assumptions. For instance, there has been much work in which membership or subset queries are allowed (Shapiro, 1982; De Raedt & Bruynooghe, 1992), or where examples are chosen in some non-random manner that is helpful to the learner (Ling, 1992; De Raedt & Džeroski, 1994). There has also been some work in which the efficiency requirements imposed by the pac-learnability model are relaxed (Nienhuys-Cheng & Polman, 1994). If the requirement of efficiency is relaxed far enough, very general positive results can be obtained using very simple learning algorithms. For example, in the model of learnability in the limit (Gold, 1967), any language that is both recursively enumerable and decidable (which includes all of Datalog) can be learned by a simple enumeration procedure; in the model of U-learnability (Muggleton & Page, 1994), any language that is polynomially enumerable and polynomially decidable can be learned by enumeration. The most similar previous work is that of Frazier and Page (1993a, 1993b). They analyze the learnability from equivalence queries of recursive programs with function symbols but without background knowledge. The positive results they provide are for program classes that satisfy the following property: given a set of positive examples S+ that requires all clauses in the target program to prove the instances in S+, only a polynomial number of recursive clauses are possible; further, the base clause must have a certain highly constrained form. Thus the concept class is "almost" bounded in size by a polynomial. The learning algorithm for such a program class is to interleave a series of equivalence queries that test every possible target program. In contrast, our positive results are for exponentially large classes of recursive clauses. Frazier and Page also present a series of negative results suggesting that the learnable languages that they analyzed are difficult to generalize without sacrificing efficient learnability.
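The enumeration strategy attributed above to Frazier and Page (interleaving equivalence queries that test every possible target program) can be sketched generically. In this minimal Python sketch the names `learn_by_enumeration`, `threshold`, and `toy_oracle` are hypothetical, and a toy class of threshold concepts stands in for a polynomially bounded class of programs. The number of queries grows with the size of the candidate class, which is why the approach needs the class to be (almost) polynomially bounded.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical oracle type: returns None if the hypothesis is exactly right,
# otherwise a counterexample (instance, correct_label).
EquivalenceOracle = Callable[[Any], Optional[Tuple[Any, bool]]]


def learn_by_enumeration(
    candidates: Sequence[Any],
    classify: Callable[[Any, Any], bool],
    oracle: EquivalenceOracle,
) -> Tuple[Any, int]:
    """Query candidate hypotheses in order until the oracle accepts one.

    Candidates that contradict an earlier counterexample are skipped, so each
    equivalence query either succeeds or rules out the queried hypothesis.
    The number of queries is at most len(candidates).
    """
    counterexamples: List[Tuple[Any, bool]] = []
    queries = 0
    for h in candidates:
        if any(classify(h, x) != y for x, y in counterexamples):
            continue  # already ruled out by a previous counterexample
        queries += 1
        answer = oracle(h)
        if answer is None:
            return h, queries
        counterexamples.append(answer)
    raise ValueError("target concept is not among the candidates")


# Toy class: threshold concepts "x >= t" over instances 0..10, target t = 7.
def threshold(t: int, x: int) -> bool:
    return x >= t


def toy_oracle(t: int) -> Optional[Tuple[int, bool]]:
    for x in range(11):
        if threshold(t, x) != threshold(7, x):
            return x, threshold(7, x)
    return None


hypothesis, n_queries = learn_by_enumeration(range(11), threshold, toy_oracle)
print(hypothesis, n_queries)
```

Here the learner needs one query per candidate up to the target; with an exponentially large class, the same loop would need exponentially many queries.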
Previous results also exist on the pac-learnability of nonrecursive constant-depth determinate programs, and on the pac-learnability of recursive constant-depth determinate programs in a model that also allows membership and subset queries (Džeroski et al., 1992).
The basis for the intelligent search used in our learning algorithms is the technique of forced simulation. This method finds the least implicant of a clause C that covers an extended instance e. Although we believed this method to be original when we developed it, we subsequently discovered that an identical technique had previously been proposed by Ling (1991). Since an extended instance e can be converted (via saturation) to a ground Horn clause, there is also a close connection between forced simulation and recent work on "inverting implication" and "recursive anti-unification"; for instance, Muggleton (1994) describes a nondeterministic procedure for finding all clauses that imply a clause C, and Idestam-Almquist (1993) describes a means of constraining such an implicant-generating procedure to produce the least common implicant of two clauses. However, while both of these techniques have obvious applications in learning, both are extremely expensive in the worst case.
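To make the idea of forced simulation concrete, here is a minimal Python sketch for the simplest case: a nonrecursive clause whose body literals are all determinate. The fact is forced through the clause, and any literal that fails (or loses an input because the literal producing it was deleted) is removed. The literal encoding, the treatment of `equal` as a built-in test, the toy `father` data, and the function name `force_sim_nonrec` are all assumptions for illustration; this is not the paper's ForceSim_NR pseudocode.

```python
from typing import Dict, List, Sequence, Tuple

# A literal is (predicate, input_variables, output_variables).
Literal = Tuple[str, Tuple[str, ...], Tuple[str, ...]]
# Determinate background facts: for each predicate, each input tuple has at
# most one output tuple.
Database = Dict[str, Dict[Tuple[str, ...], Tuple[str, ...]]]


def force_sim_nonrec(
    head_vars: Sequence[str],
    body: Sequence[Literal],
    fact_args: Sequence[str],
    db: Database,
) -> List[Literal]:
    """Delete every body literal that fails when the clause is forced to cover a fact.

    Because every literal is determinate, evaluating the body left to right
    yields a unique binding; the surviving literals cover the fact by
    construction.
    """
    binding: Dict[str, str] = dict(zip(head_vars, fact_args))
    kept: List[Literal] = []
    for pred, ins, outs in body:
        if not all(v in binding for v in ins):
            continue  # an input was produced by a literal already deleted
        values = tuple(binding[v] for v in ins)
        if pred == "equal":  # built-in equality test, no outputs
            if values[0] == values[1]:
                kept.append((pred, ins, outs))
            continue
        result = db.get(pred, {}).get(values)
        if result is None:
            continue  # literal fails on this fact: drop it
        if any(v in binding and binding[v] != val for v, val in zip(outs, result)):
            continue  # would contradict an existing binding
        binding.update(zip(outs, result))
        kept.append((pred, ins, outs))
    return kept


db: Database = {"father": {("ann",): ("bob",), ("cal",): ("bob",), ("dee",): ("eve",)}}
body = [
    ("father", ("X1",), ("U1",)),
    ("father", ("X2",), ("U2",)),
    ("equal", ("U1", "U2"), ()),
    ("equal", ("X1", "X2"), ()),
]
print(force_sim_nonrec(("X1", "X2"), body, ("ann", "cal"), db))
```

Forcing the fact p(ann, cal) keeps the two `father` literals and equal(U1, U2) (both fathers are bob) but drops equal(X1, X2).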
The CRUSTACEAN system (Aha et al., 1994) uses inverting implication in constrained settings to learn certain restricted classes of recursive programs. The class of programs efficiently learned by this system is not formally well understood, but it appears to be similar to the classes analyzed by Frazier and Page. Experimental results show that this system performs well on inferring recursive programs that use function symbols in certain restricted ways. It cannot, however, make use of background knowledge.
Finally, we wish to direct the reader to several pieces of our own research that are relevant. As noted above, a companion paper presents negative learnability results for several natural generalizations of the language kd-MaxRecLang (Cohen, 1995). Another related paper investigates the learnability of nonrecursive Prolog programs (Cohen, 1993b); this paper also contains a number of negative results which strongly motivate the restriction of constant-depth determinacy. A final prior paper which may be of interest presents some experimental results with a Prolog implementation of a variant of the Force2 algorithm (Cohen, 1993a). This paper shows that forced simulation can be the basis of a learning program that outperforms state-of-the-art heuristic methods such as FOIL (Quinlan, 1990; Quinlan & Cameron-Jones, 1993) in learning from randomly chosen examples.

Conclusions
Just as it is often desirable to have guarantees of correctness for a program, in many plausible contexts it would be highly desirable to have an automatic programming system offer some formal guarantees of correctness. The topic of this paper is the learnability of recursive logic programs using formally well-justified algorithms. More specifically, we have been concerned with the development of algorithms that are provably sound and efficient in learning recursive logic programs from equivalence queries. We showed that one constant-depth determinate closed k-ary recursive clause is identifiable from equivalence queries; this immediately implies that this language is also learnable in Valiant's (1984) model of pac-learnability. We also showed that a program consisting of one such recursive clause and one constant-depth determinate nonrecursive clause is identifiable from equivalence queries given an additional "basecase oracle", which determines whether a positive example is covered by the nonrecursive base clause of the target program alone.
In obtaining these results, we have introduced several new formal techniques for analyzing the learnability of recursive programs. We have also shown the soundness and efficiency of several instances of generalization by forced simulation. This method may have applications in practical learning systems. The Force2 algorithm compares quite well experimentally with modern ILP systems on learning problems from the restricted class that it can identify (Cohen, 1993a); thus sound learning methods like Force2 might be useful as a filter before a more general ILP system like FOIL (Quinlan, 1990; Quinlan & Cameron-Jones, 1993). Alternatively, forced simulation could be used in heuristic programs. For …

… and without loss of generality let us assume that no pair of literals L_i and L_j in the body of C have the same mode, predicate symbol, and sequence of input variables.⁶ Given C, let us now define the substitution θ_C as follows:
1. Initially set θ_C ← {X_1 = Y_1, ..., X_{a'} = Y_{a'}}, where X_1, ..., X_{a'} are the arguments to the head of BOTTOM_d and Y_1, ..., Y_{a'} are the arguments to the head of C.
Notice that because the variables in the head of BOTTOM_d are distinct, this mapping is well-defined.
2. Next, examine each of the literals in the body of C in left-to-right order. For each such literal M, let T_1, ..., T_k be its input variables. For each literal L in the body of BOTTOM_d with the same mode and predicate symbol whose input variables S_1, ..., S_k satisfy S_i θ_C = T_i for all 1 ≤ i ≤ k, modify θ_C as follows:
θ_C ← θ_C ∪ {U_1 = V_1, ..., U_l = V_l}
where U_1, ..., U_l are the output variables of L and V_1, ..., V_l are the output variables of M. Notice that because we assume that C contains only one literal with a given predicate symbol and sequence of input variables, and because the output variables of the literals in BOTTOM_d are distinct, this mapping is again well-defined. It is also easy to verify (by induction on the length of C) that in executing this procedure some variable in BOTTOM_d is always mapped to each input variable T_i, and that at least one L meeting the requirements above exists. Thus the mapping θ_C is onto the variables appearing in C.⁷
Let A be the head of BOTTOM_d, and consider the clause C' defined as follows. The head of C' is A. The body of C' contains all literals L from the body of BOTTOM_d such that either
- Lθ_C is in the body of C, or
- L is the literal equal(X_i, X_j) and X_i θ_C = X_j θ_C.
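The two-step construction of θ_C and the definition of C' can be traced mechanically on a small case. In this hypothetical Python sketch, literals are encoded as (predicate, inputs, outputs) triples, the mode is approximated by the predicate symbol plus the number of inputs, `equal` is the equality literal, and a toy BOTTOM_d is written out by hand rather than generated. The example clause is C = p(A,B) ← father(A,F), father(B,F).

```python
from typing import Dict, List, Sequence, Tuple

# A literal is (predicate, input_variables, output_variables); we assume the
# predicate symbol together with the number of inputs fixes the mode.
Literal = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def build_theta(
    bottom_head: Sequence[str],
    bottom_body: Sequence[Literal],
    c_head: Sequence[str],
    c_body: Sequence[Literal],
) -> Dict[str, str]:
    """Construct theta_C, mapping variables of BOTTOM_d to variables of C."""
    theta = dict(zip(bottom_head, c_head))  # step 1: X_i = Y_i
    for pred, c_ins, c_outs in c_body:  # step 2: literals M of C, left to right
        for b_pred, b_ins, b_outs in bottom_body:  # candidate literals L
            same_mode = b_pred == pred and len(b_ins) == len(c_ins)
            if same_mode and all(theta.get(s) == t for s, t in zip(b_ins, c_ins)):
                theta.update(zip(b_outs, c_outs))  # U_j = V_j
    return theta


def build_c_prime(
    bottom_head: Sequence[str],
    bottom_body: Sequence[Literal],
    c_body: Sequence[Literal],
    theta: Dict[str, str],
) -> Tuple[Tuple[str, ...], List[Literal]]:
    """Keep the literals L of BOTTOM_d with L.theta in C, plus the equal/2
    literals whose two arguments theta maps to the same variable."""
    c_literals = set(c_body)
    body: List[Literal] = []
    for pred, ins, outs in bottom_body:
        if pred == "equal":
            xi, xj = ins
            if xi in theta and xj in theta and theta[xi] == theta[xj]:
                body.append((pred, ins, outs))
        elif all(v in theta for v in ins + outs):
            image = (pred, tuple(theta[v] for v in ins), tuple(theta[v] for v in outs))
            if image in c_literals:
                body.append((pred, ins, outs))
    return tuple(bottom_head), body


bottom_head = ("X1", "X2")
bottom_body = [
    ("father", ("X1",), ("U1",)),
    ("father", ("X2",), ("U2",)),
    ("equal", ("X1", "X2"), ()),
    ("equal", ("U1", "U2"), ()),
]
c_head = ("A", "B")
c_body = [("father", ("A",), ("F",)), ("father", ("B",), ("F",))]
theta = build_theta(bottom_head, bottom_body, c_head, c_body)
print(theta)
print(build_c_prime(bottom_head, bottom_body, c_body, theta))
```

In this example θ_C is onto but not one-to-one (U1 and U2 both map to F), and that collision is exactly what the literal equal(U1, U2) in C' records.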
We claim that C' is a subclause of BOTTOM_d that is equivalent to C. Certainly C' is a subclause of BOTTOM_d. One way to see that it is equivalent to C is to consider the clause Ĉ and the substitution θ̂_C, which are generated as follows. Initially, let Ĉ = C' and let θ̂_C = θ_C. Then, for every literal L = equal(X_i, X_j) in the body of Ĉ, delete L from Ĉ, and then replace Ĉ with Ĉσ_ij and replace θ̂_C with (θ̂_C)_ij, where σ_ij is the substitution {X_i = X_ij, X_j = X_ij} and X_ij is some new variable not previously appearing in Ĉ. (Note: by (θ̂_C)_ij we refer to the substitution formed by replacing every occurrence of X_i or X_j appearing in θ̂_C with X_ij.) Ĉ is semantically equivalent to C' because the operation described above is equivalent to simply resolving each possible L in the body of C' against the clause "equal(X,X) ←".

6. This assumption can be made without loss of generality since, for a determinate clause C, the output variables of L_i and L_j will necessarily be bound to the same values; hence L_i and L_j could be unified together and one of them deleted without changing the semantics of C.
7. Recall that a function f : X → Y is onto its range Y if ∀y ∈ Y ∃x ∈ X : f(x) = y.
The following are now straightforward to verify.
- θ̂_C is a one-to-one mapping. To see that this is true, notice that for every pair of assignments X_i = Y and X_j = Y in θ_C there must be a literal equal(X_i, X_j) in C'. Hence, following the process described above, the assignments X_i = Y and X_j = Y in θ̂_C would eventually be replaced with the single assignment X_ij = Y.
- θ̂_C is onto the variables in C. Notice that θ_C was onto the variables in C, and for every assignment X_i = Y in θ_C there is some assignment in θ̂_C with a right-hand side of Y (this assignment is either of the form X_i = Y or X_ij = Y). Thus θ̂_C is also onto the variables in C.
- A literal L is in the body of Ĉ iff Lθ̂_C is in the body of C. This follows from the definition of C' and from the fact that for every literal L from C' that is not of the form equal(X_i, X_j) there is a corresponding literal in Ĉ.
Thus Ĉ is an alphabetic variant of C, and hence is equivalent to C. Since Ĉ is also equivalent to C', it must be that C' is equivalent to C, which proves our claim.
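The collapsing step that turns C' into Ĉ, together with the properties that make Ĉ an alphabetic variant of C, can be checked on a small case. In this hypothetical Python sketch, literals are (predicate, inputs, outputs) triples, fresh variables are named M1, M2, ... (assumed not to occur in the clause), and the input is C' = p(X1,X2) ← father(X1,U1), father(X2,U2), equal(U1,U2) with θ_C = {X1 = A, X2 = B, U1 = F, U2 = F}, the result of the construction for C = p(A,B) ← father(A,F), father(B,F).

```python
from typing import Dict, List, Sequence, Tuple

Literal = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def collapse_equalities(
    head: Sequence[str], body: Sequence[Literal], theta: Dict[str, str]
) -> Tuple[Tuple[str, ...], List[Literal], Dict[str, str]]:
    """Build C-hat and theta-hat from C' and theta_C by merging equal/2 literals."""
    cur_head, cur_body, cur_theta = list(head), list(body), dict(theta)
    fresh_count = 0
    while True:
        eq = next((lit for lit in cur_body if lit[0] == "equal"), None)
        if eq is None:
            return tuple(cur_head), cur_body, cur_theta
        cur_body.remove(eq)  # delete equal(X_i, X_j) ...
        xi, xj = eq[1]
        fresh_count += 1
        fresh = f"M{fresh_count}"  # assumed not to clash with existing names
        sigma = {xi: fresh, xj: fresh}  # ... and apply sigma_ij
        cur_head = [sigma.get(v, v) for v in cur_head]
        cur_body = [
            (p, tuple(sigma.get(v, v) for v in ins), tuple(sigma.get(v, v) for v in outs))
            for p, ins, outs in cur_body
        ]
        # (theta-hat)_ij: rename left-hand sides; in C', X_i and X_j share an image
        cur_theta = {sigma.get(v, v): y for v, y in cur_theta.items()}


def is_variant_under(
    theta_hat: Dict[str, str],
    hat_head: Sequence[str],
    hat_body: Sequence[Literal],
    c_head: Sequence[str],
    c_body: Sequence[Literal],
) -> bool:
    """True if theta_hat, restricted to C-hat's variables, is one-to-one and
    maps C-hat exactly onto C (so C-hat is an alphabetic variant of C)."""
    used = set(hat_head) | {v for _, ins, outs in hat_body for v in ins + outs}
    if not used <= set(theta_hat):
        return False
    images = [theta_hat[v] for v in used]
    if len(set(images)) != len(images):
        return False  # not one-to-one
    head_img = tuple(theta_hat[v] for v in hat_head)
    body_img = {
        (p, tuple(theta_hat[v] for v in ins), tuple(theta_hat[v] for v in outs))
        for p, ins, outs in hat_body
    }
    return head_img == tuple(c_head) and body_img == set(c_body)


c_prime_head = ("X1", "X2")
c_prime_body = [
    ("father", ("X1",), ("U1",)),
    ("father", ("X2",), ("U2",)),
    ("equal", ("U1", "U2"), ()),
]
theta_c = {"X1": "A", "X2": "B", "U1": "F", "U2": "F"}
c_head = ("A", "B")
c_body = [("father", ("A",), ("F",)), ("father", ("B",), ("F",))]

hat_head, hat_body, theta_hat = collapse_equalities(c_prime_head, c_prime_body, theta_c)
print(hat_body)
print(is_variant_under(theta_hat, hat_head, hat_body, c_head, c_body))
```

Before collapsing, θ_C sends both U1 and U2 to F, so it is not one-to-one; after the merge, θ̂_C is a bijective renaming of Ĉ onto C.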

Figure 4: Forced simulation for linear closed recursive clauses

Figure 5: A learning algorithm for nonrecursive depth-d determinate clauses

Figure 6: A learning algorithm for two-clause recursive programs
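The algorithm of Figure 6 relies on forcibly simulating a two-clause hypothesis, using the BASECASE oracle to decide which clause to generalize; the procedure is described in the text accompanying Theorem 8. The Python sketch below is a deliberately simplified abstraction of that control flow, not the Force2 pseudocode of Figures 6 and 7. A clause body is modeled as a set of named Boolean tests on a fact, generalization simply deletes the tests a fact fails (standing in for ForceSim_NR), a `subgoal` function stands in for the recursive literal L^r, and `depth_bound` is a hypothetical termination condition. All of these names are assumptions for illustration.

```python
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Fact = Tuple[int, ...]
Tests = Dict[str, Callable[[Fact], bool]]
Program = Tuple[FrozenSet[str], FrozenSet[str]]  # (recursive body, base body)


def generalize(body: FrozenSet[str], tests: Tests, fact: Fact) -> FrozenSet[str]:
    """Stand-in for nonrecursive forced simulation: drop the tests that fail."""
    return frozenset(name for name in body if tests[name](fact))


def force_sim_two_clause(
    program: Program,
    tests: Tests,
    subgoal: Callable[[Fact], Fact],
    basecase: Callable[[Fact], bool],
    fact: Fact,
    depth_bound: int = 50,
) -> Optional[Program]:
    """Forcibly simulate a linear recursive clause plus a base clause on `fact`.

    With linear recursion the recursive call is a tail call, so it is written
    as a loop. Returns None if `depth_bound` is exceeded before the base case
    is reached.
    """
    rec_body, base_body = program
    for _ in range(depth_bound):
        if basecase(fact):  # BASECASE oracle says: generalize the base clause
            return rec_body, generalize(base_body, tests, fact)
        rec_body = generalize(rec_body, tests, fact)  # generalize recursive clause
        fact = subgoal(fact)  # continue on the recursive subgoal
    return None


tests: Tests = {
    "even": lambda f: f[0] % 2 == 0,
    "pos": lambda f: f[0] > 0,
    "zero": lambda f: f[0] == 0,
    "small": lambda f: f[0] < 3,
}
start: Program = (frozenset(tests), frozenset(tests))
result = force_sim_two_clause(start, tests, lambda f: (f[0] - 2,), lambda f: f[0] == 0, (4,))
print(result)
```

Starting from the most specific program, the fact (4,) generalizes the recursive clause on (4,) and (2,) and the base clause on (0,); a fact such as (3,) never reaches the base case, so the sketch gives up at the depth bound.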

Theorem 8
Let d-Depth-2-Clause be the set of two-clause programs consisting of one clause in d-DepthLinRec and one clause in d-DepthNonRec. For any constants a and d, the language family d-Depth-2-Clause[DB, a-DetDEC] is uniformly identifiable from equivalence and basecase queries.

… follows. As before, all possible recursive literals L^r_i of the clause BOTTOM_d are generated; however, in this case the learner will test two-clause hypotheses that are initially of the form (BOTTOM_d ∧ L^r_i, BOTTOM_d). To forcibly simulate such a hypothesis on a fact f, the following procedure is used. After checking the usual termination conditions, the forced simulator checks whether BASECASE(f) is true. If so, it calls ForceSim_NR (with appropriate arguments) to generalize the current hypothesis for the base case. If BASECASE(f) is false, then the recursive clause H_r is forcibly simulated on f, a subgoal L^r is generated as before, and the generalized program is recursively forcibly simulated on the subgoal. Figures 6 and 7 present a learning algorithm Force2 for two-clause programs consisting of one linear recursive clause C_R and one nonrecursive clause C_B, under the assumption that both equivalence and basecase oracles are available. It is straightforward to extend the arguments of Theorem 5 to this case, leading to the following result.