JAIR Volume 15 Articles

Volume 15 MastHead

Each entry in this table of contents provides links to an individual article and its appendicies(if any)

J. Baxter, P. L. Bartlett "Infinite-Horizon Policy-Gradient Estimation", Volume 15 pages 319-350

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes POMDPs controlled by parameterized stochastic policies. A similar algorithm was proposed by (Kimura et al. 1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free beta (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.

Click here to return to Volume 15 contents list
J. Baxter, P. L. Bartlett, L. Weaver "Experiments with Infinite-Horizon, Policy-Gradient Estimation", Volume 15 pages 351-381

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.

Click here to return to Volume 15 contents list
T. Elomaa, M. Kaariainen "An Analysis of Reduced Error Pruning", Volume 15 pages 163-187

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning.

In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration to be replaced by a leaf node, one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the pruning probability of a node fitting pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly to the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples.

This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight to its algorithmic properties, analyses the algorithm with less imposed assumptions than before, and includes the previously overlooked empty subtrees to the analysis.

Click here to return to Volume 15 contents list
J. Hong "Goal Recognition through Goal Graph Analysis", Volume 15 pages 1-30

We present a novel approach to goal recognition based on a two-stage paradigm of graph construction and analysis. First, a graph structure called a Goal Graph is constructed to represent the observed actions, the state of the world, and the achieved goals as well as various connections between these nodes at consecutive time steps. Then, the Goal Graph is analysed at each time step to recognise those partially or fully achieved goals that are consistent with the actions observed so far. The Goal Graph analysis also reveals valid plans for the recognised goals or part of these goals.

Our approach to goal recognition does not need a plan library. It does not suffer from the problems in the acquisition and hand-coding of large plan libraries, neither does it have the problems in searching the plan space of exponential size. We describe two algorithms for Goal Graph construction and analysis in this paradigm. These algorithms are both provably sound, polynomial-time, and polynomial-space. The number of goals recognised by our algorithms is usually very small after a sequence of observed actions has been processed. Thus the sequence of observed actions is well explained by the recognised goals with little ambiguity. We have evaluated these algorithms in the UNIX domain, in which excellent performance has been achieved in terms of accuracy, efficiency, and scalability.

Click here to return to Volume 15 contents list
M. Palomar, P. Martinez-Barco "Computational Approach to Anaphora Resolution in Spanish Dialogues", Volume 15 pages 263-287

This paper presents an algorithm for identifying noun-phrase antecedents of pronouns and adjectival anaphors in Spanish dialogues. We believe that anaphora resolution requires numerous sources of information in order to find the correct antecedent of the anaphor. These sources can be of different kinds, e.g., linguistic information, discourse/dialogue structure information, or topic information. For this reason, our algorithm uses various different kinds of information (hybrid information). The algorithm is based on linguistic constraints and preferences and uses an anaphoric accessibility space within which the algorithm finds the noun phrase. We present some experiments related to this algorithm and this space using a corpus of 204 dialogues. The algorithm is implemented in Prolog. According to this study, 95.9% of antecedents were located in the proposed space, a precision of 81.3% was obtained for pronominal anaphora resolution, and 81.5% for adjectival anaphora.

Click here to return to Volume 15 contents list
P. Stone, M. L. Littman, S. Singh, M. Kearns "ATTac-2000: An Adaptive Autonomous Bidding Agent", Volume 15 pages 189-206

The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000's adaptive strategy.

Click here to return to Volume 15 contents list
J. Renz, B. Nebel "Efficient Methods for Qualitative Spatial Reasoning", Volume 15 pages 289-318

The theoretical properties of qualitative spatial reasoning in the RCC8 framework have been analyzed extensively. However, no empirical investigation has been made yet. Our experiments show that the adaption of the algorithms used for qualitative temporal reasoning can solve large RCC8 instances, even if they are in the phase transition region -- provided that one uses the maximal tractable subsets of RCC8 that have been identified by us. In particular, we demonstrate that the orthogonal combination of heuristic methods is successful in solving almost all apparently hard instances in the phase transition region up to a certain size in reasonable time.

Click here to return to Volume 15 contents list
I. Refanidis, I. Vlahavas "The GRT Planning System: Backward Heuristic Construction in Forward State-Space Planning", Volume 15 pages 115-161

This paper presents GRT, a domain-independent heuristic planning system for STRIPS worlds. GRT solves problems in two phases. In the pre-processing phase, it estimates the distance between each fact and the goals of the problem, in a backward direction. Then, in the search phase, these estimates are used in order to further estimate the distance between each intermediate state and the goals, guiding so the search process in a forward direction and on a best-first basis. The paper presents the benefits from the adoption of opposite directions between the preprocessing and the search phases, discusses some difficulties that arise in the pre-processing phase and introduces techniques to cope with them. Moreover, it presents several methods of improving the efficiency of the heuristic, by enriching the representation and by reducing the size of the problem. Finally, a method of overcoming local optimal states, based on domain axioms, is proposed. According to it, difficult problems are decomposed into easier sub-problems that have to be solved sequentially. The performance results from various domains, including those of the recent planning competitions, show that GRT is among the fastest planners.

Click here to return to Volume 15 contents list
T. Sato, Y. Kameya "Parameter Learning of Logic Programs for Symbolic-Statistical Modeling", Volume 15 pages 391-454

We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks.

We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs.

The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.

Click here to return to Volume 15 contents list
C. Meek "Finding a Path is Harder than Finding a Tree", Volume 15 pages 383-389

I consider the problem of learning an optimal path graphical model from data and show the problem to be NP-hard for the maximum likelihood and minimum description length approaches and a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model.

Click here to return to Volume 15 contents list
J. M. Siskind "Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic", Volume 15 pages 31-90

This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile.

Click here to return to Volume 15 contents list
C. Bhattacharyya, S. S. Keerthi "Mean Field Methods for a Special Class of Belief Networks", Volume 15 pages 91-114

The chief aim of this paper is to propose mean-field approximations for a broad class of Belief networks, of which sigmoid and noisy-or networks can be seen as special cases. The approximations are based on a powerful mean-field theory suggested by Plefka. We show that Saul, Jaakkola and Jordan' s approach is the first order approximation in Plefka's approach, via a variational derivation. The application of Plefka's theory to belief networks is not computationally tractable. To tackle this problem we propose new approximations based on Taylor series. Small scale experiments show that the proposed schemes are attractive.

Click here to return to Volume 15 contents list
J. L. Ambite, C. A. Knoblock "Planning by Rewriting", Volume 15 pages 207-261

Domain-independent planning is a hard combinatorial problem. Taking into account plan quality makes the task even more difficult. This article introduces Planning by Rewriting (PbR), a new paradigm for efficient high-quality domain-independent planning. PbR exploits declarative plan-rewriting rules and efficient local search techniques to transform an easy-to-generate, but possibly suboptimal, initial plan into a high-quality plan. In addition to addressing the issues of planning efficiency and plan quality, this framework offers a new anytime planning algorithm. We have implemented this planner and applied it to several existing domains. The experimental results show that the PbR approach provides significant savings in planning effort while generating high-quality plans.

Click here to return to Volume 15 contents list