University of Pennsylvania

4/26/2001

3:30pm-4:30pm

Reinforcement Learning (RL) is a framework by which an agent autonomously learns to improve the rewards it receives from its environment, thus conceptually embodying one of the fundamental aims of AI. RL algorithms have been shown to be effective on a variety of discrete problem domains, where simulation allows millions of learning episodes to be executed. However, the application of RL to large continuous problem domains, where only a limited number of learning episodes can be afforded, has not met with as much success. I propose a set of methods in which the policies used to encode how an agent interacts with its environment are represented by a finite set of parameters. Policy Gradient (PG) RL is used to incrementally modify this parameterized policy along a gradient of improved reward. I will describe three new PG algorithms we have developed to make RL feasible in high-dimensional continuous state spaces: Boundary Localized Reinforcement Learning (BLRL), Action Transition Policy Gradient (ATPG), and Deterministic Policy Gradient (DPG). These algorithms directly encode prior domain knowledge, vastly improving convergence speed, and are all theoretically guaranteed to converge to locally optimal control policies. The computational feasibility of these algorithms is demonstrated experimentally on simulated and real problems taken from robotics. We show convergence to locally optimal policies within a few hundred episodes, unlike other PG algorithms that require orders of magnitude more episodes to converge.
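The core idea the abstract describes, adjusting a finite parameter vector along the gradient of expected reward, can be illustrated with a minimal REINFORCE-style sketch. This is a generic policy-gradient example on a hypothetical two-armed bandit, not the BLRL, ATPG, or DPG algorithms from the talk; the rewards, learning rate, and episode count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bandit: arm 0 pays 1.0, arm 1 pays 0.2.
rewards = np.array([1.0, 0.2])
theta = np.zeros(2)   # finite parameter vector defining the policy
alpha = 0.1           # learning rate (illustrative choice)

def softmax(x):
    """Map parameters to action probabilities."""
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)       # sample an action from the policy
    r = rewards[a]                   # observe the reward
    # Gradient of log pi(a; theta) for a softmax policy:
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # Ascend the gradient of expected reward.
    theta += alpha * r * grad_log_pi

final_probs = softmax(theta)
print(final_probs)  # probability mass shifts toward the higher-reward arm
```

The same incremental scheme scales to high-dimensional continuous problems by replacing the softmax table with any differentiable parameterized policy, which is where episode efficiency, the focus of the talk, becomes critical.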

*Hosted by Michael Mozer. Refreshments will be served immediately following the talk in ECOT 831.*

Department of Computer Science

University of Colorado Boulder

Boulder, CO 80309-0430 USA

webmaster@cs.colorado.edu
