
Colloquium - Singh

Learning to Act in Dynamic Environments Under Uncertainty and Incomplete Knowledge
Harlequin, Inc.

Making intelligent decisions and plans in dynamic environments, under uncertainty and with incomplete knowledge of their consequences, is a central problem in artificial intelligence. It is also a central problem in control theory, operations research, and statistics. An exciting confluence of ideas from these diverse disciplines, each with complementary strengths, is taking place in the field of reinforcement learning. I will motivate this confluence and present some of my contributions to it.

Reinforcement learning algorithms require little a priori knowledge about the environment, continually learn about the consequences of their decisions, and seek policies that optimize expected cumulative reward. The theoretical framework inherited from optimal control allows precise questions to be formulated. I will show that reinforcement learning algorithms are stochastic approximation methods, of the form studied by Dvoretzky, for performing dynamic programming (from operations research), and how they relate to Monte Carlo methods from statistics. These relationships lead to a general technique for proving that reinforcement learning algorithms converge to optimal policies, as well as for deriving new reinforcement learning algorithms.
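
As a rough illustration of the stochastic-approximation view (a minimal sketch, not the speaker's own formulation), the Python below runs tabular Q-learning next to value iteration on a small Markov decision process. The two-state process, its transition table P, its rewards R, the discount GAMMA, and the step-size schedule n^-0.7 are all invented for this example. Value iteration applies the dynamic programming backup using exact expectations; Q-learning replaces each expectation with one sampled transition and averages the noise out with a decaying step size.

```python
import random

# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
GAMMA = 0.5                       # discount factor (assumed)
N_STATES, N_ACTIONS = 2, 2
# P[s][a][t] = probability of moving from state s to state t under action a
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
# R[s][a] = expected immediate reward for taking action a in state s
R = [[0.0, -0.2],
     [1.0, 0.8]]


def value_iteration(tol=1e-10):
    """Dynamic programming: repeat the exact Bellman backup until it settles."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    while True:
        new_q = [[R[s][a] + GAMMA * sum(P[s][a][t] * max(q[t])
                                        for t in range(N_STATES))
                  for a in range(N_ACTIONS)]
                 for s in range(N_STATES)]
        delta = max(abs(new_q[s][a] - q[s][a])
                    for s in range(N_STATES) for a in range(N_ACTIONS))
        q = new_q
        if delta < tol:
            return q


def q_learning(sweeps=30000, seed=0):
    """Stochastic approximation of the same backup from sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for n in range(1, sweeps + 1):
        # Step sizes sum to infinity while their squares sum to a finite value.
        alpha = n ** -0.7
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                # One sampled next state stands in for the full expectation.
                t = rng.choices(range(N_STATES), weights=P[s][a])[0]
                target = R[s][a] + GAMMA * max(q[t])
                q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Best action in each state according to the table q."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a])
            for s in range(N_STATES)]


if __name__ == "__main__":
    q_dp = value_iteration()
    q_rl = q_learning()
    print("value iteration:", q_dp, greedy_policy(q_dp))
    print("Q-learning:     ", q_rl, greedy_policy(q_rl))
```

For simplicity the sketch samples every state-action pair once per sweep, which assumes access to a simulator of the environment; an agent learning online would instead update only the pairs it actually visits.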

These results assume completely observable states and lookup-table representations, neither of which is feasible in many AI applications. I will derive bounds on the effect of hidden state on reinforcement learning and present a neural-network architecture based on soft clustering that uses compact representations to accelerate learning. I will also demonstrate the utility of reinforcement learning with results from its application to the difficult problem of channel assignment in cellular telephone systems.

Refreshments will be served immediately before the talk at 3:30pm.

Department of Computer Science
University of Colorado Boulder
Boulder, CO 80309-0430 USA
May 5, 2012 (14:13)