Utile Distinction Hidden Markov Models
Daan Wierstra - Utrecht University
Marco Wiering - Utrecht University
This paper addresses the problem of constructing good action selection policies for agents acting in partially observable environments, a class of problems generally known as Partially Observable Markov Decision Processes. We present a novel approach that uses a modification of the well-known Baum-Welch algorithm for learning a Hidden Markov Model (HMM) to predict both percepts and utility in a non-deterministic world. This enables an agent to make decisions based on its previous history of actions, observations, and rewards. Our algorithm, called Utile Distinction Hidden Markov Models (UDHMM), handles the creation of memory well in that it tends to create perceptual and utility distinctions only when needed, while it can still discriminate states based on histories of arbitrary length. The experimental results in highly stochastic problem domains show very good performance.
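To make the core idea more concrete, the sketch below trains a plain HMM with standard Baum-Welch on joint (percept, reward-bin) emission symbols, so that the learned hidden states must account for both observations and utility. This is only an illustration under assumed settings: the state count, symbol encoding, and the baum_welch function and its parameters are hypothetical choices of ours, not taken from the paper, and the sketch omits UDHMM's mechanism for creating perceptual and utility distinctions only when needed.

```python
# Minimal sketch (not the authors' UDHMM implementation): Baum-Welch over
# joint (percept, reward-bin) symbols, so hidden states predict both
# observations and utility. All names and settings here are assumptions.
import numpy as np

def baum_welch(seq, n_states, n_symbols, n_iter=50, seed=0):
    """Fit HMM parameters (pi, A, B) to one symbol sequence via Baum-Welch."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(n_states))                 # initial-state probs
    A = rng.dirichlet(np.ones(n_states), size=n_states)   # transitions A[i, j]
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)  # emissions  B[i, o]
    T = len(seq)
    for _ in range(n_iter):
        # Forward pass with per-step normalisation to avoid underflow.
        alpha = np.zeros((T, n_states))
        scale = np.zeros(T)
        alpha[0] = pi * B[:, seq[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, seq[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]
        # Backward pass using the same scaling factors.
        beta = np.zeros((T, n_states))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, seq[t + 1]] * beta[t + 1])) / scale[t + 1]
        # E-step: state posteriors and expected transition counts.
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A * B[:, seq[t + 1]] * beta[t + 1]) / scale[t + 1]
        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        for o in range(n_symbols):
            B[:, o] = gamma[seq == o].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B

# Each time step's percept and discretised reward are folded into a single
# emission index, so the learned states must explain both percepts and utility.
n_percepts, n_reward_bins = 4, 2
percepts = np.array([0, 1, 2, 3, 0, 1, 2, 3] * 25)
rewards = np.array([0, 0, 0, 1, 0, 0, 0, 1] * 25)   # reward bin at each step
symbols = percepts * n_reward_bins + rewards
pi, A, B = baum_welch(symbols, n_states=4, n_symbols=n_percepts * n_reward_bins)
print(np.round(B, 2))   # learned emission distributions over joint symbols
```

An agent could then maintain a belief over the learned hidden states (the normalised forward variable) and condition action selection on that belief, which is how predicting both percepts and utility supports decision-making from the history of actions, observations, and rewards.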