The Asymptotic Equipartition Property in Reinforcement Learning and its Relation to Return Maximization

URI http://harp.lib.hiroshima-u.ac.jp/hiroshima-cu/metadata/7044
File
Title
The Asymptotic Equipartition Property in Reinforcement Learning and its Relation to Return Maximization
Authors
Name: IWATA Kazunori
Reading: Iwata Kazunori
Alternate name: 岩田 一貴
Name: IKEDA Kazushi
Reading: Ikeda Kazushi
Alternate name:
Name: SAKAI Hideaki
Reading: Sakai Hideaki
Alternate name:
Keywords
Reinforcement learning
Markov decision process
Information theory
Asymptotic equipartition property
Stochastic complexity
Return maximization
Abstract

We discuss an important property, called the asymptotic equipartition property, of empirical sequences in reinforcement learning. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set grows exponentially in the sum of conditional entropies. We refer to this sum as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity that depends on the parameters of the environment. Here, return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which serves as a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability.
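The equipartition statement above can be illustrated numerically in its simplest setting. The sketch below is not the paper's Markov-decision-process formulation; it is a minimal i.i.d. Bernoulli case, with the function name `aep_demo` and all parameter values chosen for illustration. It checks that the per-symbol log-probability of a long sampled sequence concentrates around the entropy, which is the sense in which typical sequences are nearly equiprobable.

```python
import math
import random

def aep_demo(p=0.3, n=5000, trials=200, seed=0):
    """Illustrate the AEP for i.i.d. Bernoulli(p) sequences:
    -(1/n) * log2 P(X_1, ..., X_n) concentrates around the entropy H(p)."""
    rng = random.Random(seed)
    # Binary entropy H(p) in bits.
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    rates = []
    for _ in range(trials):
        ones = sum(rng.random() < p for _ in range(n))
        # Log-probability of the observed sequence under the true model.
        logp = ones * math.log2(p) + (n - ones) * math.log2(1 - p)
        rates.append(-logp / n)
    avg_rate = sum(rates) / trials
    return h, avg_rate

h, avg_rate = aep_demo()
print(h, avg_rate)  # the empirical rate is close to H(p)
```

In the paper's setting, the single entropy H(p) is replaced by the sum of conditional entropies of the state-action process (the stochastic complexity), but the concentration phenomenon is the same.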

Peer reviewed
Journal
Neural Networks
Volume
19
Issue
1
Start page
62
End page
75
Date of publication
2006-01
Publisher
Elsevier
ISSN
0893-6080
NCID
AA10680676
Language
English
Resource type
Journal article
Author version flag
Author version
Rights
Copyright © 2006 Elsevier Ltd. All rights reserved
Related URL
Former URI
Category
hiroshima-cu