Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning

URI http://harp.lib.hiroshima-u.ac.jp/hiroshima-cu/metadata/12387
File
Title
Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning
Author
Name IWATA Kazunori
Reading (kana) イワタ カズノリ
Alternative name 岩田 一貴
Keywords
Asymptotic equipartition property (AEP)
parameter bandwidth
reinforcement learning (RL)
softmax selection
Abstract

Softmax selection is one of the most popular
methods for action selection in reinforcement learning. Although
various recently proposed methods may be more effective with
full parameter tuning, implementing a complicated method
that requires the tuning of many parameters can be difficult.
Thus, softmax selection is still worth revisiting, considering the
cost savings of its implementation and tuning. In fact, this
method works adequately in practice with only one parameter
appropriately set for the environment. The aim of this paper
is to improve the variable setting of this method to extend
the bandwidth of good parameters, thereby reducing the cost
of implementation and parameter tuning. To achieve this, we
take advantage of the asymptotic equipartition property in a
Markov decision process to extend the peak bandwidth of softmax
selection. Using a variety of episodic tasks, we show that our
setting is effective in extending the bandwidth and that it yields a
better policy in terms of stability. The bandwidth is quantitatively
assessed in a series of statistical tests.
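
For readers unfamiliar with the baseline, the following is a minimal sketch of standard softmax (Boltzmann) selection, assuming a list of estimated action values and a single temperature parameter. The function names `softmax_probabilities` and `softmax_select` are hypothetical and chosen for illustration; the sketch shows only the conventional method and not the AEP-based parameter setting proposed in the paper.

```python
import math
import random


def softmax_probabilities(q_values, temperature):
    """Return softmax (Boltzmann) action probabilities for a list of action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this shift does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q_values, temperature, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]  # illustrative action values, not taken from the paper
    for tau in (0.1, 1.0, 10.0):
        print(tau, [round(p, 3) for p in softmax_probabilities(q, tau)])
```

A small temperature makes the selection nearly greedy, while a large one makes it nearly uniform. This sensitivity to a single parameter is what makes the range of well-performing values (the bandwidth discussed in the abstract) a practical concern.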

Peer review
Journal title
IEEE Transactions on Neural Networks and Learning Systems
Volume
28
Issue
8
Start page
1865
End page
1877
Publication date
2016-05-11
Publisher
IEEE
ISSN
2162-237X
NCID
AA1255553X
DOI
10.1109/TNNLS.2016.2558295
PubMed ID
27187974
Text language
English
Resource type
Journal article
Author version flag
Author version
Rights
© 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
Related URL
Category
hiroshima-cu