A statistical property of multiagent learning based on Markov decision process

URI http://harp.lib.hiroshima-u.ac.jp/hiroshima-cu/metadata/6466
Title
A statistical property of multiagent learning based on Markov decision process
Author
Name IWATA Kazunori
Reading イワタ カズノリ
Alternative name 岩田 一貴
Name IKEDA Kazushi
Reading イケダ カズシ
Alternative name
Name SAKAI Hideaki
Reading サカイ ヒデアキ
Alternative name
Subject
Asymptotic equipartition property (AEP)
Markov decision process (MDP)
multiagent system
reinforcement learning (RL)
stochastic complexity (SC)
Abstract

We exhibit an important property, called the asymptotic equipartition property (AEP), of empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning, such as reinforcement learning (RL), near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three cases: blind, visible, and communicable. We also derive a bound on the speed with which the empirical sequence converges in probability to the best sequence, so that multiagent learning yields the best cooperative result.
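For reference, the AEP in its standard information-theoretic form (the Shannon-McMillan-Breiman theorem) states that, for a stationary ergodic source \(\{X_t\}\) with entropy rate \(H\),

\[ -\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \;\longrightarrow\; H \qquad \text{as } n \to \infty, \]

with convergence in probability (and almost surely), so that long typical sequences all have probability close to \(2^{-nH}\). The paper establishes an analogous property for empirical state-action sequences generated by an ergodic multiagent MDP; the display above is the textbook statement, not the paper's exact formulation.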

Description Peer Reviewed
Journal Title
IEEE Transactions on Neural Networks
Volume
17
Issue
4
Spage
829
Epage
842
Published Date
2006-07
Publisher
IEEE
ISSN
1045-9227
Language
eng
NIIType
Journal Article
Text Version
Publisher version
Rights
©2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Set
hiroshima-cu