Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP.

http://arxiv.org/abs/1901.09311

@article{DBLP:journals/corr/abs-1901-09311,
  author    = {Kefan Dong and
               Yuanhao Wang and
               Xiaoyu Chen and
               Liwei Wang},
  title     = {Q-learning with {UCB} Exploration is Sample Efficient for Infinite-Horizon
               {MDP}},
  journal   = {CoRR},
  volume    = {abs/1901.09311},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.09311},
  archivePrefix = {arXiv},
  eprint    = {1901.09311},
  timestamp = {Sat, 02 Feb 2019 16:56:00 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1901-09311.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Authors: Kefan Dong, Yuanhao Wang, Xiaoyu Chen, Liwei Wang

This page was last updated: 2020/05/21
All content on this page is provided under the terms of the CC BY-SA 4.0 and SATA licenses; additional terms may apply.