Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP.

https://openreview.net/forum?id=BkglSTNFDB

@inproceedings{DBLP:conf/iclr/WangDCW20,
  author    = {Yuanhao Wang and
               Kefan Dong and
               Xiaoyu Chen and
               Liwei Wang},
  title     = {Q-learning with {UCB} Exploration is Sample Efficient for Infinite-Horizon
               {MDP}},
  booktitle = {8th International Conference on Learning Representations, {ICLR} 2020,
               Addis Ababa, Ethiopia, April 26-30, 2020},
  publisher = {OpenReview.net},
  year      = {2020},
  url       = {https://openreview.net/forum?id=BkglSTNFDB},
  timestamp = {Thu, 07 May 2020 17:11:47 +0200},
  biburl    = {https://dblp.org/rec/conf/iclr/WangDCW20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Authors: Yuanhao Wang, Kefan Dong, Xiaoyu Chen, Liwei Wang

Last updated: 2020/05/21
All content on this page is provided under the terms of the CC BY-SA 4.0 and SATA licenses; additional terms may also apply.