Reinforcement Learning Algorithms — Policy Gradient with Adaptive Step Size (PG algorithm, adaptive step size for the Adam optimizer)
Adaptive step size for the Adam optimizer
The Adam stepsize is adjusted based on a target value for the KL divergence between the old and the updated policy.
Vanilla policy gradient with adaptive stepsize: after each batch of data, the Adam stepsize is adjusted based on the KL divergence between the original and updated policy, using a rule similar to the one shown in Section 4. An implementation is available at https://github.com/berkeleydeeprlcourse/homework/tree/master/hw4.
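The excerpt does not quote the adjustment rule itself, only that it keeps the per-batch KL divergence near a target. As a minimal sketch, assuming a doubling/halving band around the KL target (in the spirit of the adaptive-KL-penalty rule from the same line of work; the function names, the factor of 1.5, and the 2x/0.5x band here are illustrative assumptions, not the exact rule from the referenced implementation):

```python
import numpy as np

def categorical_kl(p_old, p_new, eps=1e-8):
    """Mean KL divergence D_KL(p_old || p_new) over a batch of
    categorical action distributions (each row sums to 1)."""
    return float(np.mean(np.sum(
        p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=1)))

def adapt_stepsize(stepsize, kl, kl_target, factor=1.5):
    """Hypothetical stepsize rule: shrink the Adam stepsize when the
    last update moved the policy too far (KL well above target), grow
    it when the update was too conservative (KL well below target)."""
    if kl > 2.0 * kl_target:       # overshot the KL target
        return stepsize / factor
    if kl < 0.5 * kl_target:       # undershot the KL target
        return stepsize * factor
    return stepsize                # within band: leave unchanged

# After each batch: measure KL between old and updated policies,
# then feed it back into the stepsize for the next batch.
p_old = np.array([[0.5, 0.5], [0.9, 0.1]])
p_new = np.array([[0.6, 0.4], [0.8, 0.2]])
kl = categorical_kl(p_old, p_new)
stepsize = adapt_stepsize(1e-3, kl, kl_target=0.01)
```

The design intuition is that the raw gradient magnitude is a poor proxy for how much the policy distribution actually changes, so the KL divergence, a direct measure of policy change, drives the stepsize instead.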
