
Reinforcement Learning Algorithms: Policy Gradient with Adaptive Step Size (PG algorithm, adaptive step size for the Adam optimizer)


Adaptive step size for Adam optimizer




The Adam step size is adjusted based on a target value for the KL divergence between the old and updated policy.
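For a discrete action space, the KL divergence between the old and updated policy can be measured directly from the two action distributions. A minimal sketch (the function name and the `eps` smoothing constant are illustrative, not from the source):

```python
import numpy as np

def categorical_kl(p_old, p_new, eps=1e-12):
    """KL(old || new) between two discrete action distributions.

    eps guards against log(0) for actions with zero probability.
    """
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    return float(np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps))))
```

In practice this would be averaged over the states in the batch, with the "old" probabilities recorded before the gradient update.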




Vanilla policy gradient with adaptive step size: after each batch of data, the Adam step size is adjusted based on the KL divergence between the original and updated policy, using a rule similar to the one shown in Section 4. An implementation is available at https://github.com/berkeleydeeprlcourse/homework/tree/master/hw4.
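The excerpt does not spell out the adjustment rule itself. A plausible sketch, modeled on the multiplicative KL-target rule of the kind referenced above (the threshold factor 1.5 and the halving/doubling constants are assumptions, not taken from the source):

```python
def adjust_adam_stepsize(stepsize, kl, kl_target, factor=1.5):
    """Adapt the Adam step size after a batch update (hypothetical constants).

    Shrink the step size when the measured KL between the old and updated
    policy overshoots the target; grow it when the KL undershoots.
    """
    if kl > kl_target * factor:
        return stepsize / 2.0  # policy moved too far: be more conservative
    if kl < kl_target / factor:
        return stepsize * 2.0  # policy barely moved: be more aggressive
    return stepsize            # KL within the tolerance band: keep the step size
```

With PyTorch's Adam, the returned value would typically be written back into each `param_group["lr"]` of the optimizer before the next batch.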



