Sure, here are the full names of the three abbreviations:
- ORM = Outcome Reward Model
- PRM = Process Reward Model
- PAPO = Process-Aware Policy Optimization