Reinforcement Learning

Limited Sensors: For this variation, we restrict the observations to only provide positional information (including joint angles), excluding velocities. An agent now has to learn to infer velocity information in order to recover the full state. Similar tasks have been explored in Gomez & Miikkulainen (1998); Schäfer & Udluft (2005); Heess et al. (2015a); Wierstra et al. (2007).
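To make the limited-sensor setting concrete, here is a minimal sketch of how velocities might be stripped from an observation, and why they remain recoverable from consecutive positions. The state layout `[x, theta, x_dot, theta_dot]`, the index list `POSITION_IDX`, and the time step `dt` are assumptions chosen for illustration; they are not taken from the source. The finite-difference estimate only shows that the information is present in the observation history; the agent in the text is expected to learn such an inference, not be handed it.

```python
from typing import List, Sequence


def positions_only(full_obs: Sequence[float], position_idx: Sequence[int]) -> List[float]:
    """Keep only the positional entries of a full observation vector."""
    return [full_obs[i] for i in position_idx]


def finite_difference_velocity(prev_pos: Sequence[float],
                               curr_pos: Sequence[float],
                               dt: float) -> List[float]:
    """Estimate velocities from two consecutive position observations."""
    return [(c - p) / dt for p, c in zip(prev_pos, curr_pos)]


# Hypothetical cart-pole-like state layout: [x, theta, x_dot, theta_dot].
POSITION_IDX = [0, 1]
dt = 0.02  # assumed simulation time step

full_prev = [0.00, 0.10, 1.0, -0.5]
full_curr = [0.02, 0.09, 1.0, -0.5]

# What the agent actually sees in the limited-sensor variant.
obs_prev = positions_only(full_prev, POSITION_IDX)
obs_curr = positions_only(full_curr, POSITION_IDX)

# Velocities are hidden, but two consecutive observations approximately recover them.
estimated_velocity = finite_difference_velocity(obs_prev, obs_curr, dt)
```

In this toy case the estimate is close to the hidden velocities `[1.0, -0.5]`, which is why a policy with memory of past observations can in principle recover the full state.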

Noisy Observations and Delayed Actions: In this case, sensor noise is simulated by adding Gaussian noise to the observations. We also introduce a time delay between taking an action and the action taking effect, accounting for physical latencies (Hester & Stone, 2013). Agents now need to learn to integrate both past observations and past actions to infer the current state. Similar tasks have been proposed in Bakker (2001).
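The combination of sensor noise and action delay can be sketched as a wrapper around an environment. Everything below is a hypothetical illustration: `PointEnv` is a toy 1-D environment invented for this example, the `reset`/`step` interface is an assumption, and filling the delay queue with a zero "no-op" action at the start of an episode is a design choice the source does not specify.

```python
import random
from collections import deque
from typing import Deque, List


class PointEnv:
    """Hypothetical 1-D toy environment: each action is added to the position."""

    def __init__(self) -> None:
        self.x = 0.0

    def reset(self) -> List[float]:
        self.x = 0.0
        return [self.x]

    def step(self, action: float) -> List[float]:
        self.x += action
        return [self.x]


class NoisyDelayedEnv:
    """Adds Gaussian observation noise and a fixed action delay to an environment."""

    def __init__(self, env: PointEnv, noise_std: float, delay_steps: int, seed: int = 0) -> None:
        self.env = env
        self.noise_std = noise_std
        self.delay_steps = delay_steps
        self.rng = random.Random(seed)
        self.pending: Deque[float] = deque()

    def _noisy(self, obs: List[float]) -> List[float]:
        # Independent zero-mean Gaussian noise on every observation entry.
        return [o + self.rng.gauss(0.0, self.noise_std) for o in obs]

    def reset(self) -> List[float]:
        # Assumption: until the first real action arrives, a zero no-op is applied.
        self.pending = deque([0.0] * self.delay_steps)
        return self._noisy(self.env.reset())

    def step(self, action: float) -> List[float]:
        # The action chosen now only reaches the underlying environment delay_steps later.
        self.pending.append(action)
        applied = self.pending.popleft()
        return self._noisy(self.env.step(applied))


# With noise switched off, the effect of the delay is directly visible.
delayed = NoisyDelayedEnv(PointEnv(), noise_std=0.0, delay_steps=2)
delayed.reset()
trajectory = [delayed.step(1.0)[0] for _ in range(3)]
```

With a delay of two steps, the first two actions have no visible effect on the observation, so the current observation alone is not enough to predict what happens next. The agent also needs its own recent actions, as the text notes.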

System Identification: For this category, the underlying physical model parameters are varied across different episodes (Szita et al., 2003). The agents must learn to generalize across different models, as well as to infer the model parameters from their observation and action history.
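Varying the physical parameters across episodes might look like the following sketch. The parameter names `mass` and `length`, the nominal values, and the uniform scaling range are all assumptions made for illustration. The source says only that the underlying model parameters vary between episodes, not which parameters or how they are distributed.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysicsParams:
    """Hypothetical physical parameters; the source does not list which ones vary."""
    mass: float
    length: float


def sample_params(rng: random.Random, nominal: PhysicsParams, rel_range: float) -> PhysicsParams:
    """Scale each nominal parameter by a factor drawn uniformly
    from [1 - rel_range, 1 + rel_range] (an assumed distribution)."""
    def scaled(value: float) -> float:
        return value * rng.uniform(1.0 - rel_range, 1.0 + rel_range)

    return PhysicsParams(mass=scaled(nominal.mass), length=scaled(nominal.length))


NOMINAL = PhysicsParams(mass=1.0, length=0.5)
rng = random.Random(0)

# One fresh parameter draw per episode; the agent is never told the values
# and must infer them from its observation and action history.
episode_params: List[PhysicsParams] = [
    sample_params(rng, NOMINAL, rel_range=0.2) for _ in range(3)
]
```

Each episode then runs with its own `PhysicsParams` instance, so a policy that does well on average must both generalize across models and adapt within an episode.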