Hello,
I have a working v1 of the cartpole environment and I would now like to change the reward system.
My guess is that I need to use the base velocity (positive or negative) as a condition for a higher reward (let's say 1.5), and that this velocity is contained in the observations variable. However, I have not managed to implement it in the code below …
I tried to use condition[1] as the velocity of the base, without success …
Could you please help me with this?
def _compute_reward(self, observations, done):
    """
    Gives more points for staying upright, gets data from given observations to avoid
    having different data than other previous functions
    :return:reward
    """
    if not done:
        reward = 1.0
    elif self.steps_beyond_done is None:
        # Pole just fell!
        self.steps_beyond_done = 0
        reward = 1.0
    else:
        if self.steps_beyond_done == 0:
            logger.warning("You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.")
        self.steps_beyond_done += 1
        reward = 0.0
    return reward
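
To make the goal more concrete, here is a minimal sketch of the kind of check I have in mind for the "if not done:" branch. It rests on assumptions I have not verified: that the base velocity sits at index 1 of observations, and that "moving" means exceeding some small threshold. The names reward_while_running, VELOCITY_INDEX and VELOCITY_THRESHOLD are hypothetical and are not part of the environment.

```python
# Assumption: observations is a sequence whose element at index 1 is the
# base velocity. This index should be checked against the environment's
# own observation function before relying on it.
VELOCITY_INDEX = 1

# Hypothetical threshold below which the base counts as not moving.
VELOCITY_THRESHOLD = 0.1


def reward_while_running(observations, base_reward=1.0, bonus_reward=1.5):
    """Return the per-step reward while the episode is not done.

    Gives bonus_reward when the base moves in either direction faster
    than VELOCITY_THRESHOLD, and base_reward otherwise.
    """
    base_velocity = observations[VELOCITY_INDEX]
    # abs() makes the condition hold for positive and negative velocity.
    if abs(base_velocity) > VELOCITY_THRESHOLD:
        return bonus_reward
    return base_reward
```

The idea would be to replace the first "reward = 1.0" with "reward = reward_while_running(observations)" and leave the other branches unchanged.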