Agents
All agents should extend the base Agent class and implement the act() method:
from train import Agent

class MyAgent(Agent):

    def act(self, state):
        ...
When the train() or test() methods are called, an action is selected by calling the act() method and passed to the environment. The environment then returns a reward and an observation. The entire transition (S, A, R, S') is saved in a Transitions object, which can be accessed using self.transitions. When an episode terminates, a new episode is started by resetting the environment and the agent.
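To make the loop above concrete, here is a self-contained sketch of roughly what one training episode does; the environment stub, the `run_episode` helper, and the plain-list buffer are illustrative assumptions, not the library's actual implementation:

```python
import random

class TinyEnv:
    """Minimal Gym-like environment stub: a 5-state chain that ends at state 4."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done, {}

def run_episode(env, act, transitions):
    """Roughly what train() does per episode: act, step, record (S, A, R, S', done)."""
    state = env.reset()
    done = False
    while not done:
        action = act(state)
        next_state, reward, done, _ = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions

transitions = run_episode(TinyEnv(), lambda s: random.choice([0, 1]), [])
```

Each recorded tuple mirrors the (S, A, R, S') transition that the library stores in self.transitions.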
During training, the following callback methods are called on the agent at the respective stages:

- on_step_begin()
- on_step_end()
- on_episode_begin()
- on_episode_end()
These methods, combined with the Transitions object in self.transitions, can be used to implement various algorithms. on_step_end() can be used to implement online algorithms such as TD(0), and on_episode_end() can be used to implement episodic algorithms such as Monte Carlo methods:
class MyAgent(Agent):

    def on_step_end(self):
        # DQN
        S, A, R, Snext, dones = self.transitions.sample(32)  # randomly sample transitions
        ...

    def on_episode_end(self):
        # REINFORCE
        S, A, R, Snext, dones = self.transitions.get()  # get all recent transitions
        self.transitions.reset()  # reset transitions for next episode
        ...
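As an illustration of the episodic case, on_episode_end() typically converts the episode's recorded rewards into discounted returns before updating the agent. A minimal, library-independent sketch of that step (the `discounted_returns` helper is an assumption for illustration):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t = r_t + gamma * G_{t+1} for every step of an episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns
```

For example, discounted_returns([0.0, 0.0, 1.0], gamma=0.5) returns [0.25, 0.5, 1.0].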
Note: Transitions are not recorded when running test().
Agent
class train.Agent(state=0, transitions=1, **kwargs)

Base class for all agents.

Parameters:

- state (int, State) – A number representing the number of recent observations to save in state, or a custom State object.
- transitions (int, Transitions) – A number representing the number of recent transitions to save in history, or a custom Transitions object.
- env – OpenAI Gym-like environment object.
- gamma (float) – A custom parameter that can be used as a discount factor.
- alpha (float) – A custom parameter that can be used as a learning rate.
- lambd (float) – A custom parameter that can be used by various algorithms such as TD(lambda).
- parameters – List of trainable variables used by the agent.
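To illustrate how the gamma and alpha parameters are typically used, here is a sketch of a tabular TD(0) value update; the `td0_update` function and the dict-based value table are illustrative assumptions, not part of the library:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V

V = {0: 0.0, 1: 1.0}
td0_update(V, s=0, r=0.5, s_next=1, alpha=0.5, gamma=1.0)
```

Here gamma discounts the bootstrapped value of the next state and alpha controls the step size of the update.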
act(state)

Select an action by reading the current state.

Parameters:

- state (array_like) – Current state of the agent based on past observations.

Returns: An action to take in the environment.
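A common way to implement act() is epsilon-greedy selection over estimated action values. A minimal sketch (the `epsilon_greedy` helper, the `q_values` list, and the `epsilon` parameter are assumptions for illustration, not part of the library's API):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Inside an agent, act(state) could look up the state's action values and return epsilon_greedy(q_values) as the chosen action.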
run(episodes, env=None, max_steps=-1, max_episode_steps=-1, render=False)

Run the agent in the environment.

Returns: List of cumulative rewards in each episode.

Return type: list