All agents should extend the base Agent class and implement the act() method:

from train import Agent

class MyAgent(Agent):

    def act(self, state):
        # Select and return an action for the current state.
        return 0

When the train() or test() method is called, an action is selected by calling the act() method and passed to the environment, which returns a reward and the next observation. The entire transition (S, A, R, S') is saved in a Transitions object that can be accessed via self.transitions. When an episode terminates, a new episode is started by resetting the environment and the agent.
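As an illustration only, the interaction loop just described can be sketched in plain Python. ToyEnv and run_episode below are hypothetical stand-ins used to show the (S, A, R, S') recording; the real loop lives inside the train package:

```python
# Minimal sketch of the interaction loop: select an action, step the
# environment, record the (S, A, R, S') transition, repeat until done.

class ToyEnv:
    """Stand-in for an OpenAI Gym-like environment (hypothetical)."""

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        self.pos += action
        reward = 1 if self.pos >= 3 else 0
        done = self.pos >= 3
        return self.pos, reward, done  # (observation, reward, done)

def run_episode(env, act):
    transitions = []  # list of (S, A, R, S') tuples
    state = env.reset()
    done = False
    while not done:
        action = act(state)                        # agent selects an action
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state))
        state = next_state                         # continue from S'
    return transitions

episode = run_episode(ToyEnv(), act=lambda s: 1)
```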

During training, the following callback methods on the agent are called at the respective stages:

  • on_step_end() – called after each environment step.
  • on_episode_end() – called after each episode terminates.

These methods, combined with the Transitions object in self.transitions, can be used to implement various algorithms: on_step_end() can be used for online algorithms such as TD(0), and on_episode_end() for algorithms such as Monte Carlo methods:

class MyAgent(Agent):

    def on_step_end(self):
        # DQN
        S, A, R, Snext, dones = self.transitions.sample(32) # randomly sample transitions

    def on_episode_end(self):
        # REINFORCE
        S, A, R, Snext, dones = self.transitions.get() # get all recent transitions
        self.transitions.reset() # reset transitions for next episode
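As a self-contained illustration of the kind of online update on_step_end() enables, here is a tabular TD(0) value update applied to a single transition. td0_update and its hyperparameter defaults are hypothetical, not part of the train package:

```python
# Tabular TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s').
def td0_update(V, s, r, s_next, done, alpha=0.5, gamma=0.9):
    v = V.get(s, 0.0)
    # A terminal transition has no bootstrap term.
    target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
    V[s] = v + alpha * (target - v)
    return V

V = {}
td0_update(V, s=0, r=1.0, s_next=1, done=True)
```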


Transitions are not recorded when running test().


class train.Agent(state=0, transitions=1, **kwargs)

Base class for all agents.

  • state (int, State) – A number representing the number of recent observations to save in state or a custom State object.
  • transitions (int, Transitions) – A number representing the number of recent transitions to save in history or a custom Transitions object.
  • env – OpenAI Gym-like environment object.
  • gamma (float) – A custom parameter that can be used as a discount factor.
  • alpha (float) – A custom parameter that can be used as a learning rate.
  • lambd (float) – A custom parameter that can be used by algorithms such as TD(lambda).
  • parameters – List of trainable variables used by agent.
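A minimal sketch of how such a constructor might store these hyperparameters as attributes. BaseAgent is hypothetical and is not the actual train.Agent implementation; the defaults for gamma and alpha are assumptions:

```python
# Illustrative base class: keep the state/transition sizes and pull
# optional hyperparameters like gamma and alpha out of **kwargs.
class BaseAgent:
    def __init__(self, state=0, transitions=1, **kwargs):
        self.state_size = state             # recent observations kept in state
        self.transition_size = transitions  # recent transitions kept in history
        self.gamma = kwargs.get("gamma", 1.0)   # discount factor (assumed default)
        self.alpha = kwargs.get("alpha", 0.01)  # learning rate (assumed default)

agent = BaseAgent(state=4, gamma=0.99)
```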

act(state)

Select an action by reading the current state.

Parameters: state (array_like) – Current state of the agent based on past observations.
Returns: An action to take in the environment.

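A common way to implement act() is an epsilon-greedy policy. The function below is a hypothetical, self-contained sketch; q_values and the rng parameter are assumptions for illustration, not part of the train API:

```python
import random

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise pick the action with the highest estimated value.
def act(state, q_values, epsilon=0.1, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

action = act(state=0, q_values=[0.1, 0.8, 0.3], epsilon=0.0)
```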
run(episodes, env=None, max_steps=-1, max_episode_steps=-1, render=False)

Run the agent in environment.

  • episodes (int) – Maximum number of episodes to run.
  • env – OpenAI Gym-like environment object.
  • max_steps (int) – Maximum number of total steps to run.
  • max_episode_steps (int) – Maximum number of steps to run in each episode.
  • render (bool) – Visualize interaction of agent in environment.

Returns: List of cumulative rewards in each episode.
Return type: list

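Since run() returns a list of per-episode cumulative rewards, a typical next step is smoothing it into a learning curve. moving_average below is an illustrative helper, not part of the train package:

```python
# Smooth a reward-per-episode list with a trailing moving average,
# e.g. to plot a less noisy learning curve.
def moving_average(rewards, window=2):
    return [
        sum(rewards[max(0, i - window + 1):i + 1]) / min(window, i + 1)
        for i in range(len(rewards))
    ]

smoothed = moving_average([0.0, 1.0, 3.0, 5.0], window=2)
```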

test(*args, **kwargs)

Run the agent in test mode; transitions are not recorded.

See: run()

train(*args, **kwargs)

Run the agent in training mode; transitions are recorded and the callback methods are invoked.

See: run()