All agents should extend the base Agent class and implement the act() method:

from train import Agent

class MyAgent(Agent):

    def act(self, state):
        # Select and return an action for the current state.
        return 0

When the train() or test() method is called, an action is selected by calling the act() method and passed to the environment, which returns a reward and the next observation. The entire transition (S, A, R, S') is saved in a Transitions object that can be accessed via self.transitions. When an episode terminates, a new episode is started by resetting the environment and the agent.
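As an illustration only, the interaction loop just described can be sketched in plain Python. ToyEnv and run_episode below are hypothetical stand-ins used to show the (S, A, R, S') recording; the real loop lives inside the train package:

```python
# Minimal sketch of the interaction loop: select an action, step the
# environment, record the (S, A, R, S') transition, repeat until done.

class ToyEnv:
    """Stand-in for an OpenAI Gym-like environment (hypothetical)."""

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        self.pos += action
        reward = 1 if self.pos >= 3 else 0
        done = self.pos >= 3
        return self.pos, reward, done  # (observation, reward, done)

def run_episode(env, act):
    transitions = []  # list of (S, A, R, S') tuples
    state = env.reset()
    done = False
    while not done:
        action = act(state)                        # agent selects an action
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state))
        state = next_state                         # continue from S'
    return transitions

episode = run_episode(ToyEnv(), act=lambda s: 1)
```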

During training, the following callback methods on the agent are called at the respective stages:

  • on_step_end() – called after each environment step.
  • on_episode_end() – called after each episode terminates.

These methods, combined with the Transitions object in self.transitions, can be used to implement various algorithms: on_step_end() can be used for online algorithms such as TD(0), and on_episode_end() for algorithms such as Monte Carlo methods:

class MyAgent(Agent):

    def on_step_end(self):
        # DQN
        S, A, R, Snext, dones = self.transitions.sample(32) # randomly sample transitions

    def on_episode_end(self):
        # REINFORCE
        S, A, R, Snext, dones = self.transitions.get() # get all recent transitions
        self.transitions.reset() # reset transitions for next episode
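As a self-contained illustration of the kind of online update on_step_end() enables, here is a tabular TD(0) value update applied to a single transition. td0_update and its hyperparameter defaults are hypothetical, not part of the train package:

```python
# Tabular TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s').
def td0_update(V, s, r, s_next, done, alpha=0.5, gamma=0.9):
    v = V.get(s, 0.0)
    # A terminal transition has no bootstrap term.
    target = r + (0.0 if done else gamma * V.get(s_next, 0.0))
    V[s] = v + alpha * (target - v)
    return V

V = {}
td0_update(V, s=0, r=1.0, s_next=1, done=True)
```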


Transitions are not recorded when running test().


class train.Agent(state=0, transitions=1, **kwargs)

Base class for all agents.

  • state (int, State) – A number representing the number of recent observations to save in state or a custom State object.
  • transitions (int, Transitions) – A number representing the number of recent transitions to save in history or a custom Transitions object.
  • env – OpenAI Gym-like environment object.
  • gamma (float) – A custom parameter that can be used as a discount factor.
  • alpha (float) – A custom parameter that can be used as a learning rate.
  • lambd (float) – A custom parameter that can be used by algorithms such as TD(lambda).
  • parameters – List of trainable variables used by agent.
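A minimal sketch of how such a constructor might store these hyperparameters as attributes. BaseAgent is hypothetical and is not the actual train.Agent implementation; the defaults for gamma and alpha are assumptions:

```python
# Illustrative base class: keep the state/transition sizes and pull
# optional hyperparameters like gamma and alpha out of **kwargs.
class BaseAgent:
    def __init__(self, state=0, transitions=1, **kwargs):
        self.state_size = state             # recent observations kept in state
        self.transition_size = transitions  # recent transitions kept in history
        self.gamma = kwargs.get("gamma", 1.0)   # discount factor (assumed default)
        self.alpha = kwargs.get("alpha", 0.01)  # learning rate (assumed default)

agent = BaseAgent(state=4, gamma=0.99)
```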

act(state)

Select an action by reading the current state.

Parameters: state (array_like) – Current state of the agent based on past observations.
Returns: An action to take in the environment.

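A common way to implement act() is an epsilon-greedy policy. The function below is a hypothetical, self-contained sketch; q_values and the rng parameter are assumptions for illustration, not part of the train API:

```python
import random

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise pick the action with the highest estimated value.
def act(state, q_values, epsilon=0.1, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

action = act(state=0, q_values=[0.1, 0.8, 0.3], epsilon=0.0)
```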
run(episodes, env=None, max_steps=-1, max_episode_steps=-1, render=False)

Run the agent in environment.

  • episodes (int) – Maximum number of episodes to run.
  • env – OpenAI Gym-like environment object.
  • max_steps (int) – Maximum number of total steps to run.
  • max_episode_steps (int) – Maximum number of steps to run in each episode.
  • render (bool) – Visualize interaction of agent in environment.

Returns: List of cumulative rewards in each episode.
Return type: list

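Since run() returns a list of per-episode cumulative rewards, a typical next step is smoothing it into a learning curve. moving_average below is an illustrative helper, not part of the train package:

```python
# Smooth a reward-per-episode list with a trailing moving average,
# e.g. to plot a less noisy learning curve.
def moving_average(rewards, window=2):
    return [
        sum(rewards[max(0, i - window + 1):i + 1]) / min(window, i + 1)
        for i in range(len(rewards))
    ]

smoothed = moving_average([0.0, 1.0, 3.0, 5.0], window=2)
```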

test(*args, **kwargs)

Run the agent in test mode; transitions are not recorded.

See: run()

train(*args, **kwargs)

Run the agent in training mode; transitions are recorded and the callback methods are invoked.

See: run()