Random Agent

class amlgym.algorithms.RandomAgent.RandomAgent(input_domain_path)

Bases: ActiveAlgorithmAdapter

A simple baseline for online learning in a fully observable and deterministic environment, which explores by executing actions at random. The baseline first generates a trajectory and then applies the SAM algorithm to learn a model offline from the generated trace.
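The exploration phase can be pictured with a minimal, library-independent sketch. Everything in it is hypothetical and is not taken from the amlgym source: the `random_walk` helper, the toy light-switch actions, and the assumption that only applicable actions are sampled. It only illustrates how a seeded random walk in a deterministic environment yields a reproducible trace of (state, action, next state) steps, which an offline learner such as SAM could then consume.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

State = FrozenSet[str]
# Hypothetical toy action: (preconditions, add effects, delete effects)
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]
Step = Tuple[State, str, State]


def random_walk(initial_state: State, actions: Dict[str, Action],
                max_steps: int = 100, seed: int = 123) -> List[Step]:
    """Collect a trace by executing randomly chosen applicable actions."""
    rng = random.Random(seed)  # local generator, so the walk is reproducible
    state = initial_state
    trace: List[Step] = []
    for _ in range(max_steps):
        # Assumption: sample only among actions whose preconditions hold.
        applicable = sorted(
            name for name, (pre, _, _) in actions.items() if pre <= state
        )
        if not applicable:
            break  # dead end: nothing can be executed
        name = rng.choice(applicable)
        _, add, delete = actions[name]
        next_state = (state - delete) | add  # deterministic transition
        trace.append((state, name, next_state))
        state = next_state
    return trace


# Toy domain, invented for this illustration only
TOY_ACTIONS: Dict[str, Action] = {
    "switch-on": (frozenset({"off"}), frozenset({"on"}), frozenset({"off"})),
    "switch-off": (frozenset({"on"}), frozenset({"off"}), frozenset({"on"})),
    "noop": (frozenset(), frozenset(), frozenset()),
}

trace = random_walk(frozenset({"off"}), TOY_ACTIONS, max_steps=5, seed=123)
for state, action, next_state in trace:
    print(sorted(state), action, sorted(next_state))
```

Because the generator is seeded and the transitions are deterministic, calling `random_walk` twice with the same arguments returns the same trace.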

Example

from unified_planning.io import PDDLReader
from unified_planning.shortcuts import SequentialSimulator
from amlgym.algorithms import get_algorithm
from amlgym.benchmarks import get_domain_path, get_problems_path
from amlgym.util.util import empty_domain

domain = 'blocksworld'
domain_ref_path = get_domain_path(domain)
input_domain_path = empty_domain(domain_ref_path)
problem_path = get_problems_path(domain, kind='learning')[0]
problem = PDDLReader().parse_problem(domain_ref_path, problem_path)

env = SequentialSimulator(problem=problem)
baseline = get_algorithm('RandomAgent', input_domain_path=input_domain_path)
model, trajectory = baseline.learn(env, max_steps=100)

print("##################### Learned model #####################")
print(model)

print("################# Generated trajectory ##################")
print(trajectory)

__init__(input_domain_path)

input_domain_path: str

learn(simulator, max_steps=100, seed=123)

Learns a PDDL action model from:
  1. a simulator of the environment to learn from;

  2. a (possibly empty) input model, which is required to specify the predicates and operator signatures (set via the input_domain_path attribute at instantiation time).

Parameters:
  • simulator (SequentialSimulator) – environment simulator

  • max_steps (int) – maximum number of interaction steps with the simulator

  • seed (int) – random seed for reproducibility

Return type:

Tuple[str, Trajectory]

Returns:

a string representing the learned PDDL model, and a JSON specification of the trajectory
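Since the learned model is returned as a plain string, it can be written to disk like any other text. The sketch below is an illustration only: `save_model` is a hypothetical helper, not part of amlgym, and the placeholder string stands in for the `model` value returned by `learn`.

```python
import tempfile
from pathlib import Path


def save_model(model: str, path) -> Path:
    """Write a PDDL model string to `path`, creating parent folders if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(model, encoding="utf-8")
    return out


# Placeholder standing in for the string returned by learn()
model = "(define (domain placeholder))"

with tempfile.TemporaryDirectory() as tmp:
    saved = save_model(model, Path(tmp) / "learned" / "domain.pddl")
    round_trip = saved.read_text(encoding="utf-8")
    print(saved.name, len(round_trip))
```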