Information Gain Agent

class amlgym.algorithms.InformationGainAgent.InformationGainAgent(input_domain_path, use_object_subset=True, spare_objects_per_type=2, model_mode='safe', learn_negative_preconditions=True, selection_strategy='greedy', epsilon=0.1, temperature=1.0, lookahead_depth=2, lookahead_top_k=5, lookahead_discount=0.9, mcts_iterations=50, mcts_rollout_depth=5)

Bases: ActiveAlgorithmAdapter

Online action model learning via information gain.

Uses a CNF/SAT-based, information-theoretic approach to select the actions that maximize the expected information gain about the action model.

Parameters:
  • input_domain_path (str) – Path to the input PDDL domain file

  • use_object_subset (bool) – Enable object subset selection for reduced grounding

  • spare_objects_per_type (int) – Extra objects per type beyond minimum requirement (for subset selection)

  • model_mode (str) – “safe” (all possible preconditions, confirmed effects only) or “complete” (certain preconditions only, all possible effects)

  • learn_negative_preconditions (bool) – Whether to learn negative preconditions

  • selection_strategy (str) – Action selection strategy. One of:

      - “greedy”: always select the action with the highest information gain (default)
      - “epsilon_greedy”: explore with probability epsilon
      - “boltzmann”: softmax probabilistic selection
      - “lookahead”: depth-limited lookahead with discounted future gain
      - “mcts”: full UCT-based Monte Carlo Tree Search

  • lookahead_depth (int) – Lookahead depth for ‘lookahead’ strategy (default: 2)

  • lookahead_top_k (int) – Number of top actions to evaluate in lookahead (default: 5)

  • lookahead_discount (float) – Discount factor for future gain in lookahead (default: 0.9)

  • epsilon (float) – Exploration probability for ‘epsilon_greedy’ strategy (default: 0.1)

  • temperature (float) – Temperature for ‘boltzmann’ softmax selection (default: 1.0)

  • mcts_iterations (int) – Number of MCTS iterations per action selection (default: 50)

  • mcts_rollout_depth (int) – Simulation depth during MCTS rollout phase (default: 5)
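
To make the first three selection_strategy values concrete, here is a minimal sketch of how greedy, epsilon-greedy, and Boltzmann selection are commonly implemented over a table of per-action scores. It is an illustration only, not the library's implementation: the function select_action, the gains dictionary, and the action names are hypothetical, and the scores are made-up numbers standing in for expected information gain.

```python
import math
import random


def select_action(gains, strategy="greedy", epsilon=0.1, temperature=1.0, rng=None):
    """Pick an action from a {action: score} mapping (hypothetical helper)."""
    if not gains:
        raise ValueError("gains must contain at least one action")
    rng = rng or random.Random()
    actions = list(gains)
    best = max(actions, key=gains.get)

    if strategy == "greedy":
        # Always take the highest-scoring action.
        return best

    if strategy == "epsilon_greedy":
        # With probability epsilon pick uniformly at random, otherwise be greedy.
        if rng.random() < epsilon:
            return rng.choice(actions)
        return best

    if strategy == "boltzmann":
        # Softmax over scores; subtracting the maximum keeps exp() stable.
        top = gains[best]
        weights = [math.exp((gains[a] - top) / temperature) for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]

    raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")


# Made-up scores for three grounded actions (assumption, for illustration).
gains = {"pick-up a": 0.2, "stack a b": 0.9, "put-down a": 0.5}
greedy_choice = select_action(gains, "greedy")
sampled_choice = select_action(gains, "boltzmann", temperature=1.0, rng=random.Random(0))
```

In this sketch a lower temperature concentrates the Boltzmann distribution on the best-scoring action, while a higher temperature moves it toward uniform sampling. Setting epsilon to 0 makes the epsilon-greedy branch behave like the greedy one.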

Example

from unified_planning.io import PDDLReader
from unified_planning.shortcuts import SequentialSimulator
from amlgym.algorithms import get_algorithm
from amlgym.benchmarks import get_domain_path, get_problems_path
from amlgym.util.util import empty_domain

domain = 'blocksworld'
domain_ref_path = get_domain_path(domain)
input_domain_path = empty_domain(domain_ref_path)
problem_path = get_problems_path(domain, kind='learning')[0]
problem = PDDLReader().parse_problem(domain_ref_path, problem_path)

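# Simulate the problem using the reference domain
env = SequentialSimulator(problem=problem)

# Default settings (greedy selection)
info_gain = get_algorithm('InformationGainAgent', input_domain_path=input_domain_path)
model, trajectory = info_gain.learn(env, max_steps=100)

# Same setup with the lookahead strategy; this overwrites model and trajectory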
info_gain = get_algorithm(
    'InformationGainAgent',
    input_domain_path=input_domain_path,
    selection_strategy='lookahead',
    lookahead_depth=3,
)
model, trajectory = info_gain.learn(env, max_steps=100)

print("##################### Learned model #####################")
print(model)

print("################# Generated trajectory ##################")
print(trajectory)

__init__(input_domain_path, use_object_subset=True, spare_objects_per_type=2, model_mode='safe', learn_negative_preconditions=True, selection_strategy='greedy', epsilon=0.1, temperature=1.0, lookahead_depth=2, lookahead_top_k=5, lookahead_discount=0.9, mcts_iterations=50, mcts_rollout_depth=5)
epsilon: float = 0.1
input_domain_path: str
learn(simulator, max_steps=500, seed=123)

Learn a PDDL action model by interacting with the environment.

Parameters:
  • simulator (SequentialSimulator) – environment simulator

  • max_steps (int) – maximum number of interaction steps with the simulator

  • seed (int) – random seed for reproducibility

Return type:

Tuple[str, Trajectory]

Returns:

(learned PDDL model string, trajectory)
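
Because the first element of the returned tuple is a plain string, it can be written to disk with the standard library. The sketch below assumes nothing about amlgym beyond that return type. The save_model helper, the file name, and the placeholder model string are hypothetical; in real use the string would come from learn().

```python
import tempfile
from pathlib import Path


def save_model(model: str, path) -> Path:
    """Write a PDDL model string to a file and return its path (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(model, encoding="utf-8")
    return path


# Placeholder standing in for the string returned by learn() (assumption).
model = "(define (domain blocksworld))"
out = save_model(model, Path(tempfile.mkdtemp()) / "learned_domain.pddl")
```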

learn_negative_preconditions: bool = True
lookahead_depth: int = 2
lookahead_discount: float = 0.9
lookahead_top_k: int = 5
mcts_iterations: int = 50
mcts_rollout_depth: int = 5
model_mode: str = 'safe'
selection_strategy: str = 'greedy'
spare_objects_per_type: int = 2
temperature: float = 1.0
use_object_subset: bool = True
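
The two model_mode values described in the parameter list can be pictured with a small sketch. It assumes the learner tracks, per action, four sets of literals: possible preconditions, certain preconditions, confirmed effects, and possible effects. The function build_action_model, the dictionary layout, and the example literals are hypothetical and only mirror the documented description; they are not the library's internal data structures.

```python
def build_action_model(possible_pre, certain_pre, confirmed_eff, possible_eff,
                       model_mode="safe"):
    """Assemble preconditions and effects for one action (hypothetical helper)."""
    if model_mode == "safe":
        # Safe: keep every precondition not yet ruled out, only confirmed effects.
        return {"preconditions": set(possible_pre), "effects": set(confirmed_eff)}
    if model_mode == "complete":
        # Complete: keep only certain preconditions, every effect not yet ruled out.
        return {"preconditions": set(certain_pre), "effects": set(possible_eff)}
    raise ValueError(f"unknown model_mode: {model_mode!r}")


# Made-up knowledge about a blocksworld-style 'pick-up' action (assumption).
knowledge = dict(
    possible_pre={"(clear ?x)", "(ontable ?x)", "(handempty)"},
    certain_pre={"(clear ?x)"},
    confirmed_eff={"(holding ?x)"},
    possible_eff={"(holding ?x)", "(not (ontable ?x))"},
)
safe_model = build_action_model(**knowledge, model_mode="safe")
complete_model = build_action_model(**knowledge, model_mode="complete")
```

Under these assumptions the safe model is the more restrictive one: it has more preconditions and fewer effects than the complete model built from the same knowledge.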