Information Gain Agent
- class amlgym.algorithms.InformationGainAgent.InformationGainAgent(input_domain_path, use_object_subset=True, spare_objects_per_type=2, model_mode='safe', learn_negative_preconditions=True, selection_strategy='greedy', epsilon=0.1, temperature=1.0, lookahead_depth=2, lookahead_top_k=5, lookahead_discount=0.9, mcts_iterations=50, mcts_rollout_depth=5)
Bases: ActiveAlgorithmAdapter
Online action model learning via information gain.
Uses a CNF/SAT-based, information-theoretic approach to select the actions that maximize the expected information gain about the action model.
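To make "expected information gain" concrete, here is a minimal sketch; it is not the agent's actual implementation. It assumes a uniform prior over the candidate models that are still consistent with past observations, and that each candidate deterministically predicts whether an action is applicable in the current state. Under those assumptions, the expected gain from trying the action equals the binary entropy of the fraction of candidates that predict success. The function names and the model-count inputs are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a yes/no outcome with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_information_gain(n_applicable: int, n_total: int) -> float:
    """Expected bits learned by trying an action (illustrative assumption).

    n_applicable: number of consistent candidate models in which the
                  action is applicable in the current state.
    n_total:      number of consistent candidate models overall.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_applicable <= n_total:
        raise ValueError("n_applicable must lie in [0, n_total]")
    return binary_entropy(n_applicable / n_total)
```

An action that half of the candidates consider applicable is the most informative one to try (one full bit), while an action whose outcome every candidate agrees on yields nothing.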
- Parameters:
use_object_subset (bool) – Enable object subset selection for reduced grounding
spare_objects_per_type (int) – Extra objects per type beyond minimum requirement (for subset selection)
model_mode (str) – “safe” (all possible preconditions, confirmed effects only) or “complete” (certain preconditions only, all possible effects)
learn_negative_preconditions (bool) – Whether to learn negative preconditions
selection_strategy (str) – Action selection strategy. One of:
    - “greedy”: always select the action with the highest information gain (default)
    - “epsilon_greedy”: explore with probability epsilon
    - “boltzmann”: softmax probabilistic selection
    - “lookahead”: depth-limited lookahead with discounted future gain
    - “mcts”: full UCT-based Monte Carlo Tree Search
lookahead_depth (int) – Lookahead depth for ‘lookahead’ strategy (default: 2)
lookahead_top_k (int) – Number of top actions to evaluate in lookahead (default: 5)
lookahead_discount (float) – Discount factor for future gain in lookahead (default: 0.9)
epsilon (float) – Exploration probability for ‘epsilon_greedy’ strategy (default: 0.1)
temperature (float) – Temperature for ‘boltzmann’ softmax selection (default: 1.0)
mcts_iterations (int) – Number of MCTS iterations per action selection (default: 50)
mcts_rollout_depth (int) – Simulation depth during MCTS rollout phase (default: 5)
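The three simplest selection strategies can be illustrated with a short sketch. This is an assumption about how such strategies are commonly written, not the library's code: `select_action` is a hypothetical helper, and `gains` is a hypothetical mapping from action names to already-computed information gain values. The `epsilon` and `temperature` arguments mirror the parameters documented above.

```python
import math
import random


def select_action(gains, strategy="greedy", epsilon=0.1, temperature=1.0, rng=None):
    """Pick an action from a {action: information_gain} mapping (illustrative)."""
    if not gains:
        raise ValueError("gains must not be empty")
    rng = rng or random.Random()
    actions = list(gains)
    best = max(actions, key=lambda a: gains[a])

    if strategy == "greedy":
        return best
    if strategy == "epsilon_greedy":
        # With probability epsilon pick uniformly at random, else exploit.
        return rng.choice(actions) if rng.random() < epsilon else best
    if strategy == "boltzmann":
        if temperature <= 0:
            raise ValueError("temperature must be positive")
        # Softmax weights; subtracting the peak keeps exp() from overflowing.
        peak = gains[best]
        weights = [math.exp((gains[a] - peak) / temperature) for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    raise ValueError(f"unsupported strategy in this sketch: {strategy!r}")
```

In this sketch a lower temperature makes Boltzmann selection behave more like greedy selection, while a higher one flattens the distribution toward uniform exploration.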
Example
    from unified_planning.io import PDDLReader
    from unified_planning.shortcuts import SequentialSimulator

    from amlgym.algorithms import get_algorithm
    from amlgym.benchmarks import get_domain_path, get_problems_path
    from amlgym.util.util import empty_domain

    domain = 'blocksworld'
    domain_ref_path = get_domain_path(domain)
    input_domain_path = empty_domain(domain_ref_path)
    problem_path = get_problems_path(domain, kind='learning')[0]
    problem = PDDLReader().parse_problem(domain_ref_path, problem_path)
    env = SequentialSimulator(problem=problem)

    info_gain = get_algorithm('InformationGainAgent', input_domain_path=input_domain_path)
    model, trajectory = info_gain.learn(env, max_steps=100)

    # With lookahead strategy
    info_gain = get_algorithm(
        'InformationGainAgent',
        input_domain_path=input_domain_path,
        selection_strategy='lookahead',
        lookahead_depth=3,
    )
    model, trajectory = info_gain.learn(env, max_steps=100)

    print("##################### Learned model #####################")
    print(model)
    print("################# Generated trajectory ##################")
    print(trajectory)
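The lookahead configuration used in the example above trades immediate gain against discounted future gain. The sketch below shows one plausible way to score an action that way; it is an illustration under stated assumptions, not the agent's implementation. The callables `gain`, `step`, and `actions` are hypothetical stand-ins for the gain estimate, the state transition, and the applicable-action generator, and the toy domain exists only to make the snippet runnable.

```python
def lookahead_value(state, action, gain, step, actions,
                    depth=2, top_k=5, discount=0.9):
    """Immediate gain plus discounted best follow-up gain (illustrative)."""
    immediate = gain(state, action)
    if depth <= 1:
        return immediate
    next_state = step(state, action)
    # Only expand the top_k follow-up actions ranked by immediate gain.
    ranked = sorted(actions(next_state),
                    key=lambda a: gain(next_state, a), reverse=True)
    candidates = ranked[:top_k]
    if not candidates:
        return immediate
    future = max(
        lookahead_value(next_state, a, gain, step, actions,
                        depth - 1, top_k, discount)
        for a in candidates
    )
    return immediate + discount * future


# Toy domain (hypothetical): states are integers, two actions.
def toy_actions(state):
    return ["advance", "stay"]

def toy_step(state, action):
    return state + 1 if action == "advance" else state

def toy_gain(state, action):
    if action == "advance":
        return 0.2
    return 1.0 if state == 1 else 0.5  # "stay" is very informative in state 1

value_advance = lookahead_value(0, "advance", toy_gain, toy_step, toy_actions)
value_stay = lookahead_value(0, "stay", toy_gain, toy_step, toy_actions)
```

In this toy setting a greedy choice would pick "stay" in state 0 (gain 0.5 versus 0.2), whereas the depth-2 score prefers "advance" because it leads to a state where a more informative action becomes available.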
- __init__(input_domain_path, use_object_subset=True, spare_objects_per_type=2, model_mode='safe', learn_negative_preconditions=True, selection_strategy='greedy', epsilon=0.1, temperature=1.0, lookahead_depth=2, lookahead_top_k=5, lookahead_discount=0.9, mcts_iterations=50, mcts_rollout_depth=5)
- epsilon: float = 0.1
- input_domain_path: str
- learn(simulator, max_steps=500, seed=123)
Learn a PDDL action model by interacting with the environment.
- Parameters:
simulator (SequentialSimulator) – environment simulator
max_steps (int) – maximum number of interaction steps with the simulator
seed (int) – random seed for reproducibility
- Return type:
Tuple[str, Trajectory]
- Returns:
(learned PDDL model string, trajectory)
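Because learn returns the learned model as a PDDL string, it can be written to disk with the standard library alone. The helper below is a hypothetical convenience, not part of amlgym, and the output path is only an example.

```python
from pathlib import Path


def save_learned_model(model: str, path) -> Path:
    """Write a learned PDDL model string to a file, creating parent folders."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(model, encoding="utf-8")
    return out


# Usage, assuming `model` came from `info_gain.learn(env, max_steps=100)`:
# save_learned_model(model, "results/blocksworld_learned.pddl")
```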
- learn_negative_preconditions: bool = True
- lookahead_depth: int = 2
- lookahead_discount: float = 0.9
- lookahead_top_k: int = 5
- mcts_iterations: int = 50
- mcts_rollout_depth: int = 5
- model_mode: str = 'safe'
- selection_strategy: str = 'greedy'
- spare_objects_per_type: int = 2
- temperature: float = 1.0
- use_object_subset: bool = True
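When many runs share a configuration, it can help to validate the settings before constructing the agent. The sketch below mirrors the documented attributes and defaults in a small dataclass. `InfoGainConfig` is a hypothetical helper, not part of amlgym, and the numeric range checks (epsilon in [0, 1], positive temperature) are assumptions rather than documented constraints.

```python
from dataclasses import asdict, dataclass

STRATEGIES = {"greedy", "epsilon_greedy", "boltzmann", "lookahead", "mcts"}
MODEL_MODES = {"safe", "complete"}


@dataclass(frozen=True)
class InfoGainConfig:
    """Hypothetical container for the documented constructor options."""
    use_object_subset: bool = True
    spare_objects_per_type: int = 2
    model_mode: str = "safe"
    learn_negative_preconditions: bool = True
    selection_strategy: str = "greedy"
    epsilon: float = 0.1
    temperature: float = 1.0
    lookahead_depth: int = 2
    lookahead_top_k: int = 5
    lookahead_discount: float = 0.9
    mcts_iterations: int = 50
    mcts_rollout_depth: int = 5

    def __post_init__(self):
        if self.selection_strategy not in STRATEGIES:
            raise ValueError(f"unknown selection_strategy: {self.selection_strategy!r}")
        if self.model_mode not in MODEL_MODES:
            raise ValueError(f"unknown model_mode: {self.model_mode!r}")
        # Assumed ranges, not taken from the library documentation.
        if not 0.0 <= self.epsilon <= 1.0:
            raise ValueError("epsilon must lie in [0, 1]")
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")

    def as_kwargs(self) -> dict:
        """Return the options as keyword arguments."""
        return asdict(self)


# Usage, assuming get_algorithm forwards keyword arguments as in the example:
# cfg = InfoGainConfig(selection_strategy="lookahead", lookahead_depth=3)
# agent = get_algorithm("InformationGainAgent",
#                       input_domain_path=input_domain_path, **cfg.as_kwargs())
```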