amlgym.metrics package
Module contents
- amlgym.metrics.applicability(simulator, simulator_ref, test_states, applicable_actions=None, show_progress=True)[source]
Evaluate the predicted applicability metric given the simulator of a domain model \(M\) and an environment simulator \(E\). The simulators of \(M\) and \(E\) share the set \(S\) of states and the set \(A\) of actions. For an action \(a\in A\) and a set of test states \(S_{test}\subseteq S\), we denote by \(app_M(a,S_{test})\) and \(app(a,S_{test})\) the sets of states in \(S_{test}\) in which \(a\) is applicable according to \(M\) and \(E\), respectively. For every action \(a\in A\), we define the following counts:
True Positives: \(TP_{app}(a)=|app_M(a,S_{test})\cap app(a,S_{test})|\)
False Positives: \(FP_{app}(a)=|app_M(a,S_{test})\setminus app(a,S_{test})|\)
False Negatives: \(FN_{app}(a)=|app(a,S_{test}) \setminus app_M(a,S_{test})|\)
The predicted applicability precision and recall per action are obtained as:
Predicted applicability precision of \(a\): \(P_{app}(a) = \frac{TP_{app}(a)}{TP_{app}(a)+FP_{app}(a)}\)
Predicted applicability recall of \(a\): \(R_{app}(a) = \frac{TP_{app}(a)}{TP_{app}(a)+FN_{app}(a)}\)
When \(TP_{app}(a) = FP_{app}(a) = 0\), we define \(P_{app}(a)=1\) and \(R_{app}(a)=0\), as the domain model \(M\) never allows \(a\) to be applied in \(S_{test}\). Finally, the mean precision and recall of a domain model are the averages of the per-action precision and recall:
Predicted applicability precision of \(M\): \(P_{app} = \frac{1}{|A|}\sum_{a\in A} P_{app}(a)\)
Predicted applicability recall of \(M\): \(R_{app} = \frac{1}{|A|}\sum_{a\in A} R_{app}(a)\)
- Parameters:
simulator (Union[Env,Sequence[Env]]) – simulator of a domain model \(M\) to be evaluated.
simulator_env – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)],Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.
- Return type:
Dict[str,Dict[str,float]]
- Returns:
the predicted applicability precision and recall averaged over all test states and actions.
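The definitions above can be sketched with plain Python sets. This is a minimal illustration, not the amlgym implementation: the helper names, the `"precision"`/`"recall"` dictionary keys, and the use of integers as state identifiers are all assumptions made for the example. The text does not cover the case \(TP_{app}(a)+FN_{app}(a)=0\) with \(FP_{app}(a)>0\); the sketch returns a recall of 0 there as an assumption.

```python
from typing import Dict, Hashable, Set, Tuple


def applicability_counts(app_model: Set[Hashable],
                         app_env: Set[Hashable]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one action from app_M(a, S_test) and app(a, S_test)."""
    tp = len(app_model & app_env)   # applicable according to both M and E
    fp = len(app_model - app_env)   # applicable according to M only
    fn = len(app_env - app_model)   # applicable according to E only
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Per-action precision and recall, with the TP = FP = 0 convention from the text."""
    if tp == 0 and fp == 0:
        return 1.0, 0.0
    precision = tp / (tp + fp)
    # Assumption: recall is 0 when TP + FN = 0 (case not defined in the text).
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


def mean_applicability(app_model_by_action: Dict[str, Set[Hashable]],
                       app_env_by_action: Dict[str, Set[Hashable]]) -> Dict[str, float]:
    """Average the per-action precision and recall over all actions."""
    actions = sorted(set(app_model_by_action) | set(app_env_by_action))
    if not actions:
        raise ValueError("at least one action is required")
    scores = [
        precision_recall(*applicability_counts(
            app_model_by_action.get(a, set()), app_env_by_action.get(a, set())))
        for a in actions
    ]
    return {
        "precision": sum(p for p, _ in scores) / len(scores),
        "recall": sum(r for _, r in scores) / len(scores),
    }


# Toy data: test states are identified by integers (hypothetical).
model_sets = {"pick": {1, 2, 3}, "drop": set()}
env_sets = {"pick": {2, 3, 4}, "drop": {1}}
scores = mean_applicability(model_sets, env_sets)
```

In this toy case `pick` has \(TP=2\), \(FP=1\), \(FN=1\), while `drop` falls under the \(TP=FP=0\) convention, so the means are \((2/3+1)/2\) for precision and \((2/3+0)/2\) for recall.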
- amlgym.metrics.predicted_effects(simulator, simulator_env, test_states, applicable_actions=None, show_progress=True)[source]
Evaluate the predicted effects metric given the simulator of a domain model \(M\) and an environment simulator \(E\). The model \(M\) and environment simulators share the set \(S\) of states and the set \(A\) of actions; the evaluation considers actions applicable in a state for both \(M\) and \(E\). For an action \(a\in A\), and state \(s\in S\), we denote by \(a_{M}(s)\) and \(a(s)\) the state resulting from applying \(a\) in \(s\) according to \(M\) and \(E\), respectively. We define the predicted effect metrics for every state \(s \in S_{test}\) and action \(a\in A\) as:
True Positives: \(TP_{eff}(s,a)=|(a_M(s)\setminus s)\cap (a(s)\setminus s)|\)
False Positives: \(FP_{eff}(s,a)=|(a_M(s)\setminus s)\setminus a(s)|\)
False Negatives: \(FN_{eff}(s,a)=|(a_M(s)\cap s)\setminus a(s)|\)
The per-action counts are obtained by summing over all states in \(S_{test}\), and the per-action precision and recall follow from these totals, i.e.:
True Positives: \(TP_{eff}(a)=\sum\limits_{s\in S_{test}}TP_{eff}(s,a)\)
False Positives: \(FP_{eff}(a)=\sum\limits_{s\in S_{test}}FP_{eff}(s,a)\)
False Negatives: \(FN_{eff}(a)=\sum\limits_{s\in S_{test}}FN_{eff}(s,a)\)
Predicted effects precision of \(a\) : \(P_{eff}(a) = \frac{TP_{eff}(a)}{TP_{eff}(a)+FP_{eff}(a)}\)
Predicted effects recall of \(a\) : \(R_{eff}(a) = \frac{TP_{eff}(a)}{TP_{eff}(a)+FN_{eff}(a)}\)
When \(TP_{eff}(a) = FP_{eff}(a) = 0\), we define \(P_{eff}(a)=1\) and \(R_{eff}(a)=0\). Finally, the mean precision and recall of a domain model are the averages of the per-action precision and recall:
Predicted effects precision of \(M\): \(P_{eff} = \frac{1}{|A|}\sum_{a\in A} P_{eff}(a)\)
Predicted effects recall of \(M\): \(R_{eff} = \frac{1}{|A|}\sum_{a\in A} R_{eff}(a)\)
- Parameters:
simulator (Union[Env,Sequence[Env]]) – simulator of a domain model \(M\) to be evaluated.
simulator_env (Union[Env,Sequence[Env]]) – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)],Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.
- Return type:
Dict[str,Dict[str,float]]
- Returns:
the predicted effects mean precision and recall
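As a rough illustration of the effect counts defined above, the sketch below represents a state as a `frozenset` of literal strings and applies the three set expressions directly. The helper names and the string encoding of literals are assumptions made for the example, not part of amlgym. As in the definitions, only transitions where the action is applicable for both \(M\) and \(E\) are expected as input, and a recall of 0 is assumed when \(TP_{eff}(a)+FN_{eff}(a)=0\) with \(FP_{eff}(a)>0\), a case the text does not define.

```python
from typing import FrozenSet, Iterable, Tuple

State = FrozenSet[str]  # assumption: a state is a set of literal strings


def effect_counts(s: State, s_model: State, s_env: State) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one state s, given a_M(s) and a(s)."""
    tp = len((s_model - s) & (s_env - s))  # changes predicted by M and observed in E
    fp = len((s_model - s) - s_env)        # changes predicted by M but absent in E
    fn = len((s_model & s) - s_env)        # literals kept by M but not present in a(s)
    return tp, fp, fn


def action_effect_scores(
        transitions: Iterable[Tuple[State, State, State]]) -> Tuple[float, float]:
    """Sum the counts over all (s, a_M(s), a(s)) triples of one action, then take ratios."""
    tp = fp = fn = 0
    for s, s_model, s_env in transitions:
        d_tp, d_fp, d_fn = effect_counts(s, s_model, s_env)
        tp, fp, fn = tp + d_tp, fp + d_fp, fn + d_fn
    if tp == 0 and fp == 0:
        return 1.0, 0.0  # convention stated in the text
    precision = tp / (tp + fp)
    # Assumption: recall is 0 when TP + FN = 0 (case not defined in the text).
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Two toy transitions for a hypothetical "move" action.
transitions = [
    # M adds a spurious "moved" literal and fails to remove "at-a".
    (frozenset({"at-a", "clear-b"}),
     frozenset({"at-a", "at-b", "clear-b", "moved"}),
     frozenset({"at-b", "clear-b"})),
    # M and E agree exactly.
    (frozenset({"at-b"}), frozenset({"at-c"}), frozenset({"at-c"})),
]
precision, recall = action_effect_scores(transitions)
```

The first transition contributes \(TP=1\), \(FP=1\), \(FN=1\) and the second \(TP=1\), \(FP=0\), \(FN=0\), so the totals for the action are \(TP=2\), \(FP=1\), \(FN=1\).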
- amlgym.metrics.predictive_power(simulator_learned, simulator_ref, test_states, applicable_actions=None, show_progress=True)[source]
Evaluate both the predicted applicability and predicted effects metrics of a simulated domain model \(M\) with respect to an environment simulator \(E\) against a test set of states \(S_{test}\). The results can be reproduced by separately executing the functions predicted_effects() and applicability(); this function performs a joint and more efficient evaluation of both metrics.
- Parameters:
simulator – simulator of a domain model \(M\) to be evaluated.
simulator_env – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)],Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.
- Return type:
Dict[str,Dict[str,Dict[str,float]]]
- Returns:
the mean precision and recall for both predicted effects and predicted applicability
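One way to picture why a joint evaluation can be cheaper is that each (state, action) pair needs to be simulated only once to update both sets of counts. The sketch below illustrates that idea under stated assumptions: the step functions, which return `None` when an action is not applicable, and the `"app"`/`"eff"` keys are hypothetical and do not reflect the actual amlgym `Env` interface or return layout.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence

State = FrozenSet[str]
# Hypothetical simulator interface: next state, or None if the action is not applicable.
Step = Callable[[State, str], Optional[State]]


def joint_counts(step_model: Step, step_env: Step,
                 states: Sequence[State],
                 actions: Sequence[str]) -> Dict[str, Dict[str, List[int]]]:
    """Collect [TP, FP, FN] for applicability and effects in a single pass."""
    counts = {a: {"app": [0, 0, 0], "eff": [0, 0, 0]} for a in actions}
    for s in states:
        for a in actions:
            s_m, s_e = step_model(s, a), step_env(s, a)  # one simulation per simulator
            app, eff = counts[a]["app"], counts[a]["eff"]
            if s_m is not None and s_e is not None:
                app[0] += 1  # applicable for both M and E
                # Effects are compared only when both agree the action is applicable.
                eff[0] += len((s_m - s) & (s_e - s))
                eff[1] += len((s_m - s) - s_e)
                eff[2] += len((s_m & s) - s_e)
            elif s_m is not None:
                app[1] += 1  # applicable for M only
            elif s_e is not None:
                app[2] += 1  # applicable for E only
    return counts


def env_step(s: State, a: str) -> Optional[State]:
    # Toy environment: "move" requires at-a, removes it, and adds at-b.
    if a == "move" and "at-a" in s:
        return (s - {"at-a"}) | {"at-b"}
    return None


def model_step(s: State, a: str) -> Optional[State]:
    # Toy learned model: no precondition, and it forgets to remove at-a.
    if a == "move":
        return s | {"at-b"}
    return None


states = [frozenset({"at-a"}), frozenset({"at-c"})]
counts = joint_counts(model_step, env_step, states, ["move"])
```

The per-action precision and recall then follow from these counts exactly as in the formulas given for `applicability()` and `predicted_effects()`.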
- amlgym.metrics.problem_solving(model_learn_path, model_ref_path, problem_paths, timeout=60, show_progress=True)[source]
- Solve the given problems using the model to be evaluated and return:
the solving plans ratio in the environment defined by the reference model
the false plans ratio in the environment defined by the reference model
the ratio of problems deemed unsolvable
the ratio of problems where the planning process timed out
the ratio of problems that raised syntax errors during parsing in unified planning
- Parameters:
model_learn_path (str) – learned model path
model_ref_path (str) – reference model path
problem_paths (List[str]) – list of problem paths
timeout – planner timeout
- Return type:
Dict[str,float]
- Returns:
solving and false plans ratios, deemed unsolvable problems, timed out planning processes
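The five ratios listed above can be pictured as a simple tally over per-problem outcomes. The sketch below is only an illustration: the outcome labels, the helper name, and the choice to divide every count by the total number of problems are assumptions, and the keys of the dictionary actually returned by `problem_solving()` may differ.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical outcome labels, one per problem.
OUTCOMES = ("solving", "false_plan", "unsolvable", "timed_out", "syntax_error")


def outcome_ratios(outcomes: Sequence[str]) -> Dict[str, float]:
    """Return the fraction of problems that ended in each outcome."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = len(outcomes)
    # Assumption: every ratio uses the total number of problems as denominator.
    return {label: (counts[label] / total if total else 0.0) for label in OUTCOMES}


ratios = outcome_ratios(["solving", "solving", "false_plan", "timed_out"])
```

With this denominator the five ratios sum to 1 whenever at least one problem is given, which makes a quick sanity check on the result.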
- amlgym.metrics.syntactic_precision(evaluated_path, reference_path)[source]
Evaluate the syntactic precision metric of a domain model \(M\) with respect to a reference model \(M^{*}\).
- Parameters:
evaluated_path (str) – path of the PDDL model to evaluate.
reference_path (str) – path of the PDDL reference model.
- Return type:
Dict[str,float]
- Returns:
the syntactic precision grouped by preconditions and effects
- amlgym.metrics.syntactic_recall(evaluated_path, reference_path)[source]
Evaluate the syntactic recall metric of a domain model \(M\) with respect to a reference model \(M^{*}\).
- Parameters:
evaluated_path (str) – path of the PDDL model to evaluate.
reference_path (str) – path of the PDDL reference model.
- Return type:
Dict[str,float]
- Returns:
the syntactic recall grouped by preconditions and effects
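The entries above do not spell out how syntactic precision and recall are computed. A common set-based reading, used here purely as an assumption, compares the preconditions and effects of \(M\) with those of \(M^{*}\): precision is the fraction of elements of \(M\) that also appear in \(M^{*}\), and recall is the fraction of elements of \(M^{*}\) that also appear in \(M\). The sketch below works on already-extracted sets of (action, literal) pairs rather than on PDDL files; the helper names, the data layout, and the value 1.0 chosen for empty sets are all hypothetical.

```python
from typing import Dict, Set, Tuple

Element = Tuple[str, str]  # (action name, literal), a hypothetical encoding


def set_precision_recall(evaluated: Set[Element],
                         reference: Set[Element]) -> Tuple[float, float]:
    """Set-based precision and recall of the evaluated elements against the reference."""
    common = len(evaluated & reference)
    # Assumption: an empty set yields 1.0 (nothing wrong / nothing to recover).
    precision = common / len(evaluated) if evaluated else 1.0
    recall = common / len(reference) if reference else 1.0
    return precision, recall


def syntactic_scores(evaluated: Dict[str, Set[Element]],
                     reference: Dict[str, Set[Element]]) -> Dict[str, Dict[str, float]]:
    """Group precision and recall by preconditions and effects."""
    result: Dict[str, Dict[str, float]] = {"precision": {}, "recall": {}}
    for group in ("preconditions", "effects"):
        p, r = set_precision_recall(evaluated.get(group, set()),
                                    reference.get(group, set()))
        result["precision"][group] = p
        result["recall"][group] = r
    return result


# Toy models for a single hypothetical "move" action.
evaluated = {
    "preconditions": {("move", "at ?from"), ("move", "clear ?to"), ("move", "fuel")},
    "effects": {("move", "at ?to")},
}
reference = {
    "preconditions": {("move", "at ?from"), ("move", "clear ?to")},
    "effects": {("move", "at ?to"), ("move", "not at ?from")},
}
result = syntactic_scores(evaluated, reference)
```

In this toy case the extra `fuel` precondition lowers precondition precision to 2/3, and the missing `not at ?from` effect lowers effect recall to 1/2.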