Predictive power

Predictive power metrics assess a domain model's ability to predict the applicability of actions and the outcome of executing an action in a state. The predicted applicability metric evaluates the model's ability to predict whether an action is applicable in an environment state. The predicted effects metric evaluates the model's ability to predict the state reached after executing an action in an environment state. Predictive power metrics are defined with respect to a set of (test) states and a simulator, which is required to determine whether an action is applicable and which state is reached after executing the action in a given (simulated) environment state.

amlgym.metrics._predictive.applicability(simulator, simulator_ref, test_states, applicable_actions=None, show_progress=True)[source]

Evaluate the predicted applicability metric given the simulator of a domain model \(M\) and an environment simulator \(E\). The simulators of \(M\) and \(E\) share the set \(S\) of states and the set \(A\) of actions. For an action \(a\in A\) and a set of test states \(S_{test}\subseteq S\), we denote by \(app_M(a,S_{test})\) and \(app(a,S_{test})\) the set of states in \(S_{test}\) in which \(a\) is applicable according to \(M\) and \(E\), respectively. We define the following predicted applicability counts for every action \(a\in A\):

True Positives: \(TP_{app}(a)=|app_M(a,S_{test})\cap app(a,S_{test})|\)
False Positives: \(FP_{app}(a)=|app_M(a,S_{test})\setminus app(a,S_{test})|\)
False Negatives: \(FN_{app}(a)=|app(a,S_{test}) \setminus app_M(a,S_{test})|\)

The predicted applicability precision and recall per action are obtained as:

Predicted applicability precision of \(a\): \(P_{app}(a) = \frac{TP_{app}(a)}{TP_{app}(a)+FP_{app}(a)}\)
Predicted applicability recall of \(a\): \(R_{app}(a) = \frac{TP_{app}(a)}{TP_{app}(a)+FN_{app}(a)}\)

When \(TP_{app}(a) = FP_{app}(a) = 0\), we define \(P_{app}(a)=1\) and \(R_{app}(a)=0\), since the domain model \(M\) never allows \(a\) to be applied in \(S_{test}\). Finally, the mean precision and recall of a domain model are obtained by averaging the per-action precision and recall:

Predicted applicability precision of \(M\): \(P_{app} = \frac{1}{|A|}\sum_{a\in A} P_{app}(a)\)
Predicted applicability recall of \(M\): \(R_{app} = \frac{1}{|A|}\sum_{a\in A} R_{app}(a)\)

Parameters:

simulator (Union[Env, Sequence[Env]]) – simulator of a domain model \(M\) to be evaluated.
simulator_env – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)], Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.

Return type:

Dict[str, Dict[str, float]]

Returns:

the predicted applicability precision and recall averaged over all test states and actions.
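
The applicability counts defined above can be illustrated with plain Python sets. The following is a minimal sketch, not the amlgym implementation: the function name `applicability_scores` and its dictionary inputs are hypothetical, and it assumes the sets \(app_M(a,S_{test})\) and \(app(a,S_{test})\) have already been computed for every action. The source does not define recall when \(TP_{app}(a)+FN_{app}(a)=0\) but \(FP_{app}(a)>0\); the sketch assumes \(R_{app}(a)=0\) in that case.

```python
from typing import Dict, Hashable, Set, Tuple


def applicability_scores(
    app_model: Dict[str, Set[Hashable]],
    app_env: Dict[str, Set[Hashable]],
) -> Tuple[Dict[str, Tuple[float, float]], float, float]:
    """Per-action and mean applicability precision/recall (illustrative only).

    app_model[a] / app_env[a] hold the test states in which action a is
    applicable according to the model M / the environment E.
    Assumes at least one action is given.
    """
    per_action = {}
    for action in sorted(app_model.keys() | app_env.keys()):
        in_model = app_model.get(action, set())
        in_env = app_env.get(action, set())
        tp = len(in_model & in_env)   # applicable in both M and E
        fp = len(in_model - in_env)   # applicable in M only
        fn = len(in_env - in_model)   # applicable in E only
        # Convention from the text: P = 1 and R = 0 when TP = FP = 0.
        precision = tp / (tp + fp) if tp + fp > 0 else 1.0
        # Assumption of this sketch: R = 0 whenever TP + FN = 0.
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        per_action[action] = (precision, recall)

    n = len(per_action)
    mean_p = sum(p for p, _ in per_action.values()) / n
    mean_r = sum(r for _, r in per_action.values()) / n
    return per_action, mean_p, mean_r


# Toy data: test states are identified by the integers 0..3.
app_env = {"pick": {0, 1, 2}, "drop": {3}}
app_model = {"pick": {0, 1, 3}, "drop": set()}  # M never allows "drop"

per_action, mean_p, mean_r = applicability_scores(app_model, app_env)
# "pick": TP = 2, FP = 1, FN = 1; "drop": TP = FP = 0, so P = 1 and R = 0.
```

In this toy example the model is never penalized in precision for "drop", because it never predicts the action to be applicable, but its recall for "drop" is zero.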

amlgym.metrics._predictive.predicted_effects(simulator, simulator_env, test_states, applicable_actions=None, show_progress=True)[source]

Evaluate the predicted effects metric given the simulator of a domain model \(M\) and an environment simulator \(E\). The simulators of \(M\) and \(E\) share the set \(S\) of states and the set \(A\) of actions; the evaluation considers only state-action pairs in which the action is applicable according to both \(M\) and \(E\). For an action \(a\in A\) and a state \(s\in S\), we denote by \(a_{M}(s)\) and \(a(s)\) the state resulting from applying \(a\) in \(s\) according to \(M\) and \(E\), respectively. We define the following predicted effects counts for every state \(s \in S_{test}\) and action \(a\in A\):

True Positives: \(TP_{eff}(s,a)=|(a_M(s)\setminus s)\cap (a(s)\setminus s)|\)
False Positives: \(FP_{eff}(s,a)=|(a_M(s)\setminus s)\setminus a(s)|\)
False Negatives: \(FN_{eff}(s,a)=|(a_M(s)\cap s)\setminus a(s)|\)

The predicted effects precision and recall per action are obtained by summing the counts over all states in \(S_{test}\), i.e.:

True Positives: \(TP_{eff}(a)=\sum\limits_{s\in S_{test}}TP_{eff}(s,a)\)
False Positives: \(FP_{eff}(a)=\sum\limits_{s\in S_{test}}FP_{eff}(s,a)\)
False Negatives: \(FN_{eff}(a)=\sum\limits_{s\in S_{test}}FN_{eff}(s,a)\)
Predicted effects precision of \(a\) : \(P_{eff}(a) = \frac{TP_{eff}(a)}{TP_{eff}(a)+FP_{eff}(a)}\)
Predicted effects recall of \(a\) : \(R_{eff}(a) = \frac{TP_{eff}(a)}{TP_{eff}(a)+FN_{eff}(a)}\)

When \(TP_{eff}(a) = FP_{eff}(a) = 0\), we define \(P_{eff}(a)=1\) and \(R_{eff}(a)=0\). Finally, the mean precision and recall of a domain model are obtained by averaging the per-action precision and recall:

Predicted effects precision of \(M\): \(P_{eff} = \frac{1}{|A|}\sum_{a\in A} P_{eff}(a)\)
Predicted effects recall of \(M\): \(R_{eff} = \frac{1}{|A|}\sum_{a\in A} R_{eff}(a)\)

Parameters:

simulator (Union[Env, Sequence[Env]]) – simulator of a domain model \(M\) to be evaluated.
simulator_env (Union[Env, Sequence[Env]]) – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)], Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.

Return type:

Dict[str, Dict[str, float]]

Returns:

the predicted effects mean precision and recall.
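
The effect counts defined above can be illustrated by representing each state as a set of ground literals. The following is a minimal sketch, not the amlgym implementation: the function name `effects_scores`, the transition tuples, and the literal names are hypothetical. It assumes every listed transition involves an action that is applicable in the state for both \(M\) and \(E\), and it treats \(R_{eff}(a)\) as 0 whenever \(TP_{eff}(a)+FN_{eff}(a)=0\).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

State = FrozenSet[str]
# (action, state s, state a_M(s) predicted by M, state a(s) observed in E)
Transition = Tuple[str, State, State, State]


def effects_scores(
    transitions: Iterable[Transition],
) -> Tuple[Dict[str, Tuple[float, float]], float, float]:
    """Per-action and mean effects precision/recall (illustrative only)."""
    counts = defaultdict(lambda: [0, 0, 0])  # action -> [TP, FP, FN]
    for action, s, s_model, s_env in transitions:
        c = counts[action]
        c[0] += len((s_model - s) & (s_env - s))  # TP_eff(s, a)
        c[1] += len((s_model - s) - s_env)        # FP_eff(s, a)
        c[2] += len((s_model & s) - s_env)        # FN_eff(s, a)

    per_action = {}
    for action, (tp, fp, fn) in counts.items():
        # Convention from the text: P = 1 and R = 0 when TP = FP = 0.
        precision = tp / (tp + fp) if tp + fp > 0 else 1.0
        # Assumption of this sketch: R = 0 whenever TP + FN = 0.
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        per_action[action] = (precision, recall)

    n = len(per_action)
    mean_p = sum(p for p, _ in per_action.values()) / n
    mean_r = sum(r for _, r in per_action.values()) / n
    return per_action, mean_p, mean_r


s = frozenset({"at-a", "free"})
transitions = [
    # M adds "at-b" correctly, adds a spurious "visited-b",
    # and keeps "at-a" although E removes it.
    ("move", s,
     frozenset({"at-a", "free", "at-b", "visited-b"}),
     frozenset({"at-b", "free"})),
    # M predicts no change at all for "mark".
    ("mark", frozenset({"p"}), frozenset({"p"}), frozenset({"p", "q"})),
]

per_action, mean_p, mean_r = effects_scores(transitions)
# "move": TP = 1, FP = 1, FN = 1; "mark": TP = FP = FN = 0.
```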

amlgym.metrics._predictive.predictive_power(simulator_learned, simulator_ref, test_states, applicable_actions=None, show_progress=True)[source]

Evaluate both the predicted applicability and predicted effects metrics of a simulated domain model \(M\) with respect to an environment simulator \(E\) against a test set of states \(S_{test}\). The results can be reproduced by separately executing functions predicted_effects() and applicability(); this function performs a joint and more efficient evaluation of both metrics.

Parameters:

simulator – simulator of a domain model \(M\) to be evaluated.
simulator_env – environment simulator \(E\) to compare with.
test_states (Union[Sequence[TypeVar(StateType)], Sequence[Sequence[TypeVar(StateType)]]]) – set \(S_{test}\) of test states.
applicable_actions (Sequence[TypeVar(ActionType)]) – optionally precomputed set of applicable actions for every test state.
show_progress (bool) – show a progress bar.

Return type:

Dict[str, Dict[str, Dict[str, float]]]

Returns:

the mean precision and recall for both predicted effects and predicted applicability.
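
One way to see why a joint evaluation can be cheaper is that both metrics need the same applicability checks for every state-action pair. The following is a minimal sketch of such a single pass, not the amlgym implementation: `ToySimulator`, its `applicable` and `step` methods, and `joint_counts` are hypothetical stand-ins and do not reflect the actual `Env` interface.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

State = FrozenSet[str]
# action -> (preconditions, add effects, delete effects)
Rules = Dict[str, Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]]


class ToySimulator:
    """Hypothetical STRIPS-like simulator; NOT the amlgym Env interface."""

    def __init__(self, rules: Rules) -> None:
        self.rules = rules

    def applicable(self, state: State, action: str) -> bool:
        pre, _, _ = self.rules[action]
        return pre <= state

    def step(self, state: State, action: str) -> State:
        _, add, delete = self.rules[action]
        return (state - delete) | add


def joint_counts(
    model: ToySimulator,
    env: ToySimulator,
    test_states: Iterable[State],
    actions: Sequence[str],
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Collect [TP, FP, FN] per action for both metrics in one pass."""
    app = {a: [0, 0, 0] for a in actions}
    eff = {a: [0, 0, 0] for a in actions}
    for s in test_states:
        for a in actions:
            in_m, in_e = model.applicable(s, a), env.applicable(s, a)
            app[a][0] += int(in_m and in_e)
            app[a][1] += int(in_m and not in_e)
            app[a][2] += int(in_e and not in_m)
            if in_m and in_e:
                # Effects are compared only where both simulators agree
                # that the action is applicable.
                s_m, s_e = model.step(s, a), env.step(s, a)
                eff[a][0] += len((s_m - s) & (s_e - s))
                eff[a][1] += len((s_m - s) - s_e)
                eff[a][2] += len((s_m & s) - s_e)
    return app, eff


f = frozenset
env = ToySimulator(
    {"pick": (f({"hand-empty", "on-table"}), f({"holding"}),
              f({"hand-empty", "on-table"}))}
)
# The model misses the "hand-empty" precondition and delete effect.
model = ToySimulator({"pick": (f({"on-table"}), f({"holding"}), f({"on-table"}))})
test_states = [f({"hand-empty", "on-table"}), f({"on-table"}), f({"holding"})]

app, eff = joint_counts(model, env, test_states, ["pick"])
```

The per-action precision and recall then follow from the two count tables using the formulas given for applicability() and predicted_effects().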