PRISM models can be augmented with information about rewards (or, equivalently, costs).
The tool can analyse properties which relate to the expected values of these rewards.
This is achieved using the R operator, which works in a similar fashion to the P and S operators, and can be used either in a Boolean-valued query, e.g.:

R bound [ rewardprop ]

where bound takes the form <r, <=r, >r or >=r for an expression r evaluating to a non-negative double, or a real-valued query, e.g.:

R query [ rewardprop ]

where query is =?, min=? or max=?. In the latter case, filters can be used, as for the P and S operators.
Informally, "R bound [ rewardprop ]" is true in a state of a model if "the expected reward associated with rewardprop of the model when starting from that state" meets the bound bound, and "R query [ rewardprop ]" returns the actual expected reward value.
There are various different types of reward properties:

- F prop : "reachability reward" properties
- co-safe LTL, e.g. F (prop1 & F prop2) : "co-safe LTL reward" properties
- C<=t : "cumulative reward" properties
- C : "total reward" properties
- I=t : "instantaneous reward" properties
- S : "steady-state reward" properties
Below, we consider each of these cases in turn. The descriptions here are kept relatively informal. Precise definitions for most of these can be found in, for example, [KNP07a] (for DTMCs and CTMCs) or [FKNP11] (for MDPs).
"Reachability reward" properties associate a reward with each path of a model. More specifically, they refer to the reward accumulated along a path until a certain point is reached. The manner in which rewards are accumulated depends on the model type. For DTMCs and MDPs, the total reward for a path is the sum of the state rewards for each state along the path plus the sum of the transition rewards for each transition between these states. The situation for CTMCs is similar, except that the state reward assigned to each state of the model is interpreted as the rate at which rewards are accumulated in that state, i.e. if t time units are spent in a state with state reward r, the reward accumulated in that state is r x t. Hence, the total reward for a path in a CTMC is the sum of these products for each state along the path plus the sum of the transition rewards for each transition between these states.
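The accumulation scheme described above can be made concrete with a small sketch. The following Python snippet is our own illustration (not part of PRISM): it sums the state rewards and transition rewards along a finite DTMC/MDP path, using made-up reward values.

```python
# Illustrative sketch of DTMC/MDP path-reward accumulation:
# total = sum of state rewards along the path
#       + sum of transition rewards for each transition taken.

def path_reward(path, state_reward, trans_reward):
    """Accumulated reward of a finite path (hypothetical toy model)."""
    total = sum(state_reward[s] for s in path)
    total += sum(trans_reward.get((path[i], path[i + 1]), 0.0)
                 for i in range(len(path) - 1))
    return total

# A toy 3-state path with made-up rewards:
state_reward = {0: 1.0, 1: 2.0, 2: 0.0}
trans_reward = {(0, 1): 0.5, (1, 2): 0.5}
print(path_reward([0, 1, 2], state_reward, trans_reward))  # 1+2+0 + 0.5+0.5 = 4.0
```

For a CTMC, the state-reward terms would instead be r × t products for the time t spent in each state, as explained above.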
The reward property "F prop" corresponds to the reward cumulated along a path until a state satisfying property prop is reached, where rewards are cumulated as described above. State rewards for the prop-satisfying state reached are not included in the cumulated value. In the case where the probability of reaching a state satisfying prop is less than 1, the reward is equal to infinity.
A common application of this type of property is the case when the rewards associated with the model correspond to time. One can then state, for example:

R<=9.5 [ F z=2 ]

which is true in a state s if "the expected time taken to reach, from s, a state where z equals 2 is less than or equal to 9.5".
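To illustrate the computation behind such a property, here is a minimal value-iteration sketch for the expected reachability reward on a small DTMC. This is our own illustration, not PRISM's engine; transition rewards are omitted for brevity, and it assumes the target is reached with probability 1 from every state (otherwise the true answer is infinity).

```python
# Expected reachability reward "F target" by value iteration:
# x(s) = 0 for target states, otherwise
# x(s) = state_reward(s) + sum_t P(s,t) * x(t).

def reach_reward(P, state_reward, target, iters=2000):
    n = len(P)
    x = [0.0] * n
    for _ in range(iters):
        x = [0.0 if s in target else
             state_reward[s] + sum(P[s][t] * x[t] for t in range(n))
             for s in range(n)]
    return x

# Toy 3-state DTMC: from state 0, reach the target state 2.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
state_reward = [1.0, 1.0, 5.0]  # the target state's reward is not counted
print(reach_reward(P, state_reward, {2})[0])  # converges to 3.0
```

Solving the same equations exactly: x(2) = 0, x(1) = 1, and x(0) = 1 + 0.5·x(0) + 0.5·x(1), giving x(0) = 3.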
These generalise the "reachability" properties above. Again, reward is accumulated along a path up until some point, but this is specified in a more general way, by giving a formula in the co-safe fragment of linear temporal logic (LTL). Rewards are accumulated up until the point where the formula is first satisfied. For example, this property, for a DTMC or CTMC, queries the expected reward accumulated until goal first equals 1 and then subsequently equals 2:

R=? [ F (goal=1 & F goal=2) ]

and this property, for an MDP, minimises the expected reward until loc equals 1, having passed only through states where loc never equals 4:

Rmin=? [ loc!=4 U loc=1 ]

As for reachability rewards, if the probability of satisfying the formula is less than 1, then the expected reward is defined to be infinite.
Intuitively, a co-safe formula is one that is satisfied within a finite period of time, and remains true forever once it becomes true for the first time. For simplicity, PRISM actually supports the syntactic co-safe fragment of LTL, which is defined as any LTL formula that only uses the temporal operators F, U and X (but not G, for example). PRISM's notation for LTL formulas is described here.
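The syntactic restriction above is easy to check mechanically. The following toy sketch (our own illustration, using a hypothetical nested-tuple representation of LTL formulas) accepts exactly the formulas whose temporal operators are drawn from F, U and X, matching the definition given here:

```python
# Toy membership check for the *syntactic* co-safe fragment:
# temporal operators other than F, U, X (e.g. G, or R for "release")
# disqualify a formula.

TEMPORAL = {"F", "U", "X", "G", "R"}   # R here is LTL "release", not PRISM's R operator
ALLOWED = {"F", "U", "X"}

def syntactically_cosafe(f):
    if isinstance(f, str):             # atomic proposition
        return True
    op, *args = f                      # (operator, subformula, ...)
    if op in TEMPORAL and op not in ALLOWED:
        return False
    return all(syntactically_cosafe(a) for a in args)

print(syntactically_cosafe(("F", ("&", "goal1", ("F", "goal2")))))  # True
print(syntactically_cosafe(("G", "safe")))                          # False
```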
"Cumulative reward" properties also associate a reward with each path of a model, but only up to a given time bound. The property C<=t corresponds to the reward cumulated along a path until t time units have elapsed. For DTMCs and MDPs, the bound t must evaluate to an integer; for CTMCs, it can evaluate to a double. State and transition rewards along a path are cumulated exactly as described in the previous section.
A typical application of this type of property is the following. Consider a model of a disk-drive controller which includes a queue of incoming disk requests. If we assign a reward of 1 to each transition of the model corresponding to the situation where an incoming request is lost because the queue is full, then the property:

R=? [ C<=15.5 ]

would return, for a given state of the model, "the expected number of lost requests within 15.5 time units of operation".
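In the discrete-time case, the expected cumulative reward up to a step bound k can be sketched as a simple recursion. This is an illustrative reading of the semantics (not PRISM's implementation), again omitting transition rewards for brevity:

```python
# Expected cumulative reward "C<=k" for a DTMC over k discrete steps:
# x_0(s) = 0,  x_{i+1}(s) = state_reward(s) + sum_t P(s,t) * x_i(t).

def cumul_reward(P, state_reward, k):
    n = len(P)
    x = [0.0] * n
    for _ in range(k):
        x = [state_reward[s] + sum(P[s][t] * x[t] for t in range(n))
             for s in range(n)]
    return x

# Toy 2-state DTMC with made-up rewards:
P = [[0.5, 0.5],
     [0.0, 1.0]]
print(cumul_reward(P, [1.0, 0.0], 2)[0])  # 1 + 0.5*1 = 1.5
```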
"Total reward" properties refer to the accumulation of state and transition rewards in the same way as for "reachability reward" and "cumulative reward" properties, but the reward is accumulated indefinitely, i.e. the total reward accumulated along the whole (infinite) path. Note that this means that, unless a path ends up remaining forever in states with zero reward, the total reward will be infinite.
Re-using the reward structure in the previous example, the property:

R=? [ C ]

returns "the expected total number of lost requests".
"Instantaneous reward" properties refer to the reward of a model at a particular instant in time. The reward property I=t associates with a path the reward in the state of that path when exactly t time units have elapsed. For DTMCs and MDPs, the bound t must evaluate to an integer; for CTMCs, it can evaluate to a double.
Returning to our example from the previous section of a model for a disk-request queue in a disk-drive controller, consider the case where the rewards assigned to each state of the model give the current size of the queue in that state. Then, the following property:

R<4.4 [ I=100 ]

would be true in a state s of the model if "starting from s, the expected queue size exactly 100 time units later is less than 4.4". Note that, for this type of reward property, state rewards for CTMCs do not have to refer to rates; they can refer to any instantaneous measure of interest for a state.
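For a DTMC, the value behind I=t can be sketched by evolving the probability distribution for t steps and then taking the expected state reward. This is our own illustration of the semantics on a toy model, not PRISM code:

```python
# Expected instantaneous reward "I=t" for a DTMC: push the initial
# distribution through t steps of the transition matrix, then weight
# the state rewards by the resulting distribution.

def inst_reward(P, state_reward, init, t):
    n = len(P)
    dist = [1.0 if s == init else 0.0 for s in range(n)]
    for _ in range(t):
        dist = [sum(dist[s] * P[s][u] for s in range(n)) for u in range(n)]
    return sum(dist[s] * state_reward[s] for s in range(n))

# Toy 2-state DTMC; state rewards play the role of "queue size":
P = [[0.5, 0.5],
     [0.5, 0.5]]
print(inst_reward(P, [0.0, 1.0], 0, 1))  # 0.5*0 + 0.5*1 = 0.5
```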
Unlike the previous three types of property, "steady-state reward" properties relate not to paths, but rather to the reward in the long-run. A typical application of this type of property would be, in the case where the rewards associated with the model correspond to power consumption, the property:

R<0.7 [ S ]

which is true in a state s if "starting from s, the long-run average power consumption is less than 0.7".
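The long-run value can be sketched for an ergodic DTMC by computing the stationary distribution and weighting the state rewards by it. This is an illustrative power-iteration sketch under that ergodicity assumption, not PRISM's solution method:

```python
# Steady-state reward "S" for an ergodic DTMC: approximate the
# stationary distribution by power iteration, then take the expected
# state reward under it.

def steady_reward(P, state_reward, iters=5000):
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iters):
        dist = [sum(dist[s] * P[s][u] for s in range(n)) for u in range(n)]
    return sum(dist[s] * state_reward[s] for s in range(n))

# Toy 2-state DTMC with stationary distribution (0.5, 0.5); state
# rewards play the role of per-state power consumption:
P = [[0.5, 0.5],
     [0.5, 0.5]]
print(steady_reward(P, [0.2, 1.0]))  # 0.5*0.2 + 0.5*1.0 = 0.6
```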
In the case where a PRISM model has multiple reward structures, you may need to specify which reward structure your property refers to. This is done by placing the information in braces ({}) after the R operator. You can do so either using the name assigned to a reward structure (if any) or using the index (where 1 means the first reward structure in the PRISM model file, 2 the second, etc.). Examples are:

R{"name"}=? [ ... ] (using the reward structure named "name")
R{2}=? [ ... ] (using the second reward structure in the model file)
Note that when using an index to specify the reward structure, you can actually put any expression that evaluates to an integer. This allows you to, for example, write a property of the form R{c}=?[...] where c is an undefined integer constant. You can then vary the value of c in an experiment and compute values for several different reward structures at once.
If you don't specify a reward structure to the R operator, by default, the first one in the model file is used.
There are currently a few restrictions on the model checking engines that can be used for some reward properties. The following table summarises the current availability, where S, M, H and E denote the "sparse", "MTBDD", "hybrid" and "explicit" engines, respectively, for DTMCs, CTMCs and MDPs. For PTAs, support for rewards is currently quite restrictive; see the later section on "PTA properties" for details.
      | F    | co-safe | C<=t | C    | I=t  | S    |
DTMCs | SMHE | SMHE    | SMHE | SMHE | SMHE | SMH- |
CTMCs | SMHE | SMHE    | SMHE | SMHE | SMHE | SMH- |
MDPs  | SM-E | SMHE    | S--E | ---- | SM-E | ---- |