www.prismmodelchecker.org
HomeAboutDownloadsDocumentationManualPublicationsCase StudiesSupport

Randomised Consensus Shared Coin Protocol

(Aspnes & Herlihy)

Contents

Introduction

This case study concerns modelling and verifying the shared coin protocol of the randomised consensus algorithm of Aspnes and Herlihy [AH90] using PRISM. The shared coin protocol returns a preference 1 or 2, with a certain probability, whenever requested by a process at some point in the execution of the consensus algorithm. The shared coin protocol implements a collective random walk parameterised by the number of processes N and the constant K > 1 (independent of N). The consensus algorithm proceeds through rounds and for each round r there is a new copy of the shared coin protocol denoted CFr.

The processes access a global shared counter, initially 0. On entering the protocol, a process flips a coin, and, depending on the outcome, increments or decrements the shared counter. Since we are working in a distributed scenario, several processes may simultaneously want to flip a coin, which is modelled as a nondeterministic choice between probability distributions, one for each coin flip. Having flipped the coin, the process then reads the counter, say observing c. If c >= KN it chooses 1 as its preferred value, and if c <= -KN it chooses 2. Otherwise, the process flips the coin again, and continues doing so until it observes that the counter has passed one of the barriers.

There are two properties of the shared coin protocol we need to verify to prove Probabilistic wait free termination.

  • C1: For each fair execution of CFr that starts with a reachable state of CFr, with probability 1 all processes that enter CFr leave CFr.
  • C2: For each fair execution of CFr, and each value v=1,2, the probability that all processes that enter CFr will eventually leave with value v is at least p.

In the case of C2, calculate the exact (minimum) probability of the processes drawing the same value, as opposed to a lower bound p=(K-1)/2K established analytically in [AH90] using random walk theory.

The PRISM source code in the case when there are 8 processes is given below.

// COIN FLIPPING PROTOCOL FOR POLYNOMIAL RANDOMISED CONSENSUS [AH90] 
// 8 processes
// gxn/dxp 20/11/00

mdp

// CONSTANTS
const int N=8; //number of processes
const int K; // parameter
const int range = 2*(K+1)*N; // range of sharded coin
const int counter_init = (K+1)*N; // initial value
const int left =2; // choose tails
const int right= 2*(K+1)*N -N; // choose heads

// shared coin
global counter : [0..range] init counter_init;

module process1
	
	// program counter
	pc1 : [0..3];
	// 0 - flip
	// 1 - write 
	// 2 - check
	// 3 - finished
	
	// local coin
	coin1 : [0..1];	

	// flip coin
	[] (pc1=0)  -> 0.5 : (coin1'=0) & (pc1'=1) + 0.5 : (coin1'=1) & (pc1'=1);
	// write tails -1  (and reset local coin)
	[] (pc1=1) & (coin1=0) & (counter>0) -> (counter'=counter-1) & (pc1'=2) & (coin1'=0);
	// write heads +1 (and reset local coin)
	[] (pc1=1) & (coin1=1) & (counter<range) -> (counter'=counter+1) & (pc1'=2) & (coin1'=0);
	// check
	[] (pc1=2) & (counter<=left)  -> (pc1'=3) & (coin1'=0);	// decide tails
	[] (pc1=2) & (counter>=right) -> (pc1'=3) & (coin1'=1);	// decide heads
	[] (pc1=2) & (counter>left) & (counter<right) -> (pc1'=0); // flip again
	// loop when finished
	[] (pc1=3) -> (pc1'=3);

endmodule

// construct remaining processes through renaming
module process2 = process1[pc1=pc2,coin1=coin2] endmodule
module process3 = process1[pc1=pc3,coin1=coin3] endmodule
module process4 = process1[pc1=pc4,coin1=coin4] endmodule
module process5 = process1[pc1=pc5,coin1=coin5] endmodule
module process6 = process1[pc1=pc6,coin1=coin6] endmodule
module process7 = process1[pc1=pc7,coin1=coin7] endmodule
module process8 = process1[pc1=pc8,coin1=coin8] endmodule
View: printable version          Download: coin8.nm

All experiments were run on a 440 MHz SUN Ultra 10 workstation with 512 Mb memory under the Solaris 2.7 operating system.

Model and MTBDD Sizes:

The table below shows statistics for each model and the MTBDD which represents it. The first column gives the size of the model (i.e. the number of states). The second two columns refer to the MTBDD: the total number of nodes in the structure and the number of leaves (or terminal nodes).

Model:   States: Nodes: Leaves:
N K
222722903
45283013
81,0403123
162,0643233
324,1123343
648,2083453
4222,6561,2943
443,1361,3053
884,0961,3163
16166,0161,3273
32329,8561,3383
621,258,2403,1343
4 2,376,4483,1313
8 4,612,8643,1423
169,085,6963,1533
8261,018,1125,3713
4114,757,6325,3823
8222,236,6725,5873
16437,194,7525,4043
1022,761,248,7688,6913
45,179,854,8488,7133
67,598,460,9288,7123

Model Construction Times:

The table below gives the times taken to construct the models. There are two stages. Firstly, the system description (in our module based language) is parsed and converted to the MTBDD representing it. Secondly, reachability is performed to identify non-reachable states and the MTBDD is filtered accordingly. Reachability is performed using a BDD based fixpoint algorithm. The number of iterations for this process is also given.

Model:   Construction:
N K Time (s):   Iter.s:
220.15837
40.18773
80.224145
160.317289
320.581577
641.111,153
420.36899
40.503195
80.845387
161.44771
322.751,539
6 2 1.36149
4 1.94293
8 3.18581
165.501,157
823.03199
44.37391
87.50775
1613.21,543
1026.80249
410.1489
612.6729

Model Checking

Requirements. when model checking, we have the following requirement:

  • Scheduler-Liveness: For each time point t and process i, there exists a time point t' > t at which process i is scheduled.

This corresponds to the conditions on the adversary/scheduler which we impose by model checking against only fair adversaries.

Model Checking Algorithms.There are two precomputation steps in the model checking procedure involving BDD based fixpoint algorithms: Prob1E and Prob0A which finds those states such that the maximum probability of satisfying the formula is 1 and 0 respectively.

In addition, in the main algorithm we stop iterating when the relative error between subsequent iteration vectors is less than 1e-6, that is:

Properties. We have model checked the following properties:

  • C1: For each fair execution of CFr that starts with a reachable state of CFr, with probability 1 all processes that enter CFr leave CFr.

    This is expressed in PCTL for 8 processes by:

    P>=1 [ true U pc1=3 & pc2=3 & pc3=3 & pc4=3 & pc5=3 & pc6=3 & pc7=3 & pc8=3 ]
    
    View: printable version          Download: coin8_c1.pctl

    This is a probability 1 property, and therefore the solution is found through a probability-1 precomputation step.

    Model checking times:

    Model:   Precomputation:
    N K Time (s): Iterations:
    2 2 0.0217
    4 0.0329
    8 0.0453
    160.15197
    320.13101
    640.31389
    4 2 0.4033
    4 0.6057
    8 0.57105
    160.82201
    321.64393
    6 2 1.6849
    4 2.0685
    8 2.93157
    164.21301
    8 2 6.44 65
    4 7.93 113
    8 10.26209
    1615.05401
    10 218.1281
    418.20141
    620.39201

    Conclusion: the property holds in all states
  • C2: For each fair execution of CFr, and each value v=1,2, the probability that all processes that enter CFr will eventually leave with value v is at least p.

    In this case we calculate the actual minimum probability, i.e. the minimum value of p for which C2 holds. This query, in the case of 8 processes, can be expressed in PCTL as follows:

    Pmin=? [ true U pc1=3 & pc2=3 & pc3=3 & pc4=3 & pc5=3 & pc6=3 & pc7=3 & pc8=3 & coin1=1 & coin2=1 & coin3=1 & coin4=1 & coin5=1 & coin6=1 & coin7=1 & coin8=1 ]
    
    View: printable version          Download: coin8_c2.pctl

    The tables below give the model checking results for the MTBDD, Sparse and Hybrid engines. This shows the minimum value of p and the lower bound (K-1)/2K, established analytically in [AH90] using random walk theory.

    Model checking times:

    Model:   Precomputation (Prob0A):   Main Algorithm:   p:   (K-1)/2K:
    N K Time (s): Iterations: Time per iter. (s): Iterations:
    MTBDD Sparse Hybrid
    2 2 0.03429 0.0040 0.000040.00019480 0.3828140.25
    4 0.13353 0.0087 0.000120.000371,740 0.4377520.3125
    8 0.0931010.0186 0.000250.000636,132 0.4687830.4375
    16 0.4271970.0387 0.000520.001221,108 0.4845070.46875
    32 0.1783890.1242 0.0011 0.002670,668 0.4927150.484375
    64 0.3857730.0752 0.0021 0.0044227,6630.4981930.492188
    4 20.32 63 0.084 0.0080.0161,741 0.3173600.25
    40.42 1110.173 0.0140.0296,133 0.4063070.3125
    80.50 2070.343 0.0410.05721,110 0.4532550.4375
    160.893990.670 0.1020.12770,669 0.4770880.46875
    321.707831.11 0.2120.345227,6650.4903790.484375
    6 2 1.1997 0.5350.9421.603,649 0.2943650.25
    4 1.591690.9881.74 3.0112,6790.3959060.3125
    8 2.493131.83 3.37 5.8042,9670.4482090.4375
    164.926012.90 - 11.72140,8870.4751380.46875
    8 2 3.71 1311.87--6,133 0.2827900.25
    4 4.98 2273.18--21,110 0.3907490.3125
    8 7.93 4194.56--70,669 0.4458310.4375
    1613.168036.91--227,665 0.4747470.46875
    10 29.162 1658.38--9,157 0.2759190.25
    411.0 2858.60--31,2550.3876930.3125
    612.2784059.79--63,2410.4254500.416667

Case Studies