Contributions to Game Theory and Management, VIII, 33—46

Optimal Doubling Strategies in Backgammon

Sergei I. Dotsenko and Alexander V. Marynych

Taras Shevchenko National University of Kyiv, Faculty of Cybernetics, Volodymyrska 64/13, Kyiv 01601, Ukraine E-mail: sergei204@ukr.net, marynych@unicyb.kiev.ua

Abstract. In this paper we discuss decision procedures related to an important aspect of the backgammon game, namely doubling. We focus on the proper choice of the doubling time and on optimal strategies for accepting or rejecting a double proposed by the opponent. There are two stages of the game that are most amenable to analysis. The first one is called "the races": during this stage the opponent's checkers do not block the player's moves, and the goal is to put all checkers into "the house" and then to take them off the board. In this case we calculate the optimal doubling time as well as the optimal strategy for acceptance and rejection. Another case is the so-called two-steps game, a situation in which each player has at most one turn before the game ends. This situation is analyzed using the concept of complex rational behavior.

Keywords: backgammon, decision making, doubling strategies.

1. Introduction

Backgammon is one of the most popular commercial board games of both luck and skill. The game is played on a special board with 24 fields, and each player has 15 checkers at his disposal. The players alternately roll two dice and make their moves according to the points on the dice. There are many variations of the game on the backgammon board (backgammon, nackgammon, long gammon, tavla, etc.) and the rules may differ considerably. Probably the most interesting and the most widespread is backgammon itself. The strategy of the game may be briefly described as follows. Each player tries to gather all his checkers in the quadrant of the board called "the house" (each player has his own house, and the houses are situated on opposite sides of the board). While moving the checkers towards the house, the player tries to create obstacles for the opponent's checkers and to avoid the opponent's obstacles. The final stage of the game, when the opponent's checkers no longer obstruct the player's moves, consists of two parts: moving the checkers into the house and "taking them off". The player who takes all of his checkers off the board before his opponent wins the game. This final stage, which is called "the races", is the most amenable to probabilistic analysis.

2. The basic concepts of the game

2.1. Pip-count (PC)

Let the fields on the board be enumerated from 1 to 24 in such a way that the rightmost field in the first player's house is 1 and the field in the opposite corner of the board is 24. The sum over all fields of the number of the first player's checkers on a field multiplied by the number of that field is called the first player's pip-count. To calculate the second player's pip-count the fields must be enumerated in the opposite order. At the beginning of the game each player's pip-count equals

2 × 24 + 5 × 13 + 3 × 8 + 5 × 6 = 167.

In fact, the pip-count gives the minimal number of points which a player has to roll (over several turns) in order to put all of his checkers into the house and then to take them off the board. A player whose pip-count is smaller/larger than his opponent's is said to be ahead/behind in the races. The ability to calculate and compare pip-counts quickly is an important part of a player's skill. Some technical tricks for speeding up such calculations may be found in (Kissane, 1992).
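For illustration, the pip-count can be computed directly from this definition. The following Python sketch (the dictionary-based encoding of a position is our own choice, not part of the original rules) reproduces the value 167 for the starting position.

```python
# A sketch of the pip-count computed from its definition: the sum over all
# fields of (number of the player's checkers on the field) x (field number).
def pip_count(position):
    """position maps a field number (1..24) to the number of checkers on it."""
    return sum(field * checkers for field, checkers in position.items())

# The standard starting position as seen from one player's side:
# 2 checkers on 24, 5 on 13, 3 on 8 and 5 on 6.
start = {24: 2, 13: 5, 8: 3, 6: 5}
print(pip_count(start))  # 167
```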

2.2. Roll

Each player's turn starts with a roll of two dice. If both dice show the same number of points (for example 3-3), the roll is called a "double" and the number of points is doubled (in the example above, the player makes four 3-point checker moves rather than two). The probability distribution of the total points at a roll, taking the doubles into account, is given in Table 1.

Table 1: Probability distribution of points at a roll.

j            3     4     5     6     7     8     9     10    11    12    16    20    24
probability  2/36  3/36  4/36  4/36  6/36  5/36  4/36  2/36  2/36  1/36  1/36  1/36  1/36

The mean and the standard deviation of this distribution are given by

$$\mu := \frac{49}{6} \approx 8.167, \qquad \sigma := \frac{\sqrt{665}}{6} \approx 4.297. \qquad (1)$$
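The distribution in Table 1 and the moments in (1) can be checked by direct enumeration of the 36 equally likely outcomes; the short Python sketch below (variable names are ours) does exactly that.

```python
from fractions import Fraction
from collections import Counter
from math import sqrt

# Enumerate the 36 equally likely dice pairs; a double (a, a) is worth 4a
# points, any other pair is worth a + b.  This reproduces Table 1 and (1).
points = Counter()
for a in range(1, 7):
    for b in range(1, 7):
        points[4 * a if a == b else a + b] += Fraction(1, 36)

mean = sum(j * p for j, p in points.items())                        # 49/6 ~ 8.167
variance = sum(j * j * p for j, p in points.items()) - mean ** 2    # 665/36
print(sorted(points.items()), float(mean), sqrt(variance))          # sigma ~ 4.297
```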

2.3. Doubling cube

Doubling of the bet is an additional rule in backgammon which works as follows. A player who (in his subjective opinion) has good chances to win has the right to double the bet before his roll. When the double is called, the opponent has to make one of two decisions: to surrender and lose the single bet, or to continue the game with the doubled bet. There may be several doublings of the bet during one game. The doubling cube is a die with powers of 2 on its faces, from 2 to 64, and it is used as a doubling indicator as follows. At the very beginning, while the game is played at the initial bet, the doubling cube lies between the players; this situation is called "our cube". If a player has accepted a doubling proposal, then the cube is moved to him and the top face indicates the current bet. This player is called "the cube owner" and the situation is called "my cube" for him. In the "my cube" situation only the cube owner has the right to double before his roll.

3. Continuous time model

It turns out that the proper doubling strategy, i.e. a decision procedure about whether to double or not and whether to accept or surrender after the opponent has doubled, is as important as the proper moves strategy. The problem of decision making related to a doubling cube was considered in (Keeler and Spencer, 1975)

where the following continuous time model of a game with equally skilled opponents A and B was considered. Let $(W^{(x)}(t))_{t\ge 0}$ be the standard one-dimensional Brownian motion starting at $x \in [0,1]$, i.e. $W^{(x)}(0) = x$. The players observe the trajectory $W^{(x)}(t)$ before it hits either 0 or 1, and the current value $W^{(x)}(t)$ represents the winning chances of player A at time $t$. More precisely, denote by $\tau_a^{(x)}$ the first hitting time of the line $y = a$,

$$\tau_a^{(x)} := \inf\{t > 0 : W^{(x)}(t) = a\},$$

and set¹ $\tau_{0,1}^{(x)} := \tau_0^{(x)} \wedge \tau_1^{(x)}$. The random variable $\tau_{0,1}^{(x)}$ is the total time of the game, and player A wins if $W^{(x)}(\tau_{0,1}^{(x)}) = 1$ and loses if $W^{(x)}(\tau_{0,1}^{(x)}) = 0$. In view of the first Wald's identity (see Theorem 2.45 in (Morters and Peres, 2010))

$$P\{A \text{ wins}\} = P\{W^{(x)}(\tau_{0,1}^{(x)}) = 1\} = P\{\tau_0^{(x)} > \tau_1^{(x)}\} = x.$$

Moreover, as was mentioned above, for every $t < \tau_{0,1}^{(x)}$ the current value $W^{(x)}(t)$ is equal to the conditional probability that player A wins given $(W^{(x)}(s))_{0\le s\le t}$ (or simply given $W^{(x)}(t)$).

The following lemma was proved in (Keeler and Spencer, 1975); we will use it in the sequel.

Lemma 1. Let $a, b > 0$ be such that $0 < x - a < x < x + b < 1$, and set

$$E := \{\tau_{x-a}^{(x)} > \tau_{x+b}^{(x)}\},$$

so $E$ is the event that $W^{(x)}(t)$ hits $x + b$ before it hits $x - a$. Then $P\{E\} = a/(a + b)$.

Proof. We have

$$x = P\{\tau_0^{(x)} > \tau_1^{(x)}\} = P\{\tau_0^{(x)} > \tau_1^{(x)} \mid \tau_{x-a}^{(x)} > \tau_{x+b}^{(x)}\}\, P\{E\} + P\{\tau_0^{(x)} > \tau_1^{(x)} \mid \tau_{x-a}^{(x)} < \tau_{x+b}^{(x)}\}\,(1 - P\{E\})$$

and in view of the strong Markov property of the Brownian motion

$$P\{\tau_0^{(x)} > \tau_1^{(x)} \mid \tau_{x-a}^{(x)} > \tau_{x+b}^{(x)}\} = x + b, \qquad P\{\tau_0^{(x)} > \tau_1^{(x)} \mid \tau_{x-a}^{(x)} < \tau_{x+b}^{(x)}\} = x - a.$$

Hence

$$x = (x + b)P\{E\} + (x - a)(1 - P\{E\}),$$

which yields $P\{E\} = a/(a + b)$. The proof is complete.
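Lemma 1 can also be checked numerically. The sketch below replaces the Brownian motion by a symmetric simple random walk on a fine grid (a standard approximation; the mesh and sample sizes are arbitrary choices of ours) and estimates P{E}.

```python
import random

# Monte Carlo check of Lemma 1: the Brownian motion is replaced by a symmetric
# simple random walk on a grid of mesh 0.01 (an approximation); only the
# distances a and b to the two barriers matter.
def exits_at_top(a, b, mesh=0.01):
    down, up = round(a / mesh), round(b / mesh)
    pos = 0
    while -down < pos < up:
        pos += 1 if random.random() < 0.5 else -1
    return pos == up

a, b, trials = 0.3, 0.1, 20000
estimate = sum(exits_at_top(a, b) for _ in range(trials)) / trials
print(estimate, a / (a + b))   # both should be close to 0.75
```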

Let us assume that both players can double the bet as they observe the trajectory of $W^{(x)}(t)$. It turns out (see Theorem 1 in (Keeler and Spencer, 1975)) that player A should double precisely at $\tau_{0.8}^{(x)}$ or, in other words, as soon as $W^{(x)}(t)$ hits the level 0.8. More precisely, given that player A has doubled at $t_1$, player B has to accept the double if $W^{(x)}(t_1) < 0.8$ and to reject it if $W^{(x)}(t_1) > 0.8$. If player A has doubled exactly at $\tau_{0.8}^{(x)}$, it is irrelevant whether B accepts or rejects: the second player will lose the current bet on average anyway. Since the model is symmetric, the doubling point for B equals $\tau_{0.2}^{(x)}$.

¹ Throughout the paper we use $x \wedge y$ (respectively, $x \vee y$) to denote the minimum (respectively, maximum) of $x, y \in \mathbb{R}$.

Theorem 1. Assume that $x = 0.5$ and both players are rational but risky, which means that player A (respectively, B) doubles at 0.8 (respectively, 0.2) and accepts the double at 0.2 (respectively, 0.8). Denote by $C_A$ (respectively, $C_B$) the winning of player A (respectively, B). Then² $C_A \overset{d}{=} C_B$ and

$$P\{C_A = 2^k\} = P\{C_A = -2^k\} = \frac{3}{8\cdot 4^{k-1}}, \quad k \in \mathbb{N}. \qquad (2)$$

Proof. Note that for every $x, c \in (0,1)$, in view of the strong Markov property of the Brownian motion, the post-$\tau_c^{(x)}$ process $(W^{(x)}(\tau_c^{(x)} + t))_{t\ge 0}$ is independent of $(W^{(x)}(t))_{t\le \tau_c^{(x)}}$, and also $(W^{(x)}(\tau_c^{(x)} + t))_{t\ge 0} \overset{d}{=} (W^{(c)}(t))_{t\ge 0}$. Using this observation, formula (2) for $k = 1$ can be checked as follows. Since³

$$\{C_A = 2\} = \{W^{(0.5)}(t) \text{ hits } 0.8 \text{ before it hits } 0.2 \text{ and then it hits } 1 \text{ before } 0.2\},$$

we have

$$P\{C_A = 2\} = P\{W^{(0.5)}(t) \text{ hits } 0.8 \text{ before it hits } 0.2\} \times P\{W^{(0.8)}(t) \text{ hits } 1 \text{ before it hits } 0.2\} = \frac{1}{2}\cdot\frac{3}{4} = \frac{3}{8},$$

where the penultimate equality follows from Lemma 1. Likewise, for $k = 2$,

$$\{C_A = 4\} = \{W^{(0.5)}(t) \text{ hits } 0.2 \text{ before it hits } 0.8, \text{ then it hits } 0.8 \text{ before it hits } 0 \text{ and finally hits } 1 \text{ before it hits } 0.2\}$$

and therefore, by Lemma 1,

$$P\{C_A = 4\} = P\{W^{(0.5)}(t) \text{ hits } 0.2 \text{ before it hits } 0.8\} \times P\{W^{(0.2)}(t) \text{ hits } 0.8 \text{ before it hits } 0\} \times P\{W^{(0.8)}(t) \text{ hits } 1 \text{ before it hits } 0.2\} = \frac{1}{2}\cdot\frac{1}{4}\cdot\frac{3}{4} = \frac{3}{8}\cdot\frac{1}{4}.$$

The same argument with the decomposition into independent events leads to the general formula (2) for all $k \in \mathbb{N}$; the details are left to the reader. Since $C_A + C_B = 0$ and $C_A \overset{d}{=} C_B$ (the game is symmetric), we have $C_A \overset{d}{=} -C_A$ and therefore (2) holds also for negative values of $C_A$. The proof is complete.
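Since, by Lemma 1 and the strong Markov property, only the successive hitting events of the levels 0, 0.2, 0.8, 1 matter, the game of Theorem 1 can be simulated as a simple chain on these levels. The following Monte Carlo sketch (our own illustration, not part of the original argument) reproduces the first probabilities in (2).

```python
import random
from collections import Counter

# Simulate one game of Theorem 1 as a chain on the doubling levels.  At every
# visit to a doubling point the bet is doubled and the double is accepted.
def play_one():
    stake = 1
    # from 0.5 the path hits 0.8 or 0.2 first, each with probability 1/2
    at_high = random.random() < 0.5
    while True:
        stake *= 2
        if at_high:
            # from 0.8: hit 1 before 0.2 with probability 3/4 (Lemma 1)
            if random.random() < 0.75:
                return stake          # A wins the current stake
            at_high = False
        else:
            # from 0.2: hit 0 before 0.8 with probability 3/4
            if random.random() < 0.75:
                return -stake         # A loses the current stake
            at_high = True

n = 100000
freq = Counter(play_one() for _ in range(n))
print(freq[2] / n, freq[-4] / n)      # compare with 3/8 = 0.375 and 3/32 ~ 0.094
```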

Let $(C_A^{(i)})_{i\in\mathbb{N}}$ be a sequence of independent copies of $C_A$ and set $S_A^{(n)} := C_A^{(1)} + C_A^{(2)} + \ldots + C_A^{(n)}$, that is, $S_A^{(n)}$ is the total winnings of A after n games. Obviously $EC_A = 0$, $EC_A^2 = \infty$, and in particular $C_A$ has infinite variance. However, as the next theorem shows, it is still possible to obtain a central limit theorem for $S_A^{(n)}$.

Theorem 2. As $n \to \infty$ we have

$$P\Big\{\frac{S_A^{(n)}}{a_n} \le x\Big\} \to \Phi(x), \quad x \in \mathbb{R},$$

where $a_n = \sqrt{3(n\log_2 n)/2}$ and $\Phi$ is the cumulative distribution function of the standard normal law.

² Throughout the paper we use the symbol $\overset{d}{=}$ to denote equality in distribution.

³ Note that the event on the right-hand side is simply $\{W^{(0.5)}(t)$ hits 1 before it hits 0.2$\}$, but we prefer to write it this way for a reason which will become clear shortly.

Proof. Introduce the truncated second moment of $C_A$ as follows:

$$m(x) := E C_A^2 1_{\{|C_A| \le x\}}, \quad x > 0.$$

According to part b) of Theorem 2 in Section 5, Chapter XVII in (Feller, 1970) it is enough to show that $m(\cdot)$ is a slowly varying function at $\infty$, i.e.

$$\lim_{x\to\infty} \frac{m(\lambda x)}{m(x)} = 1, \qquad (3)$$

for every fixed $\lambda > 0$. We have

$$m(x) = 2 E C_A^2 1_{\{1 \le C_A \le x\}} = 2 \sum_{k:\, k\ge 1,\, 2^k \le x} (2^k)^2 P\{C_A = 2^k\} = 3 \sum_{k:\, k\ge 1,\, 2^k\le x} 1 = 3[\log_2 x],$$

where $[x]$ denotes the integral part of $x$, $[x] := \sup\{y \in \mathbb{Z} : y \le x\}$. Hence for every $\lambda > 0$ and $x$ large enough

$$\frac{\log_2 \lambda x - 1}{\log_2 x} \le \frac{m(\lambda x)}{m(x)} \le \frac{\log_2 \lambda x}{\log_2 x - 1}.$$

Sending $x \to \infty$ yields (3). Thus, the distribution of $C_A$ belongs to the domain of attraction of the normal law. Since $C_A$ is symmetric around 0, no centering for $S_A^{(n)}$ is needed. To find the normalizing constants $a_n$ we apply formula (5.23) on p. 579 in (Feller, 1970), which reads

$$\lim_{n\to\infty} \frac{n\, m(a_n)}{a_n^2} = 1,$$

or, equivalently,

$$\lim_{n\to\infty} \frac{n \log_2 a_n}{a_n^2} = \frac{1}{3}.$$

Direct calculation shows that $a_n = \sqrt{3(n\log_2 n)/2}$ satisfies the above relation. The proof is complete.
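The unusual normalization $a_n$ can be illustrated numerically. In the sketch below $|C_A|$ is sampled as $2^K$ with $K$ geometric of parameter 3/4 (which is exactly distribution (2)) and the empirical distribution of $S_A^{(n)}/a_n$ is compared with $\Phi$; because of the logarithmic factor the convergence is slow, so the agreement is only rough for moderate n.

```python
import math, random

# |C_A| = 2^k with probability (3/4)(1/4)^(k-1), i.e. exactly distribution (2);
# the sign is symmetric.  S_A^(n)/a_n is then compared with the standard normal.
def sample_C():
    k = 1
    while random.random() > 0.75:
        k += 1
    return 2 ** k if random.random() < 0.5 else -2 ** k

n, repetitions = 10000, 500
a_n = math.sqrt(3 * n * math.log2(n) / 2)
normalized = [sum(sample_C() for _ in range(n)) / a_n for _ in range(repetitions)]

for x in (-1.0, 0.0, 1.0):
    empirical = sum(value <= x for value in normalized) / repetitions
    print(x, round(empirical, 2), round(0.5 * (1 + math.erf(x / math.sqrt(2))), 2))
```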

3.1. Playing against the "insolent" opponent

Let A be a rational player who always doubles when he has the cube and his winning probability is 0.8, and let B be an "insolent" player in the sense that he always takes A's doubles and doubles too early, namely at the level 0.2 + t (in terms of A's winning probability), where t is a parameter with values in [0, 0.3).

Using the same arguments as in the proof of Theorem 1, it can be checked that the distribution of the winning $\xi_A$ of A is given by

$$P\{\xi_A = 2^{2k-1}\} = \frac{0.3-t}{0.8-t}\left(\frac{1}{4}\cdot\frac{0.2+t}{0.8-t}\right)^{k-1}, \qquad P\{\xi_A = 2^{2k}\} = \frac{3}{8}\cdot\frac{0.2+t}{0.8-t}\left(\frac{1}{4}\cdot\frac{0.2+t}{0.8-t}\right)^{k-1},$$

$$P\{\xi_A = -2^{2k-1}\} = \frac{3}{8}\left(\frac{1}{4}\cdot\frac{0.2+t}{0.8-t}\right)^{k-1}, \qquad P\{\xi_A = -2^{2k}\} = \frac{1}{4}\cdot\frac{0.3-t}{0.8-t}\left(\frac{1}{4}\cdot\frac{0.2+t}{0.8-t}\right)^{k-1},$$

where $k \in \mathbb{N}$. The expectation of $\xi_A$ equals $E\xi_A = \frac{5t}{8(0.3-t)}$; in particular, it is positive for $t > 0$.

3.2. Playing against "excessively cautious" opponent

As in the previous case, assume that A is a rational player who always doubles when he has the cube and his winning probability is 0.8, and let B be an excessively cautious player in the sense that he doubles only below the level 0.2, namely at the level 0.2 - t, where t is a parameter taking values in (0, 0.2].

Since A acts rationally he always rejects B's double (since B doubles below 0.2) and therefore the distribution of the winning of A depends completely on the behavior of B at the doubling point of A. Consider two types of such behavior:

(R) B always rejects A's double; (A) B always accepts A's double.

Denote by $C_{A,R}$ (respectively, $C_{A,A}$) the winning of A in case (R) (respectively, in case (A)). The distribution of $C_{A,R}$ can be easily calculated, since this random variable can only take the two values $\pm 1$. Clearly,

$$P\{C_{A,R} = 1\} = P\{W^{(0.5)}(t) \text{ hits } 0.8 \text{ before it hits } 0.2 - t\} = \frac{0.3+t}{0.6+t}$$

in view of Lemma 1. Therefore,

$$P\{C_{A,R} = 1\} = \frac{0.3+t}{0.6+t}, \qquad P\{C_{A,R} = -1\} = \frac{0.3}{0.6+t},$$

in particular $EC_{A,R} = \frac{t}{0.6+t}$. In the second case (A), $C_{A,A}$ can take three values $\pm 2$, $-1$ with probabilities

$$P\{C_{A,A} = -1\} = \frac{0.3}{0.6+t}, \qquad P\{C_{A,A} = 2\} = \frac{0.3+t}{0.8+t}, \qquad P\{C_{A,A} = -2\} = \frac{0.2(0.3+t)}{(0.6+t)(0.8+t)}.$$

Since $\{C_{A,A} = -1\} = \{C_{A,R} = -1\}$ and these three probabilities must sum up to one, it is enough to check the second equality. We have

$$P\{C_{A,A} = 2\} = P\{W^{(0.5)}(t) \text{ hits } 0.8 \text{ before it hits } 0.2-t \text{ and then it hits } 1 \text{ before it hits } 0.2-t\}$$
$$= P\{W^{(0.5)}(t) \text{ hits } 0.8 \text{ before it hits } 0.2-t\} \times P\{W^{(0.8)}(t) \text{ hits } 1 \text{ before it hits } 0.2-t\} = \frac{0.3+t}{0.6+t}\cdot\frac{0.6+t}{0.8+t} = \frac{0.3+t}{0.8+t}.$$

4. One-step and two-steps game models

Further analysis shows that although the aforementioned "continuous trajectory" model plays an important role in evaluating the doubling point, it is not quite adequate, since the chances of each player to win may change radically within one roll. Most frequently such situations appear at the very end of the game. Such terminal situations are described by the so-called one-step and two-step game models (see (Tuck, 1980)).

The probabilities of taking one and two checkers off at the end of the game are presented in Tables 2 and 3, respectively.

Table 2: Chances (out of 36) to take one checker off.

Position              1    2    3    4    5    6
Chances (out of 36)  36   36   36   34   31   27

Table 3: Chances (out of 36) to take two checkers off.

Positions   1    2    3    4    5    6
1          36   36   34   29   23   15
2          36   26   25   23   19   13
3          34   25   17   17   14   10
4          29   23   17   11   10    8
5          23   19   14   10    6    6
6          15   13   10    8    6    4

4.1. One-step game

Let A win with probability p and lose with the complementary probability within one roll. Such a model is completely adequate in the real situation when the opponent has only one or two checkers which will be taken off for sure during his turn. For example, this will be the case if there is only one checker left and it is situated at position 1, 2 or 3, or if there are two checkers at positions 1-1 or 1-2.

It is clear that it is reasonable for A to double if the expected winning increases, which holds if 4p - 2 > 2p - 1, and hence A has to double in the one-step game if p > 0.5. Analogously, B has to surrender if by accepting the double he loses more, i.e. if 4p - 2 > 1 or p > 0.75. These conclusions are true both in the "my cube" and the "our cube" situations.

The application of the aforementioned results to backgammon yields easy rules. In the one-step game A has to double if he has one checker at any position or two checkers at position 2-5 or better. B has to surrender if A has one checker at any position (by the way, position 6 corresponds to p = 0.75 and is the point of indifference) or two checkers at position 1-4 or better. Otherwise, B has to accept the double. For example, if A is at position 2-5, his probability to win is 19/36 ≈ 0.528 (see Table 3). So, he has to double and B has to accept this double. Such "heightened aggressiveness" of A can be explained as follows: although A's chances to win are only slightly larger than 0.5, he is in a "double now or never" situation, since there will be no opportunity to double later.
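These rules can be summarized in a small sketch which combines the thresholds p > 0.5 and p > 0.75 with the probabilities from Tables 2 and 3 (only a few entries of Table 3 are reproduced here; names are ours).

```python
# One-step game advice: the player on roll doubles when p > 0.5 ("now or
# never"), and the opponent surrenders when p > 0.75.
TAKE_ONE_OFF = {1: 36, 2: 36, 3: 36, 4: 34, 5: 31, 6: 27}            # Table 2
TAKE_TWO_OFF = {(1, 1): 36, (1, 4): 29, (2, 4): 23, (2, 5): 19}      # part of Table 3

def one_step_advice(p):
    return ("double" if p > 0.5 else "no double",
            "surrender" if p > 0.75 else "accept")

print(one_step_advice(TAKE_TWO_OFF[(2, 5)] / 36))   # ('double', 'accept')
print(one_step_advice(TAKE_ONE_OFF[6] / 36))        # p = 0.75: the indifference point
```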

4.2. Two-steps game

Assume that A wins at his roll with probability p, and with probability 1 - p the turn goes to B. During his turn B makes his roll and wins with (conditional) probability q; otherwise A wins. Then, as was shown in (Tuck, 1980), the points of proper doubling and of accepting a double can be represented as sub-domains of the unit square [0, 1] × [0, 1] in the (p, q)-plane. Define the following curves

$$P_D(q) = \begin{cases} \frac{1}{2}, & \frac{3}{4} < q \le 1,\\ \frac{4q-2}{4q-1}, & \frac{1}{2} < q \le \frac{3}{4},\\ 0, & q \le \frac{1}{2}; \end{cases} \qquad P_R(q) = \begin{cases} \frac{3-2q}{4-2q}, & \frac{3}{4} < q \le 1,\\ \frac{6q-3}{6q-2}, & \frac{1}{2} < q \le \frac{3}{4},\\ 0, & q \le \frac{1}{2}; \end{cases}$$

and

$$P_Q(q) = \begin{cases} \frac{3}{4}, & \frac{3}{4} < q \le 1,\\ \frac{8q-3}{8q-2}, & \frac{1}{2} < q \le \frac{3}{4},\\ 1 - \frac{1}{4q}, & \frac{1}{4} < q \le \frac{1}{2},\\ 0, & q \le \frac{1}{4}. \end{cases}$$

These curves are depicted in Figure 1. With this picture at hand, the proper strategy of A may be described as follows: in the area to the "north-west" of P_D(q), A should not double in any case; in the area between P_D(q) and P_R(q), A should double in the "our cube" situation but not in the "my cube" situation; in the area to the "south-east" of P_R(q), A should always double.

Fig. 1: Optimal doubling strategies in the two-steps game. (The figure shows the unit square with p = P{A wins} on the horizontal axis and q = P{B wins} on the vertical axis, divided by the curves P_D(q), P_R(q) and P_Q(q) into the regions where player A never doubles, doubles only in the "our cube" situation, doubles in both situations with player B accepting, and doubles in both situations with player B surrendering.)

The proper strategy of player B is simpler: the double should be accepted if (p, q) lies to the "north-west" of P_Q(q) and should be rejected otherwise.

More details about the curves P_D(q), P_R(q) and P_Q(q) can be found in (Tuck, 1980).

At first glance, it may seem that if p increases whilst q decreases, the incentive for player A to double should only grow. In other words, if some point (p0, q0) lies in the doubling area of A, then all points of the rectangle spanned by (p0, q0), (p0, 0), (1, 0) and (1, q0) should also lie in his doubling area. But, as is readily seen from the picture, this is not always the case: it is true in the "our cube" situation but not always in the "my cube" case.

As a counterexample let us compare three situations corresponding to different segments of the vertical line p = 19/36 in (p, q)-plane. In all cases it is assumed that

A had got the cube before, so the game is played with bet 2 and the situation is "my cube" for A.

a) Let A have two checkers at positions 2 and 5, and let B have one checker at position 1. This situation is described by the point A(19/36, 1) (see Tables 2 and 3). This point lies in the area between the curves P_R(q) and P_Q(q), so A has to double and B has to accept. If A does not double, then B has no right to redouble and the expectation of A's winning equals 19/36 × (+2) + 17/36 × (-2) = 1/9 ≈ 0.111. On the other hand, if A has doubled, then B receives the right to redouble and uses it; since q > 0.75, player A surrenders to the redouble and the expectation of his winning is 19/36 × (+4) + 17/36 × (-4) = 2/9 ≈ 0.222. In conclusion, player A has to double.

b) Let A have two checkers at positions 2 and 5, and let B have one checker at position 6. This situation is described by the point B(19/36, 27/36) (see Tables 2 and 3). This point lies in the area between the curves P_R(q) and P_D(q), so A should not double. If A does not double, then B has no right to redouble and the expectation of A's winning equals 19/36 × (+2) + 17/36 × 9/36 × (+2) + 17/36 × 27/36 × (-2) ≈ 0.583. On the other hand, if A has doubled, then B uses his right to redouble and the expectation of A's winning is 19/36 × (+4) + 17/36 × 9/36 × (+8) + 17/36 × 27/36 × (-8) ≈ 0.222. Consequently, A should not double.

c) In this case we assume that A again has two checkers at positions 2 and 5, and B also has two checkers, at positions 2 and 4. Using Table 3 we find that this situation is described by the point C(19/36, 23/36), which lies in the area between P_R(q) and P_Q(q), so A has to double and B has to accept. As in the previous two cases, if A does not double his expected winning equals 19/36 × (+2) + 17/36 × 13/36 × (+2) + 17/36 × 23/36 × (-2) ≈ 0.793, and it equals 19/36 × (+4) + 17/36 × 13/36 × (+8) + 17/36 × 23/36 × (-8) ≈ 1.062 if he doubles. Hence, A has to double, in full correspondence with the optimal doubling rules stated above.
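The three computations above can be verified with a short sketch of the two-steps game played at the stake 2 with A owning the cube (the function names are ours; the redouble logic follows the one-step rules of Section 4.1).

```python
# Expected winnings of A (stake 2, A owns the cube) in the two-steps game.
def expectation_no_double(p, q):
    # A does not double, B never gets the cube: A wins 2 unless A misses
    # (probability 1 - p) and B then bears off (probability q).
    return 2 * (p + (1 - p) * (1 - q)) - 2 * (1 - p) * q

def expectation_double(p, q):
    # A doubles to 4 and B accepts.  If A misses, B redoubles to 8 whenever
    # q >= 0.5; A takes the redouble iff q <= 0.75, otherwise he surrenders.
    if q < 0.5:
        tail = 4 * (1 - 2 * q)
    elif q <= 0.75:
        tail = 8 * (1 - 2 * q)
    else:
        tail = -4
    return 4 * p + (1 - p) * tail

p = 19 / 36
for label, q in (("a", 36 / 36), ("b", 27 / 36), ("c", 23 / 36)):
    print(label, round(expectation_no_double(p, q), 3), round(expectation_double(p, q), 3))
# a: 0.111 / 0.222,  b: 0.583 / 0.222,  c: 0.793 / 1.062, as computed above
```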

It is clear that situation b) is better than a), and c) is better than b) for player A. Nevertheless, he has to double in a), should not double in b) and again has to double in c)! This kind of paradox may be clarified as follows. Doubling in the "my cube" situation means not only increasing the bet, but also giving the opponent the right to redouble, which he does not have while the cube belongs to you. Giving this right in fact means granting him an extra weapon. This weapon is most efficient if q = 0.75 and is useless if either q is below 0.5 (then B has no reason to use it) or q is close to 1 (then he redoubles for sure, but it gives him almost nothing extra).

5. Single checker model

In (Ross et al., 2007) the so-called "single checker model" was introduced. Assume that the players' checkers do not interfere with each other, players A and B have pip-counts S_A and S_B respectively, and the players alternately roll the dice and subtract the rolled points from their pip-counts. The player who reaches zero pip-count first wins. Such a model would be adequate if each player had only one checker and these checkers moved along a "big board" towards their houses. However, as was mentioned before, the pip-count is not the exact but the minimal number of points that a player has to roll in order to remove all his checkers from the board; more may be required. For example, assume that a player has two checkers at positions 2 and 3 (so his pip-count equals 5) and he rolls 6-1. Although the number of points in his roll is larger than his pip-count, he cannot remove both checkers. As a result, one checker remains and will be removed on the next turn in any case. If the player has a lot of checkers at positions 1 and 2 then, in general, the total number of points rolled before all the checkers are removed will be much larger than the player's pip-count. Another factor that leads to inefficient usage of the rolled points is "gaps in positions". For example, if a player has 5 checkers at each of the positions 5, 3 and 1 (so, formally, his pip-count equals 45), there is a gap at 4, and if the player rolls a 4 the only thing he can do is to move a checker from 5 to 1 (see Figure 2).

Fig. 2: Gaps in positions

Some heuristic rules on how to calculate an "amendment" to the pip-count in different positions are given in (Lamford and Gasquoine, 2002). Although these rules are not strictly justified, they are good enough for practical usage, as computer simulations show.

Let us return to the analysis of the single checker model. Let $X^A$ be a random variable with distribution $P\{X^A = j\} = \pi_j$, $j = 3, \ldots, 24$, where the $\pi_j$ are given in Table 1, and let $(X_i^A)_{i\in\mathbb{N}}$ be a sequence of independent copies of $X^A$. Set $T_0^A := 0$, $T_n^A := X_1^A + X_2^A + \ldots + X_n^A$ for $n \in \mathbb{N}$ and define the first passage process

$$N^A(m) := \inf\{n \ge 0 : T_n^A \ge m\}, \quad m \ge 0.$$

These quantities are interpreted as follows:

- $X_j^A$ is the number of points rolled by player A on turn j;

- $T_j^A$ is the sum of points rolled by player A up to turn j;

- $N^A(m)$ is the number of turns player A needs to roll at least m points.

In what follows $X^B$, $X_i^B$, $T_i^B$ and $N^B(m)$ denote the same quantities for player B. Clearly, all random variables with superscript B are assumed mutually independent of the random variables with superscript A, and $X^B \overset{d}{=} X^A$.

With this notation at hand we can easily express different quantities related to the single checker model. For example, assume that player A has pip-count k, player B has pip-count s, and player A moves first. Then

$$F(k, s) := P\{A \text{ wins}\} = P\{N^A(k) \le N^B(s)\}$$

and therefore

$$F(k, s) = \sum_{j=1}^{\lceil s/3\rceil} P\{N^B(s) = j\}\, P\{N^A(k) \le j\} =: \sum_{j=1}^{\lceil s/3\rceil} p_j(s) P_j(k), \qquad (4)$$

where $p_j(s) := P\{N^A(s) = j\}$ and $P_j(s) := P\{N^A(s) \le j\} = \sum_{i=0}^{j} p_i(s)$.

There is a simple recursive equation which can be used to calculate $p_j(s)$ efficiently. In order to write it in compact form, denote $p(s) := (p_0(s), p_1(s), p_2(s), \ldots)$; then $p(0) = (1, 0, 0, \ldots)$ and

$$p(s) = \mathrm{Sh}\Big(\sum_{j=3}^{24} \pi_j\, p((s-j)^+)\Big), \quad s \in \mathbb{N}, \qquad (5)$$

where $x^+ := x \vee 0$ and $\mathrm{Sh}$ is the shift operator:

$$\mathrm{Sh}((x_0, x_1, x_2, \ldots)) = (0, x_0, x_1, \ldots).$$
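For completeness, here is a direct Python implementation of the recursion (5) and of formula (4); it is a sketch rather than efficient code, but it is enough to reproduce values of the order of those in Table 4.

```python
from functools import lru_cache

# Table 1: distribution of the total points in one roll.
PI = {3: 2/36, 4: 3/36, 5: 4/36, 6: 4/36, 7: 6/36, 8: 5/36, 9: 4/36,
      10: 2/36, 11: 2/36, 12: 1/36, 16: 1/36, 20: 1/36, 24: 1/36}

@lru_cache(maxsize=None)
def p(s):
    """The vector (p_0(s), p_1(s), ...) of the recursion (5)."""
    if s == 0:
        return (1.0,)
    acc = [0.0] * (s // 3 + 2)
    for j, pi_j in PI.items():
        for n, prob in enumerate(p(max(s - j, 0))):
            acc[n] += pi_j * prob
    return (0.0, *acc)               # the shift operator Sh

def F(k, s):
    """Formula (4): the probability that the player on roll wins the race."""
    ps, pk = p(s), p(k)
    P_k = [sum(pk[:j + 1]) for j in range(len(pk))]          # P_j(k)
    return sum(ps[j] * P_k[min(j, len(P_k) - 1)] for j in range(1, len(ps)))

print(round(F(100, 100), 2))   # compare with the entry k = 100, s - k = 0 of Table 4
```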

Of course, formulae (4) and (5) can be used to calculate F(k, s) explicitly, but they are very inconvenient in a real game, so some simple approximations are highly desirable. Some results in this direction are given next.

Computer simulations. In (Keeler and Spencer, 1975) the values of F(k, s) were tabulated using the Monte Carlo method. A more detailed tabulation of this function is given in (Zadeh and Kobilska, 1977), where the calculations were performed using dynamic programming. The authors obtained similar results using the recursive scheme (4)-(5); they are presented in Table 4 below.

Table 4: Values of F(k,s). Monte-Carlo simulations.

k     s-k: 0     5     10    15    20    25    30
20        0.70  0.81  0.91  0.96  0.98  0.99  0.99
40        0.65  0.76  0.85  0.92  0.95  0.97  0.98
60        0.62  0.72  0.80  0.87  0.92  0.95  0.97
80        0.60  0.68  0.77  0.86  0.91  0.94  0.95
100       0.58  0.67  0.76  0.83  0.87  0.91  0.94
120       0.57  0.67  0.76  0.82  0.86  0.89  0.93

Least-square estimation. Using the fact that F(k, s) is convex with respect to both parameters, the following formula was proposed in (Lamford and Gasquoine, 2002):

$$F(k, s) \approx 0.5 + \frac{9 + k/100 + 4(s - k)}{k + 7(s - k) + 25},$$

where the constants were determined by the least squares method. This formula appears to be quite precise: it gives only a 2% absolute error for k < 40 and a 1% absolute error for 100 < k < 500.
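In code the estimate is a one-liner; the constants below are those quoted above (as reconstructed here), so the printed value can be compared with Table 4.

```python
# Least-squares estimate of the winning probability in the race.
def F_estimate(k, s):
    return 0.5 + (9 + k / 100 + 4 * (s - k)) / (k + 7 * (s - k) + 25)

print(round(F_estimate(100, 110), 3))   # compare with 0.76 in Table 4 (k = 100, s - k = 10)
```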

Normal approximation. The central limit theorem allows us to conclude that

$$\lim_{n\to\infty} P\Big\{\frac{T_n^A - \mu n}{\sigma\sqrt{n}} \le x\Big\} = \Phi(x), \quad x \in \mathbb{R},$$

where $\mu$ and $\sigma$ are given by (1). A well-known consequence of this classical result is the central limit theorem for the first passage process $(N^A(m))_{m\ge 0}$, which is also known as the renewal process (see p. 372 in (Feller, 1970)):

$$\lim_{m\to\infty} P\Big\{\frac{N^A(m) - m/\mu}{\sigma\mu^{-3/2}\sqrt{m}} \le x\Big\} = \Phi(x), \quad x \in \mathbb{R}. \qquad (6)$$

This result was used in (Ross et al., 2007) to deduce the following approximation⁴

$$F(k, s) \approx \Phi\left(\frac{s - k + 0.5\mu}{\sigma\sqrt{(k+s)/\mu}}\right),$$

where $\Phi$ is the cumulative distribution function of the standard normal law. The calculations show that this asymptotic formula gives good approximations of F(k, s) for large k and s. However, it is of little use in a real game situation.

⁴ Note that there is an additional summand $0.5\mu$ in the numerator which appears due to what the authors of (Ross et al., 2007) call the "continuity correction".

Even assuming that there is a player with phenomenal computational skills who is able to calculate F(k, s) quickly, the question of doubling remains open.

5.1. Advice on doubling

In (Buro, 1999) the concept of "equity" (E) was introduced. The equity is a dimensionless quantity equal to the ratio of the winning expectation to the current bet. From the definition it is clear that E takes values in the interval [-1, 1]: E = 1 if the player wins almost surely and E = -1 if he loses almost surely. Let us assume that each player acts rationally in the sense that his decisions on whether to double or not, and whether to accept a double or to surrender, are made in order to maximize his winning expectation. As before, k denotes the pip-count of A and s denotes the pip-count of B. Define the following equities for player A: $E^0_{my}(k, s)$, $E^1_{my}(k, s)$, $E_{my}(k, s)$, $E^0_{our}(k, s)$, $E^1_{our}(k, s)$ and $E_{our}(k, s)$. The subscript my/our indicates the current situation: "my cube" or "our cube". The superscript equals 1 if the player has doubled in the current situation and 0 if not. Finally, the quantities without a superscript are defined by the equations

$$E_{my}(k, s) := E^0_{my}(k, s) \vee E^1_{my}(k, s), \qquad E_{our}(k, s) := E^0_{our}(k, s) \vee E^1_{our}(k, s), \qquad (7)$$

which ensure the rational behavior. The initial conditions, which correspond to the last turn of the game, are straightforward:

$$E^l_{my}(0, j) = E^l_{our}(0, j) = 1, \qquad (8)$$
$$E^l_{my}(i, 0) = E^l_{our}(i, 0) = -1, \qquad (9)$$
$$E^l_{my}(1, k) = E^l_{my}(2, k) = E^l_{my}(3, k) = 1, \qquad (10)$$
$$E^l_{our}(1, k) = E^l_{our}(2, k) = E^l_{our}(3, k) = 1, \qquad (11)$$

for $i \in \mathbb{N}$, $j \ge 0$, $k \in \mathbb{N}$ and $l = 0, 1$. In order to calculate $E^0_{my}(k, s)$, $E^1_{my}(k, s)$, $E^0_{our}(k, s)$, $E^1_{our}(k, s)$ for other values of k and s one may use the following recursive scheme:

$$E^0_{my}(k, s) = \sum_{i=3}^{24}\sum_{j=3}^{24} \pi_i \pi_j\, E_{my}((k-i)^+, (s-j)^+), \qquad (12)$$

$$E^1_{my}(k, s) = \Big(-2\sum_{i=3}^{24} \pi_i\, E_{my}(s, (k-i)^+)\Big) \wedge 1, \qquad (13)$$

$$E^0_{our}(k, s) = -\sum_{i=3}^{24} \pi_i\, E_{our}(s, (k-i)^+), \qquad (14)$$

$$E^1_{our}(k, s) = E^1_{my}(k, s). \qquad (15)$$

For example, equation (13) can be checked as follows. In this case A has doubled in the situation "my cube" with pip-counts (k, s). If B rejects the double, then A wins and $E^1_{my}(k, s) = 1$. Assume that B accepts the double. After A's roll the random number i (with distribution $(\pi_j)$) is subtracted from A's pip-count, the bet is doubled, and the turn and the right to double go to B, so B will be in the situation "my cube" with pip-counts $(s, (k - i)^+)$ and therefore his equity is $2E_{my}(s, (k-i)^+)$. Taking into account that the equities of both players have the same absolute value but opposite signs, A's equity equals $-2E_{my}(s, (k-i)^+)$. Since both players are rational, B will choose the action which minimizes A's equity, and we arrive at (13).

Equation (15) is obvious because doubling in the situations "my cube" and "our cube" has the same consequences. Formulas (12) and (14) can be proved by the same reasoning as used above.
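The recursive scheme (7)-(15) translates directly into a small dynamic program. The sketch below is our own illustrative implementation (function and variable names are not from the original text); it memoizes the equities and can be used to reproduce the doubling and acceptance matrices behind Table 5.

```python
from functools import lru_cache

# Table 1: distribution of the total points in one roll.
PI = {3: 2/36, 4: 3/36, 5: 4/36, 6: 4/36, 7: 6/36, 8: 5/36, 9: 4/36,
      10: 2/36, 11: 2/36, 12: 1/36, 16: 1/36, 20: 1/36, 24: 1/36}

@lru_cache(maxsize=None)
def equity(k, s, cube, doubled):
    """Equity of the player on roll (pip-count k) against an opponent with
    pip-count s; cube is 'my' or 'our'; doubled means the superscript 1."""
    if k == 0:                     # (8): the player on roll has already won
        return 1.0
    if s == 0:                     # (9): the opponent has already won
        return -1.0
    if doubled:                    # (13) and (15): the cube goes to the opponent
        accept = -2 * sum(p * best(s, max(k - i, 0), 'my') for i, p in PI.items())
        return min(accept, 1.0)    # the opponent may also simply surrender
    if cube == 'my':               # (12): the opponent has no cube decision
        return sum(pi * pj * best(max(k - i, 0), max(s - j, 0), 'my')
                   for i, pi in PI.items() for j, pj in PI.items())
    # (14): centred cube; after the roll the opponent is on move and may double
    return -sum(p * best(s, max(k - i, 0), 'our') for i, p in PI.items())

def best(k, s, cube):              # (7): double exactly when it raises the equity
    return max(equity(k, s, cube, False), equity(k, s, cube, True))

def should_double(k, s, cube='my'):
    return equity(k, s, cube, True) > equity(k, s, cube, False)

# Expect True, False if Table 5 ("my cube": advantage of at least 1 at k ~ 25)
# is reproduced by this implementation of the scheme.
print(should_double(25, 26), should_double(25, 25))
```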

As soon as all the equities have been calculated, the matrices of doubling and acceptance can be constructed. Let $Db_{my}(k, s)$ (respectively, $Db_{our}(k, s)$) equal 1 if it is worth doubling in the "my cube" (respectively, "our cube") situation with pip-counts (k, s) and 0 otherwise, that is to say,

$$Db_{my}(k, s) = 1_{\{E^1_{my}(k,s) > E^0_{my}(k,s)\}}, \qquad Db_{our}(k, s) = 1_{\{E^1_{our}(k,s) > E^0_{our}(k,s)\}}.$$

Similarly, let $Ac(k, s)$ be either 1 or 0 according to whether it is worth accepting a double or not when the pip-counts are (k, s), so

$$Ac(k, s) = 1_{\{Db_{my}(k,s) \vee Db_{our}(k,s) < 1\}}.$$

It turns out that each row in the matrices $Db_{my} := (Db_{my}(k, s))_{k,s\ge 0}$ and $Db_{our} := (Db_{our}(k, s))_{k,s\ge 0}$ consists of a number of 1's followed by an infinite sequence of 0's; conversely, in the matrix $Ac := (Ac(k, s))_{k,s\ge 0}$ a number of 0's is followed by an infinite sequence of 1's. These observations allow a significant simplification: let $db_{my}$ (respectively, $db_{our}$ and $ac$) be the vector whose k-th coordinate is equal to the index of the last 1 (respectively, last 1 and last 0) in the k-th row of $Db_{my}$ (respectively, $Db_{our}$ and $Ac$). Set

$$db^*_{my} := db_{my} - (0, 1, 2, 3, \ldots), \qquad db^*_{our} := db_{our} - (0, 1, 2, 3, \ldots), \qquad ac^* := ac - (0, 1, 2, 3, \ldots).$$

These vectors characterize the minimal advantage needed for doubling or the maximal lag allowed for accepting the double. It turns out that the components of these vectors form non-decreasing sequences. This is summarized in Table 5; note that we do not include values k < 20, since in this case the number of rolls needed to take the checkers off strongly depends on the position.

Table 5: Advices on doubling and accepting the double.

Minimal advantage for         Minimal advantage for         Maximal lag to
doubling, "our cube"          doubling, "my cube"           accept the double
k          s-k                k          s-k                k          s-k
22-27      0                  23-27      1                  23-26      3
28-32      1                  28-33      2                  27-31      4
33-38      2                  34-38      3                  32-37      5
39-45      3                  39-44      4                  38-44      6
46-53      4                  45-51      5                  45-51      7
54-62      5                  52-60      6                  52-59      8
63-72      6                  61-69      7                  60-68      9
73-82      7                  70-79      8                  69-78      10
83-93      8                  80-89      9                  79-88      11
94-105     9                  90-100     10                 89-98      12
106-117    10                 101-111    11                 99-110     13
118-129    11                 112-124    12                 111-122    14
130-143    12                 125-136    13                 123-134    15
144-157    13                 137-150    14                 135-148    16
158-170    14                 151-163    15                 149-161    17

References

Buro, M. (1999). Efficient approximation of backgammon race equities. International Computer Chess Association Journal, 22(3), 133-142.

Feller, W. (1970). An Introduction to Probability Theory and Its Applications. Vol 2. Second Edition. John Wiley & Sons, 669 p.

Keeler, E. and J. Spencer (1975). Optimal doubling in backgammon. Operations Research, 23(4), 1063-1071.

Kissane, J. (1992). Cluster counting. Chicago point, 52.

Lamford, P. and S. Gasquoine (2003). Improve your backgammon. Everyman: London, 128 p.

Morters, P. and Y. Peres (2010). Brownian Motion. Cambridge University Press, 357 p.

Ross, A., A. Benjamin and M. Munson (2007). Estimating winning probabilities in backgammon races. Optimal Play: Mathematical Studies of Games and Gambling, 269-291.

Tuck, E. (1980). Doubling strategies for backgammon-like games. J. Austral. Math. Soc., 21 (Series B), 440-451.

Zadeh, N. and G. Kobilska (1977). On optimal doubling in backgammon. Management Science, 23(8), 853-858.
