## Cross validated question

**A**nother problem generated by X’validated* (on which I spent much too much time!)*: given an unbiased coin that produced *M* heads in the first *M* tosses, what is the expected number of additional tosses needed to get *N* *(N>M)* consecutive heads?

**C**onsider the preliminary question of getting a sequence of *N* heads out of *k* tosses, with probability *1-p(N,k)*. The complementary probability is given by the recurrence formula

Indeed, my reasoning is that the event of no consecutive *N* heads out of *k* tosses can be decomposed according to the first occurrence of a tail out of the first *N* tosses. Conditioning on whether this first tail occurs at the first, second, …, *n*th draw leads to this recurrence relation. As I wanted to make sure, I rand the following R code

#no sequence of length N out of k draws pnk=function(N,k){ if (k<N){ p=1} else{ p=0 for (j in 1:N) p=p+pnk(N,k-j)/2^j } return(p) }

and got the following check:

> k=15 > #N=2 > 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6 [1] 6.442773e-05 > #N=3 > 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6 [1] 0.0004090137

**N**ext, the probability of getting the first consecutive *N heads* in *m≥ N* tosses is

Both first cases are self-explanatory. the third case corresponds to a tail occurring at the *m−N−1*th draw, followed by* N* heads, and prohibiting *N* consecutive heads prior to the *m−N−1*th toss. When checking by

Tsim=10^7 S=sample(c(0,1),Tsim,rep=TRUE) SS=S[-Tsim]*S[-1] out=NULL i=2 while (i<=length(SS)){ if ((SS[i]==1)&&(SS[i-1]==1)){ out=c(out,i);i=i+1} i=i+1} dif=diff((1:length(SS[-out]))[SS[-out]==1]) trobs=probs=tabulate(dif+(dif==1))/length(dif)[1:20] for (t in 1:20) trobs[t]=qmn(2,t) barplot(probs,col="orange2",ylim=c(-max(probs),max(probs))) barplot(-trobs[1:20],col="wheat",add=TRUE)

I however get a discrepancy shown in the above graph for the cases *m=3,4*, and *N=2*, which is due to the pseudo-clever way I compute the waiting times, removing the extra 1’s… Because the probabilities to wait 3 and 4 times for 2 heads should really be both equal to 1/2³. And things agree after that.

**N**ow, the probability to get *M* heads first and *N* heads in *m≥ N* tosses (and no less) is

The third case is explained by the fact that completions of the first sequence of heads must stop (by a tail) before reaching *N* heads. Hence the conditional probability of waiting *m* tosses to get *N* consecutive heads given the first *M* consecutive heads is

**T**he expected number can then be derived by

or

for the number of *additional* steps…

**C**hecking for the smallest values of *M* and *N*, I got a reasonable agreement with the theoretical value of 2^{N+1}-2^{M+1}(established on Cross validated). (For larger values of *M* and *N*, I had to replace the recursive definition of pnk with a matrix computed once for all.)

October 24, 2016 at 7:27 am

Old thread, but I found myself reading (and actually noticing the xi’an username attached to) one of your cross-validated answers today. http://stats.stackexchange.com/questions/22879/linear-combination-of-two-dependent-multivariate-normal-random-variables

It was early in the morning.

October 25, 2016 at 7:49 am

You should not stay awake that late on my behalf, you know..!

February 23, 2012 at 5:53 am

[…] Xi’an’s Og studies a CrossValidated.SE question. […]

February 21, 2012 at 12:20 am

Is it possible with today’s manufacturing processes to produce a unbiased coin?

February 23, 2012 at 3:35 pm

Dear John, unfortunately your question is the wrong way round. Andrew Gelman and Deborah Nolan make the strong point that there actually is no such thing as a biased coin: http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf