# Mathematical abstraction and the robustness to assumptions

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities.

All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst my bubble.

Nope. I couldn’t find the error. Fortunately, he works across the hall so I just went and asked him.

The problem he found, it turns out, was not with my calculations but with my assumptions. Aaron told me that dice constructed with rounded corners and hollowed out pips for the numbers on the faces tend to be biased in the frequency at which each face rolls up. I had assumed, of course, that each side of each of the five dice would roll with the same probability (ie. 1 in 6).

As with any model of a real world system, the mathematics were carried out on a simplified abstraction of the system being modelled. There are always, by necessity, assumptions being made. The important thing is to make these assumptions as explicit as possible and, where possible, to test the robustness of the model predictions to violations of the assumptions. Implicit to my calculations of the odds of the non-transitive Grime dice was the assumption that the dice are fair.

To check the model for robustness to this assumption, we can relax it and find out if we still get the same behaviour. Specifically, we can ask here whether some sort of pip-and-rounded-corner-induced bias can lead to a change in the Grime dice non-transitive cycles.

It seems a natural place to look would be between the dice pairings which have the closest to even odds. We can find out what level of bias would be required to switch the directionality of the odds (or at least erase the tendency for one die to roll higher than the other). Lets try looking at Magenta and Red, which under the fair dice assumption have odds p(Magenta > Red)=5/9. What kind of bias will change this relationship? The odds can be evened out by either Magenta rolling ones more often, or red rolling nine more often. The question is then, how much bias would there need to in the dice in order to even out the odds between Magenta and Red?

Lets start with Red biasing toward rolling nine more often (recall that nine appears on only one face). Under the fair dice hypothesis, Red can roll nine (1/6 of the time) and win no matter what Magenta rolls, or by rolling four (5/6 of the time) and win when Magenta rolls one (1/3 of the time).

P(Red > Magenta) = 1/6 + 5/6 * 1/3, which is 4/9.

If we set this probability equal to 1/2, and replace the fraction of times that Red rolls nine with x, we can solve for the frequency needed to even the odds.

x + 5/6 * 1/3 = 1/2

x = 2/9

Meaning that the Red die would have to be biased toward rolling nine with 2/9 odds. That’s equivalent to rolling a nine 1 and 1/3 times (33%) more often than you would expect if the die were fair!

Alternatively, the other way the odds between Red and Magenta could be evened is if Magenta biased towards rolling ones more often. We can do the same kind of calculation as above to figure out how much bias would be needed.

1/6 + 5/6 * x = 1/2

x = 2/5

Which corresponds to Magenta having  a 20% bias toward rolling ones. Of course, some combination of these biases could also be possible.

I leave it to the reader to work out the other pairings, but from the Red-Magenta analysis we can see that even if the dice deviated quite a bit from the expected 1/6 probability for each side, the edge afforded to Magenta is retained. I couldn’t find any convincing  evidence for the extent of bias caused by pipping and rounded corners but it seems unlikely that it would be strong enough to change the structure of the game.

# Introduction to Simulation using R

We had a great turnout yesterday for our Zero to R Hero workshop at the Quebec Centre for Biodiversity Science. We went from the absolute basics of the command line, to the intricacies of importing data, and finally we had a look at plotting using ggplot2. We didn’t have time to get to this extra module introducing simulation, but if you want to work through it on your own you can find the slides here:

The script file to follow along with is here:

https://gist.github.com/cjbayesian/5220711

# Who is uncomfortable with uncertainty?

In The Economists’ World in 2013 edition, I came across a very interesting statement about forecast uncertainty, and just who are the ones doing all the squirming about it.

…recent reforms to the IPCC’s procedures will do little to change its tendency to focus on the areas where there is greater consensus, avoiding the uncertainties which, though unpalatable for scientists, are important to policy. (link)

What struck me about this claim is that it runs completely counter to what I’ve been told during my training as a scientist. It is the scientist, it goes, that possesses a deep understanding of uncertainty. The policy maker, on the other hand, is an oaf who wishes to hear only of black and white pronouncements about the effect of x on y. Could it be that this perception is inverted in each camp?

Certainly the scientist and policy maker each wish to decrease uncertainty. However, it ought to be that neither finds it ‘unpalatable’ in and of itself, but rather an inextricable part of our predictions about complex systems (or even the simplest ones, for that matter). Acknowledgement, understanding, and quantification of uncertainty are absolutely crucial to conducting good science as well as informing science directed policy.

# Introduction to Bayesian lecture: Accompanying handouts and demos

I recently posted the slides from a guest lecture that I gave on Bayesian methods for biologists/ecologist. In an effort to promote active learning, the class was not a straight forward lecture, but rather a combination of informational input from me and opportunities for students to engage with the concepts via activities and discussion of demonstrations. These active components were designed with the goal of promoting students’ construction of knowledge, as opposed to a passive transfer from teacher to learner.

In order to bring the online reader into closer allignment with the experience of attending the class, I have decided to provide the additional materials that I used to promote active learning.

1) Monte-Carlo activity:

In pairs, students are provided with a random number sheet and a circle plot handout:

One student is the random number generator, the other is the plotter. After students plot a few points, we collect all the data and walk through a discussion of why this works. We then scale up and take a look at the same experiment using a computer simulation to see how our estimate converges toward the correct value.

2) Metropolis-Hastings in action:

In this demonstration, we walk through the steps of the MH algorithm visually.

Discussion is then facilitated regarding the choice of proposal distribution, autocorrelation, and convergence diagnosis around this demonstration.

I hope that you find this helpful. If you are teaching this topic in your class, feel free to borrow, and improve upon, these materials. If you do, drop me a note and let me know how it went!

# Guest lecture at Dawson College

On Friday, I gave a guest lecture on statistics to a group of biology students at Dawson College in Montreal. We had some fun testing whether name brand cookies contained more chocolate chips, on average, than generic brand cookies.

# Dynamical systems: Mapping chaos with R

Chaos. Hectic, seemingly unpredictable, complex dynamics. In a word: fun. I usually stick to the warm and fuzzy world of stochasticity and probability distributions, but this post will be (almost) entirely devoid of randomness. While chaotic dynamics are entirely deterministic, their sensitivity to initial conditions can trick the observer into seeing iid.

In ecology, chaotic dynamics can emerge from a very simple model of population.

`$x_{t+1} = r x_t(1-x_t)$`

Where the population in time-step t+1 is dependent on the population at time step t, and some intrinsic rate of growth, r. This is known as the logistic (or quadratic) map. For any starting value of x at t0, the entire evolution of the system can be computed exactly. However, there some values of r for which the system will diverge substantially with even a very slight change in the initial position.

We can see the behaviour of this model by simply plotting the time series of population sizes. Another, and particularly instructive way of visualizing the dynamics, is through the use of a cobweb plot. In this representation, we can see how the population x at time t maps to population x at time t+1 by reflecting through the 1:1 line. Each representation is plotted here:

You can plot realizations of the system using the following R script.

```
q_map<-function(r=1,x_o=runif(1,0,1),N=100,burn_in=0,...)
{
par(mfrow=c(2,1),mar=c(4,4,1,2),lwd=2)
############# Trace #############
x<-array(dim=N)
x[1]<-x_o
for(i in 2:N)
x[i]<-r*x[i-1]*(1-x[i-1])

plot(x[(burn_in+1):N],type='l',xlab='t',ylab='x',...)
#################################

x<-seq(from=0,to=1,length.out=100)
x_np1<-array(dim=100)
for(i in 1:length(x))
x_np1[i]<-r*x[i]*(1-x[i])

plot(x,x_np1,type='l',xlab=expression(x[t]),ylab=expression(x[t+1]))
abline(0,1)

start=x_o
vert=FALSE
lines(x=c(start,start),y=c(0,r*start*(1-start)) )
for(i in 1:(2*N))
{
if(vert)
{
lines(x=c(start,start),y=c(start,r*start*(1-start)) )
vert=FALSE
}
else
{
lines(x=c(start,
r*start*(1-start)),
y=c(r*start*(1-start),
r*start*(1-start)) )
vert=TRUE
start=r*start*(1-start)
}
}
#################################
}

```

To use, simply call the function with any value of r, and a starting position between 0 an 1.

```
q_map(r=3.84,x_o=0.4)

```

Fun right?

Now that you’ve tried a few different values of r at a few starting positions, it’s time to look a little closer at what ranges of r values produce chaotic behaviour, which result in stable orbits, and which lead to dampening oscillations toward fixed points. There is a rigorous mathematics behind this kind of analysis of dynamic systems, but we’re just going to do some numerical experimentation using trusty R and a bit of cpu time.

To do this, we’ll need to iterate across a range of r values, and at each one start a dynamical system with a random starting point (told you there would be some randomness in this post). After some large number of time-steps, we’ll record where the system ended up. Plotting the results, we can see a series of period doubling (2,4,8, etc) bifurcations interspersed with regions of chaotic behaviour.

```library(parallel)
bifurcation<-function(from=3,to=4,res=500,
x_o=runif(1,0,1),N=500,reps=500,cores=4)
{
r_s<-seq(from=from,to=to,length.out=res)
r<-numeric(res*reps)
for(i in 1:res)
r[((i-1)*reps+1):(i*reps)]<-r_s[i]

x<-array(dim=N)

iterate<-mclapply(1:(res*reps),
mc.cores=cores,
function(k){
x[1]<-runif(1,0,1)
for(i in 2:N)
x[i]<-r[k]*x[i-1]*(1-x[i-1])

return(x[N])
})

plot(r,iterate,pch=15,cex=0.1)

return(cbind(r,iterate))
}

#warning: Even in parallel with 4 cores, this is by no means fast code!
bi<-bifurcation()
png('chaos.png',width=1000,height=850)
par(bg='black',col='green',col.main='green',cex=1)
plot(bi,col='green',xlab='R',ylab='n --> inf',main='',pch=15,cex=0.2)
dev.off()

```

This plot is known as a bifurcation diagram and is likely a familiar sight.

Hopefully working through the R code and running it yourself will help you interpret cobweb plots, as well as bifurcation diagrams. It is really quite amazing how the simple looking logistic map equation can lead to such interesting behaviour.