# Simudidactic

auto·di·dact n.
A self-taught person.
From Greek autodidaktosself-taught : auto-auto- + didaktostaught;

+

sim·u·late v.
To create a representation or model of (a physical system or particular situation, for example).
From Latin simulre, simult-, from similislike;

=
(If you can get past the mixing of Latin and Greek roots)

To learn by creating a representation or model of a physical system or particular situation. Particularly, using in silico computation to understand complex systems and phenomena.

———————————————————————

This concept has been floating around in my head for a little while. I’ve written before on how I believe that simulation can be used to improve one’s understanding of just about anything, but have never had a nice shorthand for this process.

Simudidactic inquiry is the process of understanding aspects of the world by abstracting them into a computational model, then conducting experiments in this model world by changing the underlying properties and parameters. In this way, one can ask questions like:

1. What type of observations might we make if x were true?
2. If my model of the process is accurate, can I recapture the underlying parameters given the type of observations I can make in the real world? How often will I be wrong?
3. Will I be able to distinguish between competing models given the observations I can make in the real world?

In addition to being able to ask these types of questions, the simudidact solidifies their understanding of the model by actually building it.

So go on, get simudidactic and learn via simulation!

# What is probabilistic truth? Part 2 – Everything is conditional

When making a statement of the form “1/2 is the correct probability that this coin will land tails”, there are a few things which are left unsaid, but which are typically implied.

The statement is one about the probability of an unknown event occurring, and it would seem reasonable to write this statement using probability notation as P(toss=tails) = 0.5. And indeed many people would express it this way. However, what is missing is the state of knowledge under which this statement has been made. For instance, is the coin yet to be flipped, or is it currently rolling in a circle on the table, leaning in toward its final resting position? Perhaps the flipping device can consistently throw a coin such that it rotates exactly 5 times in the air before landing flat on the table, or we know which side is up at the start of the flip. In these latter cases, the statement of probability would be made under considerably more knowledge than the first, and would not tend to be 0.5 in these cases. An observer placing a probability of P(toss=tails) = 0.99 at the moment when the coin is circling in on its resting position, leaning heavily toward a tails up configuration, could be said to have the correct probability also. For fairness, lets say that the first observer also makes her probability statement at the same moment, but from another room where she cannot see what has happened.

How can P(toss=tails) = 0.5, and P(toss=tails) = 0.99 be simultaneously correct?

The answer is conditioning. Each of the statements were made conditional on the observer’s state of knowledge. More completely, the two statements can be rewritten as:

P(toss=tails | knowledge of observer 1) = 0.5 , and

P(toss=tails | knowledge of observer 2) = 0.99

In practice, however, we often leave out the conditional part of the notation unless it is germane to the problem at hand. However, there is no such thing as unconditional probability. In fact, Harvard professor Joe Blitzstein calls conditioning the Soul of Statistics.

In the next post in this series, we’ll start looking at how to assess the correctness of a (conditional) probability statement after having observed an outcome.

Here’s a bunch of random walks — just ’cause its neat.

# CAISN

Canadian Aquatic Invasive Species Networks Annual General Meeting in Kananaskis, Alberta. May 03, 3:25-5:30.

This 2-hour workshop will focus on how and why we do numerical simulation in R. Time permitting, we will also look at how to build and fit likelihood based statistical models.

We ask that you bring your laptop with both R and R-Studio installed. If you've never worked with R before, please have a look at the…

I can think of worse places to get down and dirty with R than Kananaskis, Alberta.

# Mathematical abstraction and the robustness to assumptions

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities.

All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst my bubble.

Nope. I couldn’t find the error. Fortunately, he works across the hall so I just went and asked him.

The problem he found, it turns out, was not with my calculations but with my assumptions. Aaron told me that dice constructed with rounded corners and hollowed out pips for the numbers on the faces tend to be biased in the frequency at which each face rolls up. I had assumed, of course, that each side of each of the five dice would roll with the same probability (ie. 1 in 6).

As with any model of a real world system, the mathematics were carried out on a simplified abstraction of the system being modelled. There are always, by necessity, assumptions being made. The important thing is to make these assumptions as explicit as possible and, where possible, to test the robustness of the model predictions to violations of the assumptions. Implicit to my calculations of the odds of the non-transitive Grime dice was the assumption that the dice are fair.

To check the model for robustness to this assumption, we can relax it and find out if we still get the same behaviour. Specifically, we can ask here whether some sort of pip-and-rounded-corner-induced bias can lead to a change in the Grime dice non-transitive cycles.

It seems a natural place to look would be between the dice pairings which have the closest to even odds. We can find out what level of bias would be required to switch the directionality of the odds (or at least erase the tendency for one die to roll higher than the other). Lets try looking at Magenta and Red, which under the fair dice assumption have odds p(Magenta > Red)=5/9. What kind of bias will change this relationship? The odds can be evened out by either Magenta rolling ones more often, or red rolling nine more often. The question is then, how much bias would there need to in the dice in order to even out the odds between Magenta and Red?

Lets start with Red biasing toward rolling nine more often (recall that nine appears on only one face). Under the fair dice hypothesis, Red can roll nine (1/6 of the time) and win no matter what Magenta rolls, or by rolling four (5/6 of the time) and win when Magenta rolls one (1/3 of the time).

P(Red > Magenta) = 1/6 + 5/6 * 1/3, which is 4/9.

If we set this probability equal to 1/2, and replace the fraction of times that Red rolls nine with x, we can solve for the frequency needed to even the odds.

x + 5/6 * 1/3 = 1/2

x = 2/9

Meaning that the Red die would have to be biased toward rolling nine with 2/9 odds. That’s equivalent to rolling a nine 1 and 1/3 times (33%) more often than you would expect if the die were fair!

Alternatively, the other way the odds between Red and Magenta could be evened is if Magenta biased towards rolling ones more often. We can do the same kind of calculation as above to figure out how much bias would be needed.

1/6 + 5/6 * x = 1/2

x = 2/5

Which corresponds to Magenta having  a 20% bias toward rolling ones. Of course, some combination of these biases could also be possible.

I leave it to the reader to work out the other pairings, but from the Red-Magenta analysis we can see that even if the dice deviated quite a bit from the expected 1/6 probability for each side, the edge afforded to Magenta is retained. I couldn’t find any convincing  evidence for the extent of bias caused by pipping and rounded corners but it seems unlikely that it would be strong enough to change the structure of the game.

# Introduction to Simulation using R

We had a great turnout yesterday for our Zero to R Hero workshop at the Quebec Centre for Biodiversity Science. We went from the absolute basics of the command line, to the intricacies of importing data, and finally we had a look at plotting using ggplot2. We didn’t have time to get to this extra module introducing simulation, but if you want to work through it on your own you can find the slides here:

The script file to follow along with is here:

https://gist.github.com/cjbayesian/5220711

# Who is uncomfortable with uncertainty?

In The Economists’ World in 2013 edition, I came across a very interesting statement about forecast uncertainty, and just who are the ones doing all the squirming about it.

…recent reforms to the IPCC’s procedures will do little to change its tendency to focus on the areas where there is greater consensus, avoiding the uncertainties which, though unpalatable for scientists, are important to policy. (link)

What struck me about this claim is that it runs completely counter to what I’ve been told during my training as a scientist. It is the scientist, it goes, that possesses a deep understanding of uncertainty. The policy maker, on the other hand, is an oaf who wishes to hear only of black and white pronouncements about the effect of x on y. Could it be that this perception is inverted in each camp?

Certainly the scientist and policy maker each wish to decrease uncertainty. However, it ought to be that neither finds it ‘unpalatable’ in and of itself, but rather an inextricable part of our predictions about complex systems (or even the simplest ones, for that matter). Acknowledgement, understanding, and quantification of uncertainty are absolutely crucial to conducting good science as well as informing science directed policy.

# Introduction to Bayesian lecture: Accompanying handouts and demos

I recently posted the slides from a guest lecture that I gave on Bayesian methods for biologists/ecologist. In an effort to promote active learning, the class was not a straight forward lecture, but rather a combination of informational input from me and opportunities for students to engage with the concepts via activities and discussion of demonstrations. These active components were designed with the goal of promoting students’ construction of knowledge, as opposed to a passive transfer from teacher to learner.

In order to bring the online reader into closer allignment with the experience of attending the class, I have decided to provide the additional materials that I used to promote active learning.

1) Monte-Carlo activity:

In pairs, students are provided with a random number sheet and a circle plot handout:

One student is the random number generator, the other is the plotter. After students plot a few points, we collect all the data and walk through a discussion of why this works. We then scale up and take a look at the same experiment using a computer simulation to see how our estimate converges toward the correct value.

2) Metropolis-Hastings in action:

In this demonstration, we walk through the steps of the MH algorithm visually.

Discussion is then facilitated regarding the choice of proposal distribution, autocorrelation, and convergence diagnosis around this demonstration.

I hope that you find this helpful. If you are teaching this topic in your class, feel free to borrow, and improve upon, these materials. If you do, drop me a note and let me know how it went!

# Guest lecture at Dawson College

On Friday, I gave a guest lecture on statistics to a group of biology students at Dawson College in Montreal. We had some fun testing whether name brand cookies contained more chocolate chips, on average, than generic brand cookies.

# Dynamical systems: Mapping chaos with R

Chaos. Hectic, seemingly unpredictable, complex dynamics. In a word: fun. I usually stick to the warm and fuzzy world of stochasticity and probability distributions, but this post will be (almost) entirely devoid of randomness. While chaotic dynamics are entirely deterministic, their sensitivity to initial conditions can trick the observer into seeing iid.

In ecology, chaotic dynamics can emerge from a very simple model of population.

`$x_{t+1} = r x_t(1-x_t)$`

Where the population in time-step t+1 is dependent on the population at time step t, and some intrinsic rate of growth, r. This is known as the logistic (or quadratic) map. For any starting value of x at t0, the entire evolution of the system can be computed exactly. However, there some values of r for which the system will diverge substantially with even a very slight change in the initial position.

We can see the behaviour of this model by simply plotting the time series of population sizes. Another, and particularly instructive way of visualizing the dynamics, is through the use of a cobweb plot. In this representation, we can see how the population x at time t maps to population x at time t+1 by reflecting through the 1:1 line. Each representation is plotted here:

You can plot realizations of the system using the following R script.

```
q_map<-function(r=1,x_o=runif(1,0,1),N=100,burn_in=0,...)
{
par(mfrow=c(2,1),mar=c(4,4,1,2),lwd=2)
############# Trace #############
x<-array(dim=N)
x[1]<-x_o
for(i in 2:N)
x[i]<-r*x[i-1]*(1-x[i-1])

plot(x[(burn_in+1):N],type='l',xlab='t',ylab='x',...)
#################################

x<-seq(from=0,to=1,length.out=100)
x_np1<-array(dim=100)
for(i in 1:length(x))
x_np1[i]<-r*x[i]*(1-x[i])

plot(x,x_np1,type='l',xlab=expression(x[t]),ylab=expression(x[t+1]))
abline(0,1)

start=x_o
vert=FALSE
lines(x=c(start,start),y=c(0,r*start*(1-start)) )
for(i in 1:(2*N))
{
if(vert)
{
lines(x=c(start,start),y=c(start,r*start*(1-start)) )
vert=FALSE
}
else
{
lines(x=c(start,
r*start*(1-start)),
y=c(r*start*(1-start),
r*start*(1-start)) )
vert=TRUE
start=r*start*(1-start)
}
}
#################################
}

```

To use, simply call the function with any value of r, and a starting position between 0 an 1.

```
q_map(r=3.84,x_o=0.4)

```

Fun right?

Now that you’ve tried a few different values of r at a few starting positions, it’s time to look a little closer at what ranges of r values produce chaotic behaviour, which result in stable orbits, and which lead to dampening oscillations toward fixed points. There is a rigorous mathematics behind this kind of analysis of dynamic systems, but we’re just going to do some numerical experimentation using trusty R and a bit of cpu time.

To do this, we’ll need to iterate across a range of r values, and at each one start a dynamical system with a random starting point (told you there would be some randomness in this post). After some large number of time-steps, we’ll record where the system ended up. Plotting the results, we can see a series of period doubling (2,4,8, etc) bifurcations interspersed with regions of chaotic behaviour.

```library(parallel)
bifurcation<-function(from=3,to=4,res=500,
x_o=runif(1,0,1),N=500,reps=500,cores=4)
{
r_s<-seq(from=from,to=to,length.out=res)
r<-numeric(res*reps)
for(i in 1:res)
r[((i-1)*reps+1):(i*reps)]<-r_s[i]

x<-array(dim=N)

iterate<-mclapply(1:(res*reps),
mc.cores=cores,
function(k){
x[1]<-runif(1,0,1)
for(i in 2:N)
x[i]<-r[k]*x[i-1]*(1-x[i-1])

return(x[N])
})

plot(r,iterate,pch=15,cex=0.1)

return(cbind(r,iterate))
}

#warning: Even in parallel with 4 cores, this is by no means fast code!
bi<-bifurcation()
png('chaos.png',width=1000,height=850)
par(bg='black',col='green',col.main='green',cex=1)
plot(bi,col='green',xlab='R',ylab='n --> inf',main='',pch=15,cex=0.2)
dev.off()

```

This plot is known as a bifurcation diagram and is likely a familiar sight.

Hopefully working through the R code and running it yourself will help you interpret cobweb plots, as well as bifurcation diagrams. It is really quite amazing how the simple looking logistic map equation can lead to such interesting behaviour.