What is probabilistic truth?

I am currently working on a validation metric for binary prediction models, that is, models which make predictions about outcomes that can take on either of two possible states (e.g. dead/not dead, heads/tails, cat in picture/no cat in picture, etc.). The most commonly used metric for this class of models is AUC, which summarizes the trade-off between the true positive rate and the false positive rate across the whole range of possible decision thresholds. Plotting these two rates against each other gives a curve that looks something like this:


The curve itself is the Receiver Operating Characteristic (ROC) curve, and the area under it (the AUC) is some value between 0 and 1. The higher this value, the better your model is said to perform. The problem with this metric, as many authors have pointed out, is that a model can perform very well in terms of AUC, but be completely miscalibrated in terms of the actual probabilities placed on each outcome.

A model which distinguishes perfectly between positive and negative cases (AUC=1) by placing a probability of 0.01 on positive cases and 0.001 on negative cases may be very far off in terms of the actual probability of a positive case. For instance, positive cases may actually occur with probability 0.6 and negative cases with 0.2. In most real situations, our models will predict a whole range of different probabilities with a unique prediction for each data point, but the general idea remains. If your goal is simply to distinguish between cases, you may not care whether the probabilities are correct. However, if your model is purporting to quantify risk, then you very much want to know whether you are placing probabilistically true predictions on cases that are yet to be observed.
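To make the gap between discrimination and calibration concrete, here is a minimal sketch in Python. The data set (60 positive and 40 negative cases) and the helper name auc are made up for illustration; the predicted probabilities of 0.01 and 0.001 are the ones from the example above.

```python
from itertools import product

def auc(labels, scores):
    """Chance that a random positive case outscores a random negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical data set (an assumption, not from the post): 60 positives, 40 negatives.
labels = [1] * 60 + [0] * 40
# The model from the text: 0.01 on every positive case, 0.001 on every negative case.
scores = [0.01 if y == 1 else 0.001 for y in labels]

model_auc = auc(labels, scores)
mean_predicted = sum(scores) / len(scores)
observed_rate = sum(labels) / len(labels)

print("AUC:", model_auc)
print("mean predicted probability:", round(mean_predicted, 4))
print("observed frequency of positives:", observed_rate)
```

The ranking is perfect, so the AUC cannot tell this model apart from one that gets the probabilities right, even though the average predicted probability is nowhere near the observed frequency.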

Which raises the question: What is probabilistic truth?

This question appears, at least at first, to be rather simple. A frequentist definition would say that the probability is correct, or true, if the predicted probability is equal to the long-run frequency of the outcome. Think of a die rolled over and over, counting the number of times a one is rolled. We would compare this frequency to our predicted probability of rolling a one (1/6 for a fair six-sided die) and would say that our predicted probability was true if this frequency matched 1/6.
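The frequentist check is easy to try out in simulation. The sketch below is only an illustration: the function name, the seed and the numbers of rolls are arbitrary choices of mine. It rolls a simulated fair die and compares the observed frequency of ones to the predicted 1/6.

```python
import random

def frequency_of_ones(n_rolls, seed=1):
    """Roll a simulated fair six-sided die n_rolls times; return the fraction of ones."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ones = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 1)
    return ones / n_rolls

predicted = 1 / 6
for n in (60, 6000, 100000):
    print(n, "rolls:", round(frequency_of_ones(n), 4), "vs predicted", round(predicted, 4))
```

As the number of rolls grows, the observed frequency should settle closer and closer to the predicted probability.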

But what about situations where we can’t re-run an experiment over and over again? How then would we evaluate the probabilistic truth of our predictions?

I’ll be working through this problem in a series of posts in the coming weeks. Stay tuned!

Read Part 2

Simulation and Likelihood Methods Workshop in Kananaskis

Corey Chivers:

I can think of worse places to get down and dirty with R than Kananaskis, Alberta.

Originally posted on Zero to R Hero:


Canadian Aquatic Invasive Species Network’s Annual General Meeting in Kananaskis, Alberta. May 03, 3:25-5:30.

This 2-hour workshop will focus on how and why we do numerical simulation in R. Time permitting, we will also look at how to build and fit likelihood based statistical models.

We ask that you bring your laptop with both R and RStudio installed. If you’ve never worked with R before, please have a look at the getting started with R document. You can also check out the slides from our more introductory workshops.


Section 1: Introduction to Simulation (script)

  •     What is (numerical) simulation?
  •     Drawing random samples from a set
  •     Drawing random samples from a probability distribution
  •     Describing models in terms of their deterministic and stochastic parts
  •     Simulating data from a model

Section 2: Likelihood Methods (script)

  •     The Likelihood Principle
  •     The Ecologist’s Quarter
  •     Maximum…


Mathematical abstraction and the robustness to assumptions

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities.

All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst my bubble.

Nope. I couldn’t find the error. Fortunately, he works across the hall so I just went and asked him.

The problem he found, it turns out, was not with my calculations but with my assumptions. Aaron told me that dice constructed with rounded corners and hollowed-out pips for the numbers on the faces tend to be biased in the frequency at which each face rolls up. I had assumed, of course, that each side of each of the five dice would roll with the same probability (i.e. 1 in 6).

As with any model of a real world system, the mathematics were carried out on a simplified abstraction of the system being modelled. There are always, by necessity, assumptions being made. The important thing is to make these assumptions as explicit as possible and, where possible, to test the robustness of the model predictions to violations of the assumptions. Implicit in my calculations of the odds of the non-transitive Grime dice was the assumption that the dice are fair.

To check the model for robustness to this assumption, we can relax it and find out if we still get the same behaviour. Specifically, we can ask here whether some sort of pip-and-rounded-corner-induced bias can lead to a change in the Grime dice non-transitive cycles.

It seems a natural place to look would be between the dice pairings which have the closest to even odds. We can find out what level of bias would be required to switch the directionality of the odds (or at least erase the tendency for one die to roll higher than the other). Let’s try looking at Magenta and Red, which under the fair dice assumption have odds P(Magenta > Red) = 5/9. What kind of bias will change this relationship? The odds can be evened out by either Magenta rolling ones more often, or Red rolling nine more often. The question is then, how much bias would there need to be in the dice in order to even out the odds between Magenta and Red?

Let’s start with Red biasing toward rolling nine more often (recall that nine appears on only one face). Under the fair dice hypothesis, Red can win by rolling nine (1/6 of the time), which beats whatever Magenta rolls, or by rolling four (5/6 of the time) when Magenta rolls one (1/3 of the time).

P(Red > Magenta) = 1/6 + 5/6 * 1/3, which is 4/9.
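This number is easy to double check by brute force, counting the winning combinations among all 36 equally likely pairs of faces. In the sketch below the face values are an assumption: they are the commonly published Grime dice values, which agree with the description above (Red has a single nine and five fours, Magenta shows a one on two faces). The helper name p_beats is mine.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (commonly published Grime dice configuration).
RED = [4, 4, 4, 4, 4, 9]
MAGENTA = [1, 1, 6, 6, 6, 6]

def p_beats(a, b):
    """Exact probability that fair die a rolls strictly higher than fair die b."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))

print("P(Red > Magenta) =", p_beats(RED, MAGENTA))
print("P(Magenta > Red) =", p_beats(MAGENTA, RED))
```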

If we set this probability equal to 1/2 and replace the fraction of times that Red rolls nine with x (so that Red rolls four the remaining 1 - x of the time), we can solve for the frequency needed to even the odds.

x + (1 - x) * 1/3 = 1/2

x = 1/4

Meaning that the Red die would have to be biased toward rolling nine with 1/4 odds. That’s equivalent to rolling a nine 1 and 1/2 times as often (50% more often) as you would expect if the die were fair!

Alternatively, the other way the odds between Red and Magenta could be evened is if Magenta were biased towards rolling ones more often. We can do the same kind of calculation as above to figure out how much bias would be needed.

1/6 + 5/6 * x = 1/2

x = 2/5

Which corresponds to Magenta rolling ones 20% more often than the 1/3 we would expect if the die were fair. Of course, some combination of these biases could also be possible.
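The Magenta calculation can be written as a small function of the two face probabilities, which also makes it easy to explore combinations of biases. This is only a sketch under the assumptions already used above: Red wins outright with a nine, and wins with a four only when Magenta shows a one. The function names are mine.

```python
from fractions import Fraction

def p_red_beats_magenta(p_nine, p_one):
    """Red wins with a nine, or with a four when Magenta shows a one."""
    return p_nine + (1 - p_nine) * p_one

def break_even_p_one(p_nine):
    """Probability of Magenta rolling one that makes the matchup 50/50 for a given Red die."""
    return (Fraction(1, 2) - p_nine) / (1 - p_nine)

fair_nine = Fraction(1, 6)  # fair Red: one face in six shows nine
fair_one = Fraction(1, 3)   # fair Magenta: two faces in six show one

x = break_even_p_one(fair_nine)
print("fair dice, P(Red > Magenta):", p_red_beats_magenta(fair_nine, fair_one))
print("break-even P(Magenta rolls one):", x)
print("relative to a fair Magenta:", x / fair_one)
```

Passing a biased value for p_nine to break_even_p_one shows how much less Magenta bias is needed when Red is also loaded.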

I leave it to the reader to work out the other pairings, but from the Red-Magenta analysis we can see that even if the dice deviated quite a bit from the expected 1/6 probability for each side, the edge afforded to Magenta is retained. I couldn’t find any convincing evidence for the extent of bias caused by pipping and rounded corners, but it seems unlikely that it would be strong enough to change the structure of the game.

A quick guide to non-transitive Grime Dice

A very special package that I am rather excited about arrived in the mail recently. The package contained a set of 6-sided dice. These dice, however, don’t have the standard numbers one to six on their faces. Instead, they have assorted numbers between zero and nine. Here’s the exact configuration:


Aside from maybe making for a more interesting version of snakes and ladders, why the heck am I so excited about these wacky dice? To find out what makes them so interesting, let’s start by just rolling one against another and seeing which one rolls the higher number. Simple enough. Let’s roll Red against Blue. Until you get your own set, you can roll in silico.

That was fun. We can do it over and over again and we’ll find that Red beats Blue more often than not. So it seems like Red is a pretty good bet. Now let’s try rolling Olive against Red. I’ll wait.
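If you would rather roll in code, here is a minimal simulation sketch. The face values are an assumption (the commonly published Grime dice values, consistent with the odds quoted in this post), and the function name, seed and number of rolls are arbitrary.

```python
import random

# Assumed face values (commonly published Grime dice configuration).
RED = [4, 4, 4, 4, 4, 9]
BLUE = [2, 2, 2, 7, 7, 7]

def win_rate(a, b, n_rolls=100000, seed=42):
    """Fraction of simulated rolls in which die a shows a higher number than die b."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(n_rolls) if rng.choice(a) > rng.choice(b))
    return wins / n_rolls

print("Red beats Blue in", round(100 * win_rate(RED, BLUE), 1), "% of rolls")
```

Swapping in the faces of any other pair of dice lets you repeat the experiment for the matchups that follow.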

Hey, look at that, the mighty Red has fallen. Olive tends to roll a higher number than Red more often than it doesn’t. So far, we have discovered this relationship:

Olive > Red > Blue

All hail the dominant Olive! Out of these three dice, if we want the best chance of winning, we should always pick Olive, right? No dice, as they say. When we roll Olive against Blue, we find that Blue wins more often!

For any one of these three dice, there is another that will roll a higher number more often than not.

Olive > Red > Blue > Olive > Red > Blue > Olive > Red > Blue > …

This forms a chain of dominance relationships that is a closed cycle. This property is called intransitivity, and you can use it to win riches beyond your wildest dreams, er, well, at least to impress your friends.

Neat, right? But there’s more! We can do the same trick with Yellow, Magenta, and Red (Red > Yellow > Magenta > Red > …). With all five dice, there is a chain for which the order is given by the length of the word for each colour.

Red > Blue > Olive > Yellow > Magenta > …

Awesome. But that’s not it, either! You may have noticed from our three way comparisons that there is another five way chain. This time, the chain order is given by the alphabetical order of the words for each of the colours.

Blue > Magenta > Olive > Red > Yellow > …

What are the odds?

So far I’ve just asked you to take my word for it that the dominance relationships are as I described. Working out the odds of winning for any given pairing of dice is actually quite straightforward. Start by looking at the number on each side of the first die, one at a time. Count how many sides on the opposing die are less than the current number and divide by six. Since each side on the first die has a 1/6 chance of appearing, divide by 6 again. Sum these values for all six sides and you will have the probability that the first die will roll a higher number than the second.

For example, P(Red > Blue) = 5/6 * 1/2 + 1/6, which is 7/12.
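The recipe translates almost word for word into code. In this sketch the face values are an assumption (the commonly published Grime dice values, which reproduce the odds quoted in this post), and the names DICE and p_higher are mine.

```python
from fractions import Fraction

# Assumed face values (commonly published Grime dice configuration).
DICE = {
    "Red": [4, 4, 4, 4, 4, 9],
    "Blue": [2, 2, 2, 7, 7, 7],
    "Olive": [0, 5, 5, 5, 5, 5],
    "Yellow": [3, 3, 3, 3, 8, 8],
    "Magenta": [1, 1, 6, 6, 6, 6],
}

def p_higher(first, second):
    """Probability that the first die rolls higher, one side of the first die at a time."""
    total = Fraction(0)
    for side in first:
        lower = sum(1 for other in second if other < side)  # opposing sides that lose
        total += Fraction(lower, 6) * Fraction(1, 6)        # divide by six, then by six again
    return total

# List every pairing in which the first die is favoured.
for a in DICE:
    for b in DICE:
        if a != b and p_higher(DICE[a], DICE[b]) > Fraction(1, 2):
            print(a, ">", b, "with probability", p_higher(DICE[a], DICE[b]))
```

Using exact fractions rather than floats keeps the output directly comparable to hand calculations such as the 7/12 above.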

Here I’ve worked out all of the pairwise odds:


So, you always have the edge in this game as long as you get to be second to choose a colour. The odds are strongest in your favour when your opponent chooses either Olive or Yellow, and you choose Magenta or Red, respectively. Isn’t probability wonderful!

And if you still want more, it turns out that if you roll the Grime dice in pairs, the order of the word length chain reverses!
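The reversal can be checked by enumeration as well. The sketch below makes two assumptions: the face values are the commonly published Grime dice values, and "rolling in pairs" means rolling two dice of the same colour and comparing the sums. The helper names are mine.

```python
from fractions import Fraction
from itertools import product

# Assumed face values (commonly published Grime dice configuration).
DICE = {
    "Red": [4, 4, 4, 4, 4, 9],
    "Blue": [2, 2, 2, 7, 7, 7],
    "Olive": [0, 5, 5, 5, 5, 5],
    "Yellow": [3, 3, 3, 3, 8, 8],
    "Magenta": [1, 1, 6, 6, 6, 6],
}

def pair_sums(die):
    """All 36 equally likely totals from rolling two dice of the same colour."""
    return [a + b for a, b in product(die, repeat=2)]

def p_pair_beats(first, second):
    """Exact probability that a pair of the first colour out-totals a pair of the second."""
    s1, s2 = pair_sums(first), pair_sums(second)
    wins = sum(1 for x, y in product(s1, s2) if x > y)
    return Fraction(wins, len(s1) * len(s2))

# Word length chain for single dice: each colour beats the next one in this list.
chain = ["Red", "Blue", "Olive", "Yellow", "Magenta"]
for i, name in enumerate(chain):
    nxt = chain[(i + 1) % len(chain)]
    print(nxt, "pair beats", name, "pair with probability", p_pair_beats(DICE[nxt], DICE[name]))
```

Every probability printed is above 1/2, so with pairs each colour loses to the one it beat as a single die.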

Open Data Exchange 2013, April 6. Montreal

UPDATE: The day was great! There are many people doing really amazing things with open data and it was amazing to meet them. Here are my slides from the panel talk.
Next Saturday, I’ll be sitting on a panel discussing future avenues for open data at ODX13.
From the odx13 site:

Odx13 is a mini-conference to discuss the successes and challenges of extracting value from Open Data for civic engagement, international aid transparency, scientific research, and more!


Morning Session – Open Data Stories; Panel Discussions

9:00 AM    Introduction and Welcome

9:15 AM    Winning with Open Data – Panel 1

10:10 AM    Open Data in Action – Panel 2 (in French)

  • Guillaume Ducharme, manager in the health care network and member of the Démocratie Ouverte collective
  • Sébastien Pierre, founder, FFunction & Montréal Ouvert
  • Josée Plamondon, co-designer, ContratsNet
  • Jean-Noé Landry (discussion moderator), founder, Montréal Ouvert and Québec Ouvert

11:05 AM    Future Avenues for Open Data – Panel 3

12:00 PM    Lunch will be provided

Afternoon Session – Digging into Data; Workshop and Lightning Talks

1:00 PM    Data Dive Intro – Exploratory Data Analysis with Trudat

1:30 PM    Data Dive

We will dive into interesting Open Data sets with experts on hand to guide us through the weeds, including data on

  • International Aid
  • Government contracts
  • Biodiversity
  • and more…

3:00 PM   Lightning Talks

4:00 PM    Present data insights

4:45 PM    Closing remarks

Introduction to Simulation using R

We had a great turnout yesterday for our Zero to R Hero workshop at the Quebec Centre for Biodiversity Science. We went from the absolute basics of the command line, to the intricacies of importing data, and finally we had a look at plotting using ggplot2. We didn’t have time to get to this extra module introducing simulation, but if you want to work through it on your own you can find the slides here:


The script file to follow along with is here:


Corey Chivers:

As a follow-up to my simulation-based approximate solution to the Gambling Machine Puzzle, here is the exact solution from mathematician Michael Lugo with a nice explanation.

Originally posted on God plays dice:

From the New York Times “Numberplay” blog:

An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him whether y > x or x > y.

If the customer guesses correctly, he is given y dollars. If x = y, he’s given y/2 dollars. And if he’s wrong about which is larger, he’s given nothing.

The entrepreneur plans to charge his customers $40 for the privilege of playing the game. Would you play?

Clearly the strategy is to guess that y > x if x is small, and to guess that y < x if x is large. Say you’re told x = 60. If you guess x is the larger variable, then conditional on your guess being correct (which…

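For anyone who wants to check the threshold strategy numerically before reading the full solution, here is a rough sketch. It is my own working, not the derivation from the original post: given x, guessing y > x pays on average (100^2 - x^2)/200 dollars and guessing y < x pays x^2/200, and the function then averages over x with a simple midpoint rule. The function names and the number of steps are arbitrary, and the x = y case is ignored since it has probability zero.

```python
def expected_payout(x, guess_y_larger):
    """Expected payout in dollars given x, with y uniform on [0, 100]."""
    if guess_y_larger:
        return (100**2 - x**2) / 200  # integral of y/100 for y from x to 100
    return x**2 / 200                 # integral of y/100 for y from 0 to x

def expected_value(threshold, steps=100000):
    """Average payout when guessing y > x below the threshold and y < x above it."""
    width = 100 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint rule over x uniform on [0, 100]
        total += expected_payout(x, x < threshold) * width / 100
    return total

# The two guesses pay the same where x^2 = 100^2 - x^2, i.e. x = 100 / sqrt(2).
best = 100 / 2**0.5
print("switch point:", round(best, 2))
print("expected payout at the switch point:", round(expected_value(best), 2))
print("expected payout with a naive threshold of 50:", round(expected_value(50), 2))
```

Comparing the printed expected payout with the $40 entry fee answers the "would you play?" question under these assumptions.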