Visualizing Bayesian Updating

One of the most straightforward examples of how we use Bayes' rule to update our beliefs as we acquire more information can be seen with a simple Bernoulli process, that is, a process which has only two possible outcomes.

Probably the most commonly cited example is the coin toss. The outcome of tossing a coin can only be heads or tails (barring the case where the coin lands perfectly on edge), but there are many other real-world examples of Bernoulli processes. In manufacturing, a widget may come off the production line either working or faulty. We may wish to know the probability that a given widget will be faulty. We can estimate this using Bayesian updating.

I’ve put together this little piece of R code to help visualize how our beliefs about the probability of success (heads, functioning widget, etc.) are updated as we observe more and more outcomes.


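## Simulate Bayesian binomial updating

sim_bayes <- function(p = 0.5, N = 10, y_lim = 15)
{
  success <- 0
  # Flat Beta(1, 1) prior, drawn as a dashed line
  curve(dbeta(x, 1, 1), xlim = c(0, 1), ylim = c(0, y_lim),
        xlab = 'p', ylab = 'Posterior Density', lty = 2)
  legend('topright', legend = c('Prior', 'Updated Posteriors', 'Final Posterior'),
         lty = c(2, 1, 1), col = c('black', 'black', 'red'))
  for (i in 1:N)
  {
    # Draw one Bernoulli outcome with success probability p
    if (runif(1, 0, 1) <= p)
      success <- success + 1

    # Posterior after i observations: Beta(successes + 1, failures + 1)
    curve(dbeta(x, success + 1, (i - success) + 1), add = TRUE)
    print(paste(success, "successes and", i - success, "failures"))
  }
  # Highlight the final posterior (after all N observations) in red
  curve(dbeta(x, success + 1, (N - success) + 1), add = TRUE, col = 'red', lwd = 1.5)
}

sim_bayes(p = 0.6, N = 90)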

The result is a plot of the posterior distributions (each of which becomes the prior for the next observation) as we make more and more observations from a Bernoulli process.

With each new observation, the posterior distribution is updated according to Bayes' rule. You can change p to see how belief changes for low- or high-probability outcomes, and N to see how belief about p concentrates around the true value after many observations.
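To put numbers on what the curves show, here is a minimal sketch that tabulates the posterior mean and a 95% credible interval after each observation. It assumes the same flat Beta(1, 1) prior as sim_bayes; the helper name posterior_summary and the fixed seed are just choices made for this illustration.

```r
# Sketch: numeric summary of the Beta posterior after each observation.
# Assumes a flat Beta(1, 1) prior, as in sim_bayes().
# posterior_summary() is a hypothetical helper name used for illustration.
posterior_summary <- function(outcomes) {
  n <- seq_along(outcomes)   # number of observations so far
  s <- cumsum(outcomes)      # running count of successes
  a <- s + 1                 # first shape parameter: successes + 1
  b <- (n - s) + 1           # second shape parameter: failures + 1
  data.frame(
    n         = n,
    successes = s,
    mean      = a / (a + b),           # posterior mean of p
    lower     = qbeta(0.025, a, b),    # 2.5% posterior quantile
    upper     = qbeta(0.975, a, b)     # 97.5% posterior quantile
  )
}

set.seed(1)                                   # arbitrary seed, for repeatability
outcomes <- rbinom(90, size = 1, prob = 0.6)  # 90 simulated Bernoulli outcomes
tail(posterior_summary(outcomes), 3)          # summary after the last few observations
```

The interval between lower and upper should generally narrow as n grows, which is the numeric counterpart of the curves becoming taller and thinner in the plot.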

Real-time data collection and analysis in class

As September draws nearer, my mind inevitably turns away from my lofty (and largely unmet) summer research goals, and toward teaching. This semester I will be trying out a teaching technique using live data collection and analysis as a tool to encourage student engagement. The idea is based on the electronic polling technology known as ‘clickers’. The technology allows you to get instant feedback from students and check for understanding, and, when used appropriately, it can facilitate active engagement and peer learning.

Because I will be teaching in a computer lab, where every student will be sitting at a computer, I can bypass the little devices and instead gather student responses using a web-based interface. The advantages, as I see them, are:

  1. Students can enter more complex input than the 1-9 provided by clickers. Instead, students can enter any numeric or character response.
  2. Students can instantly download, plot, and analyze the class data. This step is facilitated by the read.csv("http://data_url.csv") function in R, which allows data import directly from the web.

The first exercise I have planned using this technology is to have students enter their height, then have them plot a histogram of the data to introduce the normal distribution. Using the simple online interface I have created, this exercise can be done very quickly. I am calling the tool ‘I am one of n’.
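The student side of the height exercise might look something like the sketch below. Since the real class data does not exist yet, the heights here are simulated and written to a temporary CSV file that stands in for the download; the column name height_cm and the simulation settings are assumptions made for illustration. In class, read.csv would simply be pointed at the data URL instead of the temporary file.

```r
# Sketch: import class responses from a CSV and plot a histogram with a normal overlay.
# The data are SIMULATED placeholders, not real student responses.
set.seed(2)
fake_class <- data.frame(height_cm = round(rnorm(30, mean = 170, sd = 10)))

# A temporary file stands in for the class data URL (assumption for this sketch)
csv_path <- tempfile(fileext = ".csv")
write.csv(fake_class, csv_path, row.names = FALSE)

# In class, csv_path would be replaced by the URL of the class data
class_data <- read.csv(csv_path)
heights <- class_data$height_cm

# Histogram on the density scale, so a normal curve can be drawn on top
hist(heights, freq = FALSE, xlab = "Height (cm)", main = "Class heights")
curve(dnorm(x, mean = mean(heights), sd = sd(heights)), add = TRUE, lty = 2)
```

Plotting on the density scale (freq = FALSE) is what lets the dashed normal curve, with the sample mean and standard deviation, be compared directly to the bars.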

If you have any suggestions for learning activities that could make effective use of this technology in an undergraduate Biostatistics (or other) course, drop me a note!

Using simulation to demonstrate theory: Hardy-Weinberg Equilibrium

One of my teaching roles is in an introductory Genetics course, where first-year students are presented with a wide range of new ideas at a relatively fast pace. It seems that students often choose to take a memorization approach to learning the material, rather than taking the chance to think about how and why these genetic concepts actually work. It is my conviction that, as teachers, it is our role to provide students with opportunities to engage with the course material and construct a solid understanding that will serve them as they proceed to higher specialization.

When it comes to bang for my pedagogical buck, I have found that you really can’t beat simulation as a platform for giving students the opportunity to engage with theoretical concepts. Here is an R script which I have written and used to allow students to explore how random mating in a population leads to the well-known Hardy-Weinberg (HW) distribution.

For those who need a refresher, HW describes the genotype frequencies in a randomly mating population. For the simple two allele case (A >> a), the allele frequencies are denoted by p and q: freq(A) = p, freq(a) = q, and p + q = 1. If the population is in equilibrium, then freq(AA) = p^2 for the AA homozygotes in the population, freq(aa) = q^2 for the aa homozygotes, and freq(Aa) = 2pq for the heterozygotes.
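As a quick worked example of these formulas, the expected genotype frequencies for a given allele frequency can be computed in a couple of lines. The helper name hw_expected is hypothetical, chosen only for this illustration.

```r
# Sketch: expected Hardy-Weinberg genotype frequencies for allele frequency p.
# hw_expected() is a hypothetical helper name used for illustration.
hw_expected <- function(p) {
  stopifnot(p >= 0, p <= 1)   # p must be a valid frequency
  q <- 1 - p                  # frequency of the a allele
  c(AA = p^2, Aa = 2 * p * q, aa = q^2)
}

hw_expected(0.7)   # AA = 0.49, Aa = 0.42, aa = 0.09
```

The three frequencies always sum to one, because p^2 + 2pq + q^2 = (p + q)^2 = 1.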

What doesn’t usually get mentioned in introductory courses is that the HW formula provides the expected frequencies of each genotype. Of course, in real, finite populations, there will be variability around these values. The seeming exactness of HW obscures the random processes at play. To help students see how HW arises in finite populations (as opposed to the theoretical infinite populations required for the strict solution), I let them play with this simulation (R script).
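For readers who just want the gist, random mating in a finite population could be simulated along the following lines. This is a minimal sketch written for this post's description, not the script itself: the function name simulate_hw, the starting genotype frequencies, the coding of genotypes as counts of A alleles, and the simplification that any two individuals (including the same one twice) can be parents are all assumptions. N and num_generations play the roles of the population size and the number of generations.

```r
# Sketch: random mating in a finite population (NOT the original script).
# Genotypes are coded as the number of A alleles: AA = 2, Aa = 1, aa = 0.
# simulate_hw() and its default starting frequencies are assumptions.
simulate_hw <- function(N = 200, num_generations = 1,
                        start = c(AA = 0.5, Aa = 0, aa = 0.5)) {
  # Initial population drawn from the starting genotype frequencies
  # (the default has no heterozygotes, so it is far from HW proportions)
  geno <- sample(c(2, 1, 0), size = N, replace = TRUE, prob = start)

  for (g in seq_len(num_generations)) {
    # Each offspring gets two parents sampled at random, with replacement
    parent1 <- sample(geno, N, replace = TRUE)
    parent2 <- sample(geno, N, replace = TRUE)
    # A parent transmits an A allele with probability (number of A alleles) / 2
    geno <- rbinom(N, 1, parent1 / 2) + rbinom(N, 1, parent2 / 2)
  }

  # Observed genotype frequencies in the final generation
  observed <- c(AA = sum(geno == 2), Aa = sum(geno == 1), aa = sum(geno == 0)) / N
  # Allele frequency of A in the final generation, and the HW expectation from it
  p <- mean(geno) / 2
  expected <- c(AA = p^2, Aa = 2 * p * (1 - p), aa = (1 - p)^2)
  list(observed = observed, expected = expected, p = p)
}

# Heterozygote frequency across 200 simulated populations of size 200
het <- replicate(200, simulate_hw(N = 200, num_generations = 1)$observed[["Aa"]])
hist(het, xlab = "Observed freq(Aa)", main = "One generation of random mating")
```

With the default starting frequencies p is about 0.5, so the observed heterozygote frequencies should scatter around the HW expectation of roughly 0.5 rather than landing on it exactly, which is the variability the formula hides.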

Students can play around with the population size (N) and the number of generations (num_generations) to see how well the simulated populations correspond to the HW predictions. Here is a plot of 200 simulated populations of size N=200, which are initialized out of HW equilibrium and then randomly mated for one generation:

Feel free to try it out in your own class!

-BayesianBiologist