Variable probability Bernoulli outcomes – Fast and Slow

I am working on a project that requires generating Bernoulli outcomes. Typically, I would do this using the built-in sample() function, like so:

sample(1:0,n,prob=c(p,1-p),replace=TRUE)

This works great and is fast, even for large n. The problem is that I want to generate each sample with its own probability. That seemed straightforward enough: I wrapped the function and vectorized it so that it accepts a vector of p.

binomial_sampler<-function(p){
  return(sample(1:0,1,prob=c(p,1-p)))
}
bs<-Vectorize(binomial_sampler)

Naming this function bs() turned out to be rather prophetic. Nevertheless, I can call this function by passing my unique vector of outcome probabilities. And indeed I get the result I’m looking for.

bs(my_p_vec)

The problem is that this turns out to be very slow. There appears to be quite a bit of overhead in calling sample() for one sample at a time. R’s RNGs are very fast at generating many iid samples, so I started thinking like my old C++ programming self and tried a different approach.

Nbs<-function(p)
{
  U<-runif(length(p),0,1)
  outcomes<-U<p
  return(outcomes)
}

I call the new version Nbs for “New Bernoulli Sampler”, or “Not Bull Shit”. And what a difference it made indeed!

library(rbenchmark)
p<-runif(1000)
res <- benchmark(bs(p), Nbs(p))
print(res)
    test replications elapsed relative user.self sys.self user.child sys.child
2 Nbs(p)          100   0.007        1     0.008    0.000          0         0
1  bs(p)          100   1.099      157     1.080    0.016          0         0

157x faster! Now that’s a speedup to write home about.
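Speed only matters if the fast version still samples correctly. The trick rests on the fact that for U drawn uniformly on (0,1), P(U < p) = p, so each comparison is TRUE with its own probability. A minimal sanity check might look like the sketch below; the seed, the test probabilities in p_grid, and the number of draws are arbitrary choices of mine, not part of the original benchmark.

```r
# Sampler from the post: one uniform draw per probability.
Nbs <- function(p) {
  U <- runif(length(p), 0, 1)
  U < p   # logical vector; element i is TRUE with probability p[i]
}

set.seed(42)                   # arbitrary seed, for reproducibility only
p_grid <- c(0.1, 0.5, 0.9)     # hypothetical test probabilities
n_rep  <- 1e5                  # draws per probability

# Repeat each probability n_rep times, sample once per entry,
# then average the outcomes within each probability group.
p_long   <- rep(p_grid, each = n_rep)
draws    <- Nbs(p_long)
observed <- tapply(draws, p_long, mean)

# The observed frequencies should sit close to 0.1, 0.5 and 0.9.
print(round(observed, 3))
```

If the observed frequencies drift far from p_grid, something is wrong with the sampler rather than with the random number generator.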

Dan “The Man” Bernoulli

6 thoughts on “Variable probability Bernoulli outcomes – Fast and Slow”

Just for kicks:

require(parallel)
require(doMC)
registerDoMC(2)
require(plyr)

bsply<-function(p){
  laply(.data=p,
        function(p)sample(1:0,1,prob=c(p,1-p)),
        .parallel=TRUE)
}

Sadly super slow:

      test replications elapsed relative user.self sys.self user.child sys.child
2   Nbs(p)          100   0.006        1     0.005    0.000      0.000     0.000
1 bsply(p)          100  93.930    15655    66.293    5.631      7.975    12.794

This is a little bit slower than your proposed alternative but it seems natural to me to think of rbinom(length(p), 1, p) as another alternative.

stevencarlislewalker says:

November 2, 2012 at 3:24 pm

This is what I would have done too. This just makes Corey’s function even more interesting because it seems faster than what I would consider the standard base R thing (i.e. using rbinom).
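For anyone who wants to try the rbinom() suggestion from this thread next to Nbs(), here is a rough sketch using only base R. The wrapper name rb, the seed, and the vector length are my own assumptions, and system.time() is used instead of rbenchmark so that nothing extra needs to be installed. Timings will vary by machine and R version, so treat any difference as indicative only.

```r
# Uniform-comparison sampler from the post (returns TRUE/FALSE).
Nbs <- function(p) runif(length(p)) < p

# rbinom() alternative from the comments (returns integer 0/1).
# 'rb' is a hypothetical wrapper name used only for this comparison.
rb <- function(p) rbinom(length(p), 1, p)

set.seed(1)            # arbitrary seed
p <- runif(1e6)        # one probability per outcome

# Time a single call of each sampler on the same probability vector.
t_nbs <- unname(system.time(x <- Nbs(p))["elapsed"])
t_rb  <- unname(system.time(y <- rb(p))["elapsed"])

# Both samplers should reproduce the average probability.
print(c(mean_p = mean(p), Nbs = mean(x), rbinom = mean(y)))

# Elapsed seconds for each approach on this machine.
print(c(Nbs = t_nbs, rbinom = t_rb))
```

Both functions accept a vector of probabilities directly, so neither needs Vectorize() or a per-element call.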

The pedant in me can’t leave it alone…

Surely Nbs should be
{
…
return (outcomes)
}

Or maybe even
{
…
outcomes <- ifelse(U < p, 1, 0)
return(outcomes)
}

Cheers.

Corey Chivers says:

November 1, 2012 at 11:44 pm

Yes, you’re totally right – that was a typo. Did I mention that this software comes with ABSOLUTELY NO WARRANTY? Thanks for the catch 😉
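On the ifelse() suggestion in this thread: Nbs() as posted returns TRUE/FALSE, whereas bs() returns 1/0. If numeric outcomes are needed, ifelse() does the job, and as.integer() is another option since it maps TRUE to 1 and FALSE to 0. The sketch below compares the two on a small made-up vector; the seed and length are arbitrary.

```r
set.seed(7)                  # arbitrary seed
p <- runif(10)               # hypothetical outcome probabilities
U <- runif(length(p))        # one uniform draw per probability

a <- ifelse(U < p, 1, 0)     # numeric 0/1, as suggested in the comment
b <- as.integer(U < p)       # integer 0/1, direct coercion of the logical

# The two conversions agree element by element.
print(identical(as.integer(a), b))

# Summing the logical vector counts successes without any conversion.
print(sum(U < p) == sum(b))
```

For counting or averaging, the logical vector can be used directly, because R treats TRUE as 1 in arithmetic.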

Pingback: Generate binary outcomes with varying probability - The DO Loop

bayesianbiologist

Corey Chivers on P(A|B) ∝P(B|A)P(A)
