R | bayesianbiologist

The recent Hack/Reduce hackathon in Montreal was a tonne of fun. Our team tackled a data set consisting of Bixi (Montreal’s bicycle share system) station states at one-minute temporal resolution. We used Hadoop and MapReduce to pull out some features of user behaviour. One of the things we extracted was the flux at each station, which we defined as the number of bikes arriving at and departing from a given station per unit time. When you plot the total system flux across all stations against time, you can see the pulse of the city. Here are the first few weeks of this year’s Bixi season.
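To make the flux definition concrete, here is a minimal R sketch of how it could be computed from minute-by-minute station states. This is not the Hadoop job we ran at the hackathon; the `states` data frame and its columns (`station`, `minute`, `bikes`) are made up for illustration. It also assumes that the absolute change in bike count between consecutive snapshots is a fair proxy for arrivals plus departures, which undercounts whenever a bike arrives and another leaves within the same minute.

```r
# Hypothetical station snapshots: one row per station per minute.
states <- data.frame(
  station = c("A", "A", "A", "A", "B", "B", "B", "B"),
  minute  = c(1, 2, 3, 4, 1, 2, 3, 4),
  bikes   = c(5, 7, 6, 6, 10, 8, 8, 11)
)

# Sort so that differences are taken between consecutive minutes per station.
states <- states[order(states$station, states$minute), ]

# Change in bike count since the previous snapshot of the same station.
# The first snapshot of each station has no predecessor, hence NA.
states$delta <- ave(states$bikes, states$station,
                    FUN = function(b) c(NA, diff(b)))

# Assumption: |change in count| approximates arrivals + departures.
states$flux <- abs(states$delta)

# Total system flux per minute (rows with NA flux are dropped by aggregate).
system_flux <- aggregate(flux ~ minute, data = states, FUN = sum)
print(system_flux)
```

Plotting `system_flux$flux` against time, with real timestamps in place of the toy `minute` column, would give a curve of the kind described above.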

A few things jump out: 1) There are clearly defined peaks at both the morning and evening rush hours, but it looks like the evening rush is typically a little stronger. I guess cycling home is a great way to relax after a day at work. 2) The data collector seems to have gone offline during the night of April 18th. 3) Related to the first point, weekdays and weekends have distinct signatures. In fact, you can see a clear signal of Easter Monday, in that it looks like a weekend day.

When the system was first being installed, I had the impression that it would be used primarily by tourists. Owning a bike myself, I figured that if other Montrealers wanted to cycle in the city, they would do so on their own rides. From this data, it really seems as though Montrealers themselves are using the Bixi system, choosing it over other modes of transit for their commute.

We also took the spatial information in the data and plotted the flux at the station level, then animated this across time. Here, I used a kernel smoother from the KernSmooth package to estimate the flux density in space. This shows the spatial configuration of flux a little better than points alone would, as the spatial density of stations is heterogeneous. The result is this pulsating video:
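For anyone curious how the smoothing step might look, here is a rough sketch using `bkde2D()` from KernSmooth for a single time slice. The station coordinates, flux counts, bandwidth, and grid range are all invented for illustration. Since `bkde2D()` has no weights argument, this sketch repeats each station's coordinates once per unit of flux, which is one possible workaround and not necessarily what was done for the video.

```r
library(KernSmooth)

# Hypothetical station coordinates and flux counts for one time slice.
stations <- data.frame(
  lon  = c(-73.57, -73.58, -73.56, -73.60, -73.55),
  lat  = c(45.50, 45.51, 45.52, 45.53, 45.50),
  flux = c(12, 3, 7, 1, 9)
)

# bkde2D() takes no weights, so repeat each station once per unit of flux.
idx <- rep(seq_len(nrow(stations)), times = stations$flux)
pts <- as.matrix(stations[idx, c("lon", "lat")])

# 2D binned kernel density estimate on a regular grid.
# Bandwidth and range are arbitrary choices for this toy example.
dens <- bkde2D(pts,
               bandwidth = c(0.01, 0.01),
               gridsize  = c(101L, 101L),
               range.x   = list(c(-73.62, -73.53), c(45.48, 45.55)))

# Draw the smoothed surface and overlay the station locations.
image(dens$x1, dens$x2, dens$fhat, xlab = "Longitude", ylab = "Latitude")
points(stations$lon, stations$lat, pch = 19, cex = 0.5)
```

Repeating this for each time step and saving the frames would give the raw material for an animation.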

For the R users out there, I also found the package lubridate to be extremely helpful for wrangling the dates in this project.
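As a small taste of the kind of date wrangling involved, the sketch below parses a few timestamps and derives the hour of day and a weekday/weekend flag, which is what you need to separate the rush-hour and weekend signatures. The timestamps are illustrative and not taken from the Bixi data, and the time zone name is my assumption.

```r
library(lubridate)

# Illustrative timestamps (not from the Bixi data set).
stamps <- ymd_hms(c("2012-04-18 08:15:00",
                    "2012-04-18 17:40:00",
                    "2012-04-21 13:05:00"),
                  tz = "America/Toronto")

obs <- data.frame(
  time     = stamps,
  hour_bin = floor_date(stamps, unit = "hour"),  # round down to the hour
  hour     = hour(stamps),                       # hour of day, 0 to 23
  # With week_start = 1, Monday is 1 and Saturday/Sunday are 6/7.
  weekend  = wday(stamps, week_start = 1) >= 6
)
print(obs)
```

With columns like `hour` and `weekend` in hand, averaging flux by hour within each group is a one-line `aggregate()` call.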

Credits (Team Ctr-Freak)

Julia Evans
Kamal Marhubi
Victor Parmar
Pierre-Alexandre Lacerte
Mansoor Siddiqui
Rafik Draoui
Corey Chivers

    ## Simulate Bayesian Binomial updating
    sim_bayes <- function(p = 0.5, N = 10, y_lim = 15)
    {
      success <- 0
      curve(dbeta(x, 1, 1), xlim = c(0, 1), ylim = c(0, y_lim),
            xlab = 'p', ylab = 'Posterior Density', lty = 2)
      legend('topright',
             legend = c('Prior', 'Updated Posteriors', 'Final Posterior'),
             lty = c(2, 1, 1), col = c('black', 'black', 'red'))
      for(i in 1:N)
      {
        if(runif(1, 0, 1) <= p)
          success <- success + 1
        curve(dbeta(x, success + 1, (i - success) + 1), add = TRUE)
        print(paste(success, "successes and ", i - success, " failures"))
      }
      curve(dbeta(x, success + 1, (i - success) + 1), add = TRUE, col = 'red', lwd = 1.5)
    }
    sim_bayes(p = 0.6, N = 90)
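The simulation above draws each posterior with `dbeta(x, success + 1, failures + 1)`, which is the Beta posterior you get from a flat Beta(1, 1) prior and binomial data. As a companion sketch, the same update can be summarised numerically without plotting. The helper name `posterior_summary` and the example counts are mine, chosen only for illustration.

```r
# Closed-form Beta-Binomial update, using the same Beta(1, 1) prior
# that sim_bayes() starts from. posterior_summary is a hypothetical helper.
posterior_summary <- function(successes, failures, prior_a = 1, prior_b = 1) {
  a <- prior_a + successes   # posterior shape 1
  b <- prior_b + failures    # posterior shape 2
  c(mean  = a / (a + b),             # posterior mean of p
    lower = qbeta(0.025, a, b),      # 95% equal-tailed interval
    upper = qbeta(0.975, a, b))
}

# Example counts (made up): 54 successes and 36 failures in 90 trials.
res <- posterior_summary(successes = 54, failures = 36)
print(res)
```

As the number of trials grows, the interval between `lower` and `upper` narrows, which is the same tightening you see in the successive curves.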

Corey Chivers on P(A|B) ∝P(B|A)P(A)

Heartbeat of a Cycling City: Bixi data at Hack/Reduce

Visualizing Bayesian Updating