Open Data Exchange 2013, April 6, Montreal

UPDATE: The day was great! Many people are doing really amazing things with open data, and it was a pleasure to meet them. Here are my slides from the panel talk.
Next Saturday, I’ll be sitting on a panel discussing future avenues for open data at ODX13.
From the ODX13 site:

Odx13 is a mini-conference to discuss the successes and challenges of extracting value from Open Data for civic engagement, international aid transparency, scientific research, and more!


Morning Session – Open Data Stories; Panel Discussions

9:00 AM    Introduction and Welcome

9:15 AM    Winning with Open Data – Panel 1

10:10 AM    Open Data in Action – Panel 2 (in French)

  • Guillaume Ducharme, manager in the health-care network and member of the Démocratie Ouverte collective
  • Sébastien Pierre, founder, FFunction & Montréal Ouvert
  • Josée Plamondon, co-designer, ContratsNet
  • Jean-Noé Landry (moderator), founder, Montréal Ouvert and Québec Ouvert

11:05 AM    Future Avenues for Open Data – Panel 3

12:00 PM    Lunch will be provided

Afternoon Session – Digging into Data; Workshop and Lightning Talks

1:00 PM    Data Dive Intro – Exploratory Data Analysis with Trudat

1:30 PM    Data Dive

We will dive into interesting Open Data sets with experts on hand to guide us through the weeds, including data on

  • International Aid
  • Government contracts
  • Biodiversity
  • and more…

3:00 PM   Lightning Talks

4:00 PM    Present data insights

4:45 PM    Closing remarks

Mapping Bike Accidents in R

At last weekend’s Hack Ta Ville event here in Montreal, I joined up with some talented urban planners and web devs to build Vélobstacles. The idea of the project is to crowdsource information on cycling conditions around the city. As with any crowdsourcing project, we faced the problem of seeding the site with enough data to attract users and get the ball rolling.

Fortunately, we had access to a data set of all reported cycling accidents between 2006 and 2010. Once we seeded Vélobstacles with this data, the web devs went to town adding features to the site, and I had outlived my usefulness as a data geek. So I decided to play with the accident data a little and produce some visualizations. I plotted all the accidents on a map and animated it through time. I also calculated and plotted the monthly accident rate using a moving average.
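The R code for this is on my GitHub page; what follows is not that code. As a rough illustration of the monthly rate and moving average, here’s a Python sketch that counts accidents per calendar month (filling months with no accidents with zero) and smooths the counts with a trailing moving average. The input format (a plain list of accident dates), the function names and the default three-month window are all assumptions for illustration.

```python
from collections import Counter
from datetime import date


def monthly_counts(accident_dates):
    """Count accidents per (year, month), filling months with no accidents with 0."""
    counts = Counter((d.year, d.month) for d in accident_dates)
    if not counts:
        return [], []
    (y, m), end = min(counts), max(counts)
    months = []
    while (y, m) <= end:
        months.append((y, m))
        m += 1
        if m > 12:  # roll over into the next year
            y, m = y + 1, 1
    return months, [counts.get(k, 0) for k in months]


def moving_average(values, window=3):
    """Trailing moving average; the first window-1 entries are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Hypothetical toy input, not the real accident data.
dates = [date(2006, 5, 3), date(2006, 5, 20), date(2006, 7, 1),
         date(2006, 7, 9), date(2006, 7, 30)]
months, counts = monthly_counts(dates)
smoothed = moving_average(counts, window=3)
```

A centred window would line the smoothed curve up better with the seasonal peaks; the trailing window is used here only because it’s the simplest to follow.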

Be sure to select HD quality:

Not surprisingly, the accident rate goes way up in the summer months, as Montreal winters are braved on two wheels by only a rarefied few. What is interesting is the mid-summer dip in the accident rate. This dip coincides with Montreal’s much-beloved construction holiday, though the causal relationship is unclear. If you have any alternative explanations, or an idea about how to test the construction holiday hypothesis, drop a note in the comments.

As always, you can get the code on my github page.

R Workshop: Introducing Slidify – HTML5 slides from R markdown

Thursday, June 28th, 2012, 19h <– new evening time!

Tomson House: 650 McTavish, H3A 1Y2, Montréal, QC <– new social setting!

guRu: Ramnath Vaidyanathan (McGill University)

Ramnath Vaidyanathan will introduce the group to slidify, his brand-new R package.

From the slidify website:

“The objective of slidify is to make it easy to create reproducible HTML5 presentations from .Rmd files. The guiding philosophy of slidify is to completely separate writing of content from its rendering, so that content can be written once in R Markdown, and rendered as an HTML5 presentation using any of the HTML5 slide frameworks supported.”

The package is currently in alpha and therefore not yet available on CRAN. You can find install instructions here.

Ramnath Vaidyanathan is Assistant Professor of Operations Management at McGill University’s Desautels Faculty of Management.

This is a meeting of the Montreal R Users Group. We’re open to everyone! Sign up to RSVP!

More Bixi Data Visualization

I mentioned in a previous post that our team at the recent Hack/Reduce hackathon had some fun with a data set of Bixi station states at minute-level temporal resolution. In addition to extracting and plotting the flux at each station on an hourly basis, we also plotted the system state (the number of bikes at each station) at every time-step we had. This came to 24,217 individual plots. Each plot was generated by an R script that took the system state at a given time-step and output a PNG.

Team member Kamal Marhubi also did some nice post-processing to overlay the information on a map. The results are a little mesmerising. Things don’t get fun until about 40 seconds into the video; before that, it mostly shows stations coming online at the start of the season.

And for the non-Montrealers out there, here’s an image of a Bixi bike, our durable, data-generating little hero.

Heartbeat of a Cycling City: Update

I recently posted about some Bixi data our group analysed at the Hack/Reduce Montreal 2 event. One of my observations was that the evening rush seemed generally stronger than the morning rush, at least for April 11th to 14th. I even speculated that this was because riding home might be a great way to relax after work. Reader Joey Berger contacted me with an alternative take on this:

You were surprised (as I was) by this, in part I think because downtown is downhill for a lot of Bixi users. When I used to commute downtown I always rode in the morning and took the bus in the evening.
Anyway, I got curious so I looked up some Environment Canada data.

As you can see, there wasn’t much rain to affect bike use, but April mornings are a lot cooler than April evenings. I suspect two things. First, below about 12 degrees, riding a Bixi isn’t as comfortable as it needs to be for mass use. Especially if it’s windy and you don’t have gloves on. Second, I assume there’s a lot more downtown traffic in the evening, especially among pedestrians/bikers who are both commuting from work and entering downtown for dinner, movies, etc.
Keep the comments coming and check back here for an analysis of Bixi traffic during the STM outage on Thursday morning.

Heartbeat of a Cycling City: Bixi data at Hack/Reduce

The recent Hack/Reduce hackathon in Montreal was a tonne of fun. Our team tackled a data set consisting of Bixi (Montreal’s bicycle share system) station states at one-minute temporal resolution. We used Hadoop and MapReduce to extract some features of user behaviour. One of the things we extracted was the flux at each station, which we defined as the number of bikes arriving at and departing from a given station per unit time. When you plot the total system flux across all stations against time, you can see the pulse of the city. Here are the first few weeks of this year’s Bixi season.
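We did this with Hadoop, but the core logic fits in a few lines. Below is a hypothetical pure-Python sketch of the map and reduce steps, assuming each record is a (station_id, timestamp, bikes_available) snapshot. It approximates flux as the absolute change in the number of bikes between consecutive snapshots at a station, summed per hour. That approximation is an assumption: an arrival and a departure within the same minute cancel out, so it undercounts true traffic.

```python
from collections import defaultdict
from datetime import datetime


def map_station_deltas(records):
    """Group snapshots by station (the 'shuffle'), then emit ((station, hour), |change|)
    for each pair of consecutive snapshots. Assumed record: (station, timestamp, bikes)."""
    by_station = defaultdict(list)
    for station, ts, bikes in records:
        by_station[station].append((ts, bikes))
    for station, snaps in by_station.items():
        snaps.sort()  # order snapshots by time
        for (t0, b0), (t1, b1) in zip(snaps, snaps[1:]):
            hour = t1.replace(minute=0, second=0, microsecond=0)
            yield (station, hour), abs(b1 - b0)


def reduce_flux(pairs):
    """Sum the emitted changes per (station, hour) key."""
    flux = defaultdict(int)
    for key, value in pairs:
        flux[key] += value
    return dict(flux)


def system_flux(station_flux):
    """Total flux across all stations for each hour: the 'pulse of the city'."""
    total = defaultdict(int)
    for (station, hour), f in station_flux.items():
        total[hour] += f
    return dict(total)


# Hypothetical toy snapshots, not real Bixi data.
records = [
    ("A", datetime(2012, 4, 11, 8, 0), 5),
    ("A", datetime(2012, 4, 11, 8, 1), 3),
    ("A", datetime(2012, 4, 11, 8, 2), 4),
    ("B", datetime(2012, 4, 11, 8, 0), 10),
    ("B", datetime(2012, 4, 11, 9, 0), 12),
]
station_flux = reduce_flux(map_station_deltas(records))
pulse = system_flux(station_flux)
```

In a real MapReduce job the grouping by station would happen in the shuffle between a mapper and reducer; here it is done in memory to keep the sketch self-contained.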

A few things jump out: 1) There are clearly defined peaks at both the morning and evening rush hours, but it looks like the evening rush is typically a little stronger. I guess cycling home is a great way to relax after a day at work. 2) The data collector seems to have gone offline during the night of April 18th. 3) Related to the first point, weekdays and weekends have distinct signatures. In fact, Easter Monday stands out clearly: it looks like a weekend day.
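One way to summarize the weekday and weekend signatures is to average the hourly system flux by hour of day for each type of day. In this Python sketch (function names hypothetical), the input is a {hour_timestamp: flux} mapping, and an optional set of holiday dates is treated as weekend days. Comparing the profile with and without a holiday listed is a simple check of whether a day like Easter Monday behaves like a Saturday or Sunday.

```python
from collections import defaultdict
from datetime import date, datetime


def day_type(d, holidays=frozenset()):
    """'weekend' for Saturday/Sunday or any listed holiday, otherwise 'weekday'."""
    return "weekend" if d.weekday() >= 5 or d in holidays else "weekday"


def hourly_profile(system_flux, holidays=frozenset()):
    """Mean flux for each (day type, hour of day), from {hour_timestamp: flux}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, f in system_flux.items():
        key = (day_type(ts.date(), holidays), ts.hour)
        sums[key] += f
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical toy values at 8 AM on four days (not real Bixi data).
flux = {
    datetime(2012, 4, 9, 8): 40,    # a Monday, listed below as a holiday
    datetime(2012, 4, 11, 8): 100,  # Wednesday
    datetime(2012, 4, 12, 8): 200,  # Thursday
    datetime(2012, 4, 14, 8): 30,   # Saturday
}
with_holiday = hourly_profile(flux, holidays={date(2012, 4, 9)})
without_holiday = hourly_profile(flux)
```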

When the system was first being installed, I had the impression that it would be used primarily by tourists. Owning a bike myself, I figured that if other Montrealers wanted to cycle in the city, they would do so on their own rides. From this data, it really seems as though Montrealers themselves are using the Bixi system, in place of other modes of transit, for commuting.

We also took the spatial information in the data, plotted the flux at the station level, and animated it across time. Here, I used a kernel smoother from the KernSmooth package to estimate the flux density in space. Because stations are not evenly spread across the city, this shows the spatial configuration of flux more clearly than plotting individual points would. The result is this pulsating video:
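KernSmooth’s estimators (such as bkde2D) use binned approximations in R. As a rough stand-in, and not a reimplementation of that package, here is an unbinned Python sketch of the same idea: a Gaussian kernel centred on each station, weighted by that station’s flux, evaluated on a grid. The function name, bandwidth and grid are hypothetical choices; the bandwidth controls how much neighbouring stations blur together.

```python
import math


def weighted_kde_grid(points, weights, bandwidth, xs, ys):
    """Flux-weighted 2D Gaussian kernel density evaluated on the grid xs x ys.

    points: list of (x, y) station coordinates; weights: flux at each station.
    Returns one row per y value, each row holding one density per x value.
    The result integrates to 1 over the plane (when total weight > 0).
    """
    total = sum(weights)
    if total == 0:
        return [[0.0 for _ in xs] for _ in ys]  # no flux anywhere
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * total)
    grid = []
    for gy in ys:
        row = []
        for gx in xs:
            s = 0.0
            for (px, py), w in zip(points, weights):
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                s += w * math.exp(-d2 / (2 * bandwidth ** 2))
            row.append(norm * s)
        grid.append(row)
    return grid


# Hypothetical example: two stations, the one on the right has three times the flux.
density = weighted_kde_grid([(-1.0, 0.0), (1.0, 0.0)], [1, 3], 1.0,
                            xs=[-1.0, 0.0, 1.0], ys=[0.0])
```

For an animation like the one above, you would recompute the grid for each time-step’s flux and render each grid as a frame.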

For the R users out there, I also found the package lubridate to be extremely helpful for wrangling the dates in this project.

Credits (Team Ctr-Freak)

Julia Evans
Kamal Marhubi
Victor Parmar
Pierre-Alexandre Lacerte
Mansoor Siddiqui
Rafik Draoui
Corey Chivers