Introduction to Machine Learning Talk

There was an amazing turnout at last night’s DataPhilly meetup (~200 people!). I was delighted by both the attendance and how engaged everyone was. Here are the slides from the talk I gave to set up the evening, a high-level introduction to machine learning.


Introducing Penn Signals at DataPhilly

Last week I had the pleasure of giving a talk to a great audience at DataPhilly about the Data Science mission at Penn Medicine. In the talk I introduced the framework we are building to accelerate the development and deployment of predictive applications in health care.


Click for slides (pdf)

Also on the line-up was (sometimes contributor to bayesianbiologist) Matt Sunquist. He demo’d some of‘s most recent features to audible gasps of delight from the audience.

Time-series forecasting: Bike Accidents

About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up in the spring, and declines to only a handful during the winter months.

It’s now 2013 and unfortunately our data ends in 2010. However, the pattern does seem to be quite regular (that is, exhibits annual periodicity) so I decided to have a go at forecasting the time series for the missing years. I used a seasonal decomposition of time series by LOESS to accomplish this.

You can see the code on GitHub, but here are the results. First, I looked at the four components of the decomposition:


Indeed the seasonal component is quite regular and does contain the intriguing dip in the middle of the summer that I mentioned in the first post.
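For readers who want a feel for what a seasonal decomposition does, here is a minimal sketch in plain Python. It is not the LOESS-based method used for the figures above; it is the simpler classical additive decomposition (a centered moving average for the trend, then per-month averages for the seasonal part). The function name `decompose_additive` and the monthly counts are made up for illustration.

```python
from statistics import mean


def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + remainder.

    A simplified stand-in for an STL (LOESS) decomposition, not STL itself.
    Needs at least two full cycles of data.
    """
    n = len(series)
    half = period // 2

    # Trend: centered moving average, undefined (None) near both ends.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one cycle.
    raw = []
    for k in range(period):
        vals = [series[t] - trend[t]
                for t in range(k, n, period) if trend[t] is not None]
        raw.append(mean(vals))
    offset = mean(raw)
    cycle = [r - offset for r in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Remainder: whatever trend and seasonality do not explain.
    remainder = [None if trend[t] is None
                 else series[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder


# Made-up monthly counts (three years), NOT the Montreal accident data.
counts = [2, 3, 10, 30, 55, 70, 60, 65, 72, 40, 12, 4] * 3
trend, seasonal, remainder = decompose_additive(counts, period=12)
```

The seasonal list repeats one 12-value cycle of deviations from the trend, which is the kind of quantity plotted in the seasonal-deviation figure.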



This figure shows just the seasonal deviation from the average rates. The peaks seem to be in early July and again in late September. Before doing any seasonal aggregation, I thought that the mid-summer dip might correspond with the mid-August construction holiday; however, it now looks like a broader, summer-long reprieve. It could be a population-wide vacation effect.

Finally, I used an exponential smoothing model to project the accident rates into the 2011-2013 seasons.
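As a rough illustration of how exponential smoothing can carry a seasonal series forward, here is a small additive Holt-Winters sketch in plain Python. It is an assumption-laden stand-in, not the model behind the forecast figure: the smoothing weights `alpha`, `beta`, and `gamma` are fixed by hand rather than fitted, the initialization is deliberately simple, and no confidence bounds are computed. The function name and the data are hypothetical.

```python
def holt_winters_additive(series, period, horizon,
                          alpha=0.3, beta=0.05, gamma=0.2):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing.

    alpha, beta, gamma are assumed smoothing weights for the level, trend
    and seasonal components; a real analysis would estimate them.
    """
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")

    # Simple initialization from the first two cycles.
    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    season = [x - level for x in first]

    # One smoothing update per observation after the first cycle.
    for t in range(period, n):
        x = series[t]
        s = season[t % period]
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (x - level) + (1 - gamma) * s

    # Point forecasts: extrapolated level plus the matching seasonal effect.
    return [level + h * trend + season[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Made-up monthly counts (three years), NOT the Montreal accident data.
counts = [2, 3, 10, 30, 55, 70, 60, 65, 72, 40, 12, 4] * 3
forecast = holt_winters_additive(counts, period=12, horizon=36)
```

With `horizon=36` this projects three more years of monthly values, mirroring the idea of extending a series that ends in 2010 through the 2011-2013 seasons.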


It would be great to get the data from these years to validate the forecast, but for now let’s just hope that we’re not pushing up against those upper confidence bounds.

From Whale Calls to Dark Matter: Competitive Data Science with R and Python

Back in June I gave a fun talk at Montreal Python on some of my dabbling in the competitive data science scene. The good people at Savoir-faire Linux recorded the talk and have edited it all together into a pretty slick video. If you can spare twenty minutes or so, have a look.

If you want the slides, head on over to my speakerdeck page.


Montreal R User Group meetup Nov. 14th

After a bit of a summer lull, the Montreal R User Group is meeting up again! We’re trying out a new venue this time. Notman House is the home of the web in Montreal. They hold hackathons and other tech user group meetups, and they are all-around great people in an all-around great space in downtown Montreal.

Our meetup will feature R super-user Etienne Low-Decarie, who will give a walkthrough of some of the most powerful packages in R, many of which were built by rstats rock star Hadley Wickham.

I will also kick off the meetup with a short session on how R is revolutionizing data science in academia, journalism, business and beyond.

  • November 14th, 7pm at 51 Sherbrooke W.
  • BYOL&D (Bring Your Own Laptop & Data)

Don’t forget to RSVP. Hope to see you there!