The Treachery of Models

In Magritte’s famous 1929 painting The Treachery of Images, a pipe is depicted with the caption “Ceci n’est pas une pipe”, French for “This is not a pipe”. The seemingly dissonant statement under what is a very clearly depicted pipe forces the viewer to confront the distinction between the representation and the thing itself. The treachery refers to the danger of confusing the representation with the object.

Models – like images – are merely representations of the objects they depict. The modeler seeks to represent the relevant properties of the system being modeled in order to communicate some features of that system, or to manipulate it under ‘what-if’ scenarios.

With the current plethora of models of COVID-19 spread, it seems important to remind ourselves of Magritte’s warning. Models can be incredible tools — abstractions of complex realities that allow us to reason about the data we’ve seen so far, and consider possible futures. But they are not the epidemic process itself.

Ceci n'est pas une épidémie. (This is not an epidemic.)

This is not an argument that the models themselves are treacherous. I absolutely adore modeling and models, as Magritte adored painting and images. I consider modeling an essential tool to further our understanding of (and make predictions about) the world around us. Indeed I developed one of the aforementioned COVID-19 models with my colleagues at Penn Medicine. The treachery comes when we mistake the abstraction for reality.


Weapons of Math Destruction – A Data Scientist’s Guide to Disarmament

I’ve had this book on pre-order since spring and it finally arrived on Friday. I subsequently devoured it over the weekend.

The long-awaited Weapons of Math Destruction by Cathy O’Neil

The book lays out a clear and compelling case for how data-driven algorithms can become — in contrast to their promise of amoral objectivism — efficient means for reproducing and even exacerbating social inequalities and injustices. From predictive policing and recidivism risk models to targeted marketing for predatory loans and for-profit universities, O’Neil explains how to recognize WMDs by 3 distinct features:

  1. The model is either hidden, or opaque to the individuals affected by its calculations, restricting any possibility of seeking recourse against – or understanding of – its results or conclusions.
  2. The model works against the subject’s interest (e.g., it is unfair).
  3. The model scales, giving it the opportunity to negatively affect a very large segment of the population.

The taxonomy provides a simple framework for identifying WMDs in the wild. Just as importantly for data scientists and other data practitioners, it forms a checklist (or rather an anti-checklist) to keep in mind when developing models that will be deployed into the real world. As data scientists, many of us are strongly incentivized to achieve feature 3, which makes it all the more important to constantly question the degree to which our models could fall victim to features 2 and 1.

Feature 2, as O’Neil lays out, can occur despite the best intentions of a model’s creators. This can (and does!) happen in two ways. First, when a modeler seeks to create an objective system for rating individuals (say, for acceptance to a prestigious university, or for a payday loan), the data used to build the model already encodes the socially constructed biases of the conditions under which it was generated. Even when potentially bias-laden factors such as race or gender are explicitly excluded, this information seeps into the model nonetheless through correlations with seemingly benign variables such as zip codes or the makeup of a subject’s social connections.
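To make that leak concrete, here is a minimal sketch in R. Everything in it (the variable names, the effect sizes, and the single "zip code" proxy) is invented for illustration and is not an example from the book.

# Toy illustration of proxy leakage (all names and numbers are invented)
set.seed(42)
n <- 5000
group    <- rbinom(n, 1, 0.5)                        # protected attribute; never given to the model
zip_risk <- rnorm(n, mean = 0.8 * group)             # "zip code" variable, correlated with group
outcome  <- rbinom(n, 1, plogis(-1 + zip_risk))      # outcome generated only through the proxy

fit   <- glm(outcome ~ zip_risk, family = binomial)  # the model never sees 'group'
score <- predict(fit, type = "response")

tapply(score, group, mean)                           # average scores still differ by the excluded attribute

Even though the protected attribute is excluded from the model, the scores it produces differ systematically between the two groups, because the proxy carries the same information.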

Second, when the outcome of the model reinforces the unjust conditions from which it was created, a self-perpetuating feedback loop is formed. Such a loop is particularly present and pernicious in the use of recidivism risk models to guide sentencing decisions. An individual may be labeled high risk not because of qualities of the individual himself, but because of the circumstance of living in a poor, high-crime neighborhood. Being incarcerated on the basis of this model renders him more likely to end up back in that neighborhood, subject to continued poverty and disproportionate policing. The model has thus set up the conditions to fulfill its own prediction.
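A toy simulation (again with entirely invented numbers and variable names) shows how acting on such a score can produce the very outcome it claims to predict:

# Toy feedback-loop simulation (purely illustrative; every parameter is invented)
set.seed(1)
n <- 10000
circumstances <- rnorm(n)                    # neighborhood conditions, not personal qualities

score     <- plogis(circumstances)           # 'recidivism risk' score driven by circumstances
high_risk <- score > 0.7                     # harsher sentencing for those flagged high risk

# Incarceration worsens the circumstances (poverty, disproportionate policing)
# that drive rearrest, so acting on the score changes the outcome it predicts
p_rearrest <- plogis(-1 + circumstances + 1.5 * high_risk)
rearrested <- rbinom(n, 1, p_rearrest)

tapply(rearrested, high_risk, mean)          # the flagged group now reoffends far more often

The flagged group ends up rearrested at a much higher rate, which appears to validate the score even though the disparity was created by the decisions the score drove.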

As machine learning algorithms become more and more accurate at a variety of tasks, their inner workings become harder and harder to understand. This trend will make it increasingly difficult to avoid feature 1 of the WMD taxonomy. Current advanced techniques like deep learning produce models that are remarkably performant, yet not fully understood by the researchers creating them, much less by the individuals affected by their results. In light of this, we as data scientists need to think carefully about how to communicate these models with as much transparency as possible. How to do so remains an open question. But the internal ‘black box’ nature of these algorithms does not obviate our responsibility to disclose exactly what input data went into a given model, what assumptions were made about that data, and on what criteria the model was trained.

Overall, WMD provides an incredibly important framework for thinking about the consequences of uncritically applying data and algorithms to people’s lives. For those of us, like O’Neil herself, who make our living using mathematics to create data-driven algorithms, taking to heart the lessons contained in Weapons Of Math Destruction will be our best defense against unwittingly creating the bomb ourselves.

Montreal R Workshop: Introduction to Bayesian Methods

Monday, March 26, 2012  14h-16h, Stewart Biology N4/17

Corey Chivers, Department of Biology, McGill University

This is a meetup of the Montreal R User Group. Be sure to join the group and RSVP. More information about the workshop here.

Topics

Why would we want to be Bayesian in the first place? In this workshop we will examine the types of questions we are able to ask when we view the world through a Bayesian perspective. The workshop will introduce Bayesian approaches to both statistical inference and model-based prediction/forecasting. Starting with an examination of the theory behind this school of statistics through a simple example, participants will learn why we often need computationally intensive methods for solving Bayesian problems. Participants will also be introduced to the mechanics behind these methods (MCMC), and will apply them in a biologically relevant example.
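As a small taste of what this looks like in practice, here is a minimal Metropolis sampler written in base R for the posterior of a binomial proportion. The data, prior, and tuning values are invented for illustration; this is not the workshop's biological example.

# Minimal Metropolis sampler for the posterior of a binomial proportion p
# (toy data and a flat prior, chosen purely for illustration)
set.seed(123)
successes <- 7
trials    <- 20

log_posterior <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)        # outside the support of the uniform prior
  dbinom(successes, trials, p, log = TRUE)  # log-likelihood; the flat prior adds only a constant
}

n_iter    <- 10000
chain     <- numeric(n_iter)
p_current <- 0.5
for (i in 1:n_iter) {
  p_proposed <- p_current + rnorm(1, 0, 0.1)  # symmetric random-walk proposal
  if (log(runif(1)) < log_posterior(p_proposed) - log_posterior(p_current)) {
    p_current <- p_proposed                   # accept; otherwise keep the current value
  }
  chain[i] <- p_current
}

posterior_samples <- chain[-(1:1000)]         # discard burn-in
hist(posterior_samples, main = "Posterior of p", xlab = "p")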

Learning Objectives

The participant will:
1) Contrast the underlying philosophies of the Frequentist and Bayesian perspectives.

2) Estimate posterior distributions using Markov Chain Monte Carlo (MCMC).

3) Conduct both inference and prediction using the posterior distribution (a short sketch follows below).
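Continuing the toy sketch from the Topics section above (again purely as an illustration, not the workshop material itself), the MCMC draws in posterior_samples can be used directly for both inference and prediction:

# Inference: posterior mean and a 95% credible interval for p
mean(posterior_samples)
quantile(posterior_samples, c(0.025, 0.975))

# Prediction: posterior predictive distribution of the number of successes
# in 20 future trials, integrating over our uncertainty about p
y_new <- rbinom(length(posterior_samples), size = 20, prob = posterior_samples)
table(y_new) / length(y_new)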

Prerequisites

We will build on ideas presented in the workshop on Likelihood Methods.  If you did not attend this workshop, it may help to have a look at the slides and script provided on this page.

The goal of this workshop is to demystify the potentially ‘scary’ topic of Bayesian Statistics, and to empower participants (of any preexisting knowledge level) to engage in statistical reasoning when conducting their own research. So come one, come all!

That being said, a basic working understanding of R is assumed.  Knowledge of functions and loops in R will be advantageous, but not a must.

Packages

This workshop will be conducted entirely in R.  We will not be using any external software such as WinBUGS.

We will use a package I have written, which is available on CRAN:
http://cran.r-project.org/web/packages/MHadaptive/

install.packages("MHadaptive")
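For a quick sense of the interface, here is a sketch along the lines of the package documentation: a toy linear regression whose posterior is sampled with Metro_Hastings() and summarized with BCI(). The data, priors, and starting values are invented for illustration, and the exact arguments should be checked against ?Metro_Hastings.

library(MHadaptive)

# Simulated data for a toy linear regression (values invented for illustration)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)
dat <- cbind(x, y)

# Log-posterior: Gaussian likelihood with flat (improper) priors, and sd_e constrained > 0
li_reg <- function(pars, data) {
  a <- pars[1]; b <- pars[2]; sd_e <- pars[3]
  if (sd_e <= 0) return(-Inf)
  pred <- a + b * data[, 1]
  sum(dnorm(data[, 2], mean = pred, sd = sd_e, log = TRUE))
}

# Adaptive Metropolis-Hastings sampling of the posterior
mcmc_out <- Metro_Hastings(li_func = li_reg,
                           pars = c(0, 1, 1),
                           par_names = c("a", "b", "sd_e"),
                           data = dat)

BCI(mcmc_out)  # Bayesian credible intervals for each parameter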