The Treachery of Models

In Magritte’s famous 1929 painting The Treachery of Images, a pipe is depicted with the caption “Ceci n’est pas une pipe”, French for “This is not a pipe”. The seemingly dissonant statement under what is very clearly a pipe forces the viewer to confront the distinction between the representation and the thing itself. The treachery refers to the danger of confusing the representation with the object.

Models – like images – are merely representations of the objects they depict. The modeler seeks to represent the relevant properties of the system being modeled in order to communicate some features of that system, or to manipulate it under ‘what-if’ scenarios.

With the current plethora of models of COVID-19 spread, it seems important to remind ourselves of Magritte’s warning. Models can be incredible tools — abstractions of complex realities that allow us to reason about the data we’ve seen so far, and consider possible futures. But they are not the epidemic process itself.

Ceci n’est pas une épidémie. (This is not an epidemic.)

This is not an argument that the models themselves are treacherous. I absolutely adore modeling and models, as Magritte adored painting and images. I consider modeling an essential tool to further our understanding of (and make predictions about) the world around us. Indeed, I developed one of the aforementioned COVID-19 models with my colleagues at Penn Medicine. The treachery comes when we mistake the abstraction for reality.

Eigenvectors from Eigenvalues – a NumPy implementation

I was intrigued by the recent splashy result showing how eigenvectors can be computed from eigenvalues alone. The finding was covered in Quanta Magazine, and the original paper is pretty easy to understand, even for a non-mathematician.

Being a non-mathematician myself, I tend to look for insights and understanding via computation, rather than strict proofs. What seems cool about the result to me is that you can compute the directions from just the stretches (along with the stretches of the sub-matrices). It seems kind of magical (of course, it’s not 😉 ). To get a feel for it, I implemented the key identity from the paper in Python and NumPy and confirmed that it gives the right answer for a random (real-valued, symmetric) matrix.
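
For readers who want the gist without opening the notebook, here is a minimal sketch (not the notebook’s code) of that check. It assumes the identity in the form |v_{i,j}|² ∏_{k≠i} (λ_i(A) − λ_k(A)) = ∏_{k=1}^{n−1} (λ_i(A) − λ_k(M_j)), where A is a real symmetric n×n matrix with unit eigenvectors v_i and M_j is A with its j-th row and column deleted. The function name `eigvec_sq_from_eigvals` is hypothetical, and the sketch assumes the eigenvalues of A are distinct so the left-hand product is nonzero.

```python
import numpy as np

def eigvec_sq_from_eigvals(A):
    """Return S with S[i, j] = |v_{i,j}|^2, computed from eigenvalues only.

    v_i is the i-th unit eigenvector of the symmetric matrix A, with
    eigenvalues sorted ascending. Assumes all eigenvalues of A are distinct.
    """
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues of A, ascending
    S = np.empty((n, n))
    for j in range(n):
        # M_j: the minor obtained by deleting row j and column j of A
        M = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(M)
        for i in range(n):
            numerator = np.prod(lam[i] - mu)                   # product over the minor's eigenvalues
            denominator = np.prod(np.delete(lam[i] - lam, i))  # product over k != i
            S[i, j] = numerator / denominator
    return S

# Compare against squared components from a standard eigendecomposition
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
A = (X + X.T) / 2                   # random real symmetric matrix
_, vecs = np.linalg.eigh(A)         # column i of vecs is the i-th eigenvector
S = eigvec_sq_from_eigvals(A)
print(np.allclose(S, vecs.T ** 2))  # should print True
```

Note that the identity recovers only the squared magnitudes of the eigenvector components, not their signs.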

I posted the Jupyter Notebook here.

Machine Learning for Health #NIPS2018 workshop call for proposals

The theme for this year’s workshop will be “Moving beyond supervised learning in healthcare”. This will be a great forum for those who work on computational solutions to the challenges facing clinical medicine. The submission deadline is Friday Oct 26, 2018. Hope to see you there!

https://ml4health.github.io/2018/pages/call-for-papers.html

Visualizing classifier thresholds

Lately I’ve been thinking a lot about the connection between prediction models and the decisions that they influence. There is a lot of theory around this, but explaining how the various pieces fit together to the people who will use, and be affected by, these decisions can be challenging.

One of the important conceptual pieces is the link between the decision threshold (how high the score needs to be to predict positive) and the resulting distribution of outcomes (true positives, false positives, true negatives and false negatives). As a starting point, I’ve built this interactive tool for exploring that link.
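
The core computation behind this kind of tool can be sketched in a few lines. This is an illustrative sketch rather than the tool’s actual code: it assumes a score at or above the threshold counts as a positive prediction, and the helper `confusion_counts` and the small validation sample are hypothetical.

```python
import numpy as np

def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    predicted = scores >= threshold
    return {
        "TP": int(np.sum(predicted & labels)),
        "FP": int(np.sum(predicted & ~labels)),
        "TN": int(np.sum(~predicted & ~labels)),
        "FN": int(np.sum(~predicted & labels)),
    }

# Hypothetical validation sample: model scores and true outcomes (1 = positive)
scores = [0.1, 0.35, 0.4, 0.8, 0.65, 0.9, 0.2, 0.55]
labels = [0,   0,    1,   1,   0,    1,   0,   1]
for t in (0.3, 0.5, 0.7):
    print(t, confusion_counts(scores, labels, t))
# 0.3 {'TP': 4, 'FP': 2, 'TN': 2, 'FN': 0}
# 0.5 {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
# 0.7 {'TP': 2, 'FP': 0, 'TN': 4, 'FN': 2}
```

Raising the threshold moves individual points from predicted-positive to predicted-negative, trading false positives for false negatives.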

[Screenshot of the interactive threshold tool]

The idea is to take a validation sample of predictions from a model and experiment with the consequences of varying the decision threshold. The hope is that, by seeing how the threshold affects individual data points, users will develop an intuition for the tradeoffs involved.

Code for this experiment is available here. I hope to continue to build on this with other interactive, visual tools aimed at demystifying the concepts at the interface between predictions and decisions.

Visualizing Generative Adversarial Networks

UPDATE: Some cool people at Georgia Tech and Google Brain have developed an interactive visualization called GAN Lab, which is way more exciting than this one. You can check it out here: https://poloclub.github.io/ganlab/

Yesterday, I wrote about Generative Adversarial Networks being all the rage at NIPS this year. I created a toy model using TensorFlow to wrap my head around how the idea works. Building on that example, I created a video to visualize the adversarial training process.

The top left panel shows samples from both the training and generated (i.e., counterfeit) data. Remember that the goal is for the generator to produce samples that the discriminator cannot distinguish from the real (training) data. The top right panel shows the predicted energy function from the discriminator. The bottom row shows the losses for the discriminator (D) and generator (G).

I don’t fully understand why the dynamics of the adversarial training process are transiently unstable, but it seems to work overall. Another interesting observation is that the loss seems to continue to fall overall, even as it goes through transient phases of instability when the fit of the generated data is qualitatively poor.

Generative Adversarial Networks are the hotness at NIPS 2016

Although they hit the scene two years ago, Generative Adversarial Networks (GANs) have become the darlings of this year’s NIPS conference. The term “Generative Adversarial” appears 170 times in the conference program. So far I’ve seen talks demonstrating their utility in everything from generating realistic images to predicting and filling in missing video segments, rooms, maps, and objects of various sorts. They are even being applied to high-energy particle physics, pushing the state of the art of inference within the language of quantum field theory.

The basic idea is to build two models and to pit them against each other (hence the adversarial part). The generative model takes random inputs and tries to generate output data that “look like” real data. The discriminative model takes as input data from both the generative model and the real dataset and tries to correctly distinguish between them. By updating each model in turn, we hope to reach an equilibrium where neither the discriminator nor the generator can improve. At this point the generator is doing its best to fool the discriminator, and the discriminator is doing its best not to be fooled. The result (if everything goes well) is a generative model which, given some random inputs, will output data that appears to be a plausible sample from your dataset (e.g., cat faces).
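
For reference, this two-player game is commonly written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), where D(x) is the discriminator’s estimated probability that x is real and G(z) is the generator’s output for a random input z. The discriminator tries to maximize V and the generator tries to minimize it; the toy example below may differ in its exact loss.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]
```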

As with any concept that I’m trying to wrap my head around, I took a moment to create a toy example of a GAN to try to get a feel for what is going on.

Let’s start with a simple distribution from which to draw our “real” data.

[Figure: the “real” data distribution]

Next, we’ll create our generator and discriminator networks using TensorFlow. Each will be a three-layer, fully connected network with ReLUs in the hidden layers. The loss function for the generative model is -1 times the loss function of the discriminative model. This is the adversarial part: the generator does better as the discriminator does worse. I’ve put the code for building this toy example here.
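
To make the structure concrete without depending on a particular TensorFlow version, here is a plain NumPy sketch of the same pieces: a fully connected network with ReLU hidden layers, a cross-entropy discriminator loss, and a generator loss defined as its negative. It is an illustration, not the linked code; the layer sizes, the helper names (`init_mlp`, `mlp_forward`, `adversarial_losses`), and the “real” distribution are all hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Small random weights for a fully connected net, e.g. sizes=[1, 16, 16, 1]."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """ReLU in the hidden layers, linear output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    return h @ W + b

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def adversarial_losses(d_params, g_params, x_real, z):
    """Discriminator cross-entropy loss, and a generator loss equal to its negative."""
    x_fake = mlp_forward(g_params, z)
    p_real = sigmoid(mlp_forward(d_params, x_real))  # D's probability that real data is real
    p_fake = sigmoid(mlp_forward(d_params, x_fake))  # D's probability that fakes are real
    eps = 1e-12                                      # guards against log(0)
    d_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1.0 - p_fake + eps))
    g_loss = -d_loss  # the generator does better exactly when D does worse
    return d_loss, g_loss

rng = np.random.default_rng(0)
g_params = init_mlp([1, 16, 16, 1], rng)        # three weight layers
d_params = init_mlp([1, 16, 16, 1], rng)
x_real = rng.normal(4.0, 1.25, size=(256, 1))   # hypothetical "real" data
z = rng.uniform(-1.0, 1.0, size=(256, 1))       # random generator inputs
d_loss, g_loss = adversarial_losses(d_params, g_params, x_real, z)
print(d_loss, g_loss)
```

A common practical variant, suggested in the original GAN paper, trains the generator to minimize -log D(G(z)) instead, because the straight negative can give the generator weak gradients early in training.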

Next, we’ll fit each model in turn. Note in the code that we gave each optimizer a list of variables to update via gradient descent. This is because we don’t want to update the weights of the discriminator while we’re updating the weights of the generator, and vice versa.
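
The “list of variables” idea can be illustrated independently of TensorFlow (in TensorFlow 1.x it corresponds to passing `var_list` to an optimizer’s `minimize`). The sketch below is a hypothetical toy, not the linked code: a linear generator and a logistic discriminator, with finite-difference gradients so that it needs only NumPy. The key point is that each step updates only the parameters named in `var_list` and leaves the other model frozen.

```python
import numpy as np

def numerical_grad(loss_fn, params, name, h=1e-6):
    """Central-difference gradient of loss_fn with respect to params[name] only."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        orig = params[name][idx]
        params[name][idx] = orig + h
        up = loss_fn(params)
        params[name][idx] = orig - h
        down = loss_fn(params)
        params[name][idx] = orig  # restore the original value
        grad[idx] = (up - down) / (2 * h)
    return grad

def gradient_step(loss_fn, params, var_list, lr=0.05):
    """Gradient descent on loss_fn, updating only the parameters named in var_list."""
    grads = {name: numerical_grad(loss_fn, params, name) for name in var_list}
    for name in var_list:
        params[name] -= lr * grads[name]

rng = np.random.default_rng(0)
x_real = rng.normal(4.0, 1.25, size=200)   # hypothetical "real" data
z = rng.uniform(-1.0, 1.0, size=200)       # random generator inputs
params = {"g": np.array([1.0, 0.0]),       # generator: G(z) = g[0] * z + g[1]
          "d": np.array([0.1, 0.0])}       # discriminator logit: d[0] * x + d[1]

def d_loss(p):
    fake = p["g"][0] * z + p["g"][1]
    logit = lambda x: p["d"][0] * x + p["d"][1]
    # -log(sigmoid(t)) = log(1 + exp(-t)) and -log(1 - sigmoid(t)) = log(1 + exp(t))
    return np.mean(np.logaddexp(0, -logit(x_real))) + np.mean(np.logaddexp(0, logit(fake)))

def g_loss(p):
    return -d_loss(p)  # adversarial: generator loss is the negated discriminator loss

for step in range(3):
    gradient_step(d_loss, params, ["d"])  # discriminator step; generator frozen
    gradient_step(g_loss, params, ["g"])  # generator step; discriminator frozen
```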

loss at step 0: discriminative: 11.650652, generative: -9.347455

[Figure: training progress at step 0]

loss at step 200: discriminative: 8.815780, generative: -9.117246

[Figure: training progress at step 200]

loss at step 400: discriminative: 8.826855, generative: -9.462300

[Figure: training progress at step 400]

loss at step 600: discriminative: 8.893397, generative: -9.835464

[Figure: training progress at step 600]

loss at step 3600: discriminative: 6.724183, generative: -13.005814

[Figure: training progress at step 3600]

As we can see, the generator is learning to output data that looks more and more like a sample from the training data. At the same time, the discriminator is having an increasingly hard time telling them apart (as seen in the overlapping prediction histograms on the right).

Obviously this is a trivial example to put a GAN to work on, but when it comes to high-dimensional data with complex dependency structures, this approach starts to really shine. I’m sure the hotness of this approach won’t cool off any time soon.

All of the code for generating this GAN is available on GitHub.

Weapons of Math Destruction – A Data Scientist’s Guide to Disarmament

I’ve had this book on pre-order since spring and it finally arrived on Friday. I subsequently devoured it over the weekend.

Long-awaited: Weapons of Math Destruction by Cathy O’Neil

The book lays out a clear and compelling case for how data-driven algorithms can become, in contrast to their promise of amoral objectivity, efficient means for reproducing and even exacerbating social inequalities and injustices. From predictive policing and recidivism risk models to targeted marketing for predatory loans and for-profit universities, O’Neil explains how to recognize WMDs by three distinct features:

  1. The model is either hidden, or opaque to the individuals affected by its calculations, restricting any possibility of seeking recourse against – or understanding of – its results or conclusions.
  2. The model works against the subject’s interest (e.g., it is unfair).
  3. The model scales, giving it the opportunity to negatively affect a very large segment of the population.

The taxonomy provides a simple framework for identifying WMDs in the wild. Importantly for data scientists and other data practitioners, however, it also forms a checklist (or rather an anti-checklist) to keep in mind when developing models that will be deployed into the real world. As data scientists, many of us are strongly incentivized to achieve feature 3, and doing so only makes it more important to constantly question the degree to which our models could fall victim to features 2 and 1.

Feature 2, as O’Neil lays out, can occur despite the best intentions of a model’s creators. This can (and does!) happen in two ways: First, when a modeler seeks to create an objective system for rating individuals (say, for acceptance to a prestigious university, or for a payday loan), the data used to build the model is already encoded with the socially constructed biases of the conditions under which it was generated. Even when attempting to exclude potentially bias-laden factors such as race or gender, this information seeps into the model nonetheless via correlations with seemingly benign variables such as zip codes or the makeup of a subject’s social connections.
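
A small synthetic simulation can make the proxy mechanism concrete. Everything here is made up for illustration (the group sizes, the hypothetical “zip A” indicator, the size of the historical bias): a linear model is fit without the protected attribute, yet its scores still differ systematically by group, because the zip-code indicator carries the group information.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
group = rng.random(n) < 0.5  # protected attribute: never given to the model
# Hypothetical proxy: living in "zip A" is far more common for one group
zip_a = np.where(group, rng.random(n) < 0.8, rng.random(n) < 0.2)
income = rng.normal(0.0, 1.0, n)  # a legitimately relevant feature, independent of group
# Synthetic historical labels that encode a penalty against the group beyond income
y = (income - 1.0 * group + rng.normal(0.0, 1.0, n) > 0).astype(float)

# Linear probability model fit WITHOUT the protected attribute
X = np.column_stack([np.ones(n), income, zip_a.astype(float)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
score = X @ coef

gap = score[~group].mean() - score[group].mean()
print(f"zip coefficient: {coef[2]:.3f}, mean score gap between groups: {gap:.3f}")
```

The zip coefficient comes out negative and the group gap positive, even though income is independent of group and the group column never enters the fit.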

Second, when the model’s outputs reinforce the unjust conditions from which it was created, a harmful, self-reinforcing feedback loop results. Such a feedback loop is particularly present and pernicious in the use of recidivism risk models to guide sentencing decisions. An individual may be labeled high risk not because of qualities of the individual himself, but because of his circumstances of living in a poor, high-crime neighborhood. Being incarcerated based on the results of this model makes him more likely to end up back in that neighborhood, subject to continued poverty and disproportionate policing. Thus the model has set up the conditions to fulfill its own prediction.

As machine learning algorithms become more and more accurate at a variety of tasks, their inner workings become harder and harder to understand. The trend will make it increasingly difficult to avoid feature 1 of the WMD taxonomy. Current advanced techniques like deep learning are creating models that are remarkably performant, yet not fully understood by the researchers creating them, much less the individuals affected by their results. In light of this, we need to think carefully as data scientists about how to communicate these models with as much transparency as possible. How to do so remains an open question. But the internal ‘black box’ nature of these algorithms does not obviate our responsibility to disclose exactly what input data went into a given model, what assumptions were made about that data, and on what criteria the model was trained.

Overall, WMD provides an incredibly important framework for thinking about the consequences of uncritically applying data and algorithms to people’s lives. For those of us, like O’Neil herself, who make our living using mathematics to create data-driven algorithms, taking to heart the lessons contained in Weapons of Math Destruction will be our best defense against unwittingly creating the bomb ourselves.