This week I spoke at DataJawn, a super fun evening of talks and mingling with Philly’s data nerds.
You can have a look through the slides here.
Lately I’ve been thinking a lot about the connection between prediction models and the decisions that they influence. There is a lot of theory around this, but explaining how the various pieces fit together to the folks who will use and be impacted by these decisions can be challenging.
One of the important conceptual pieces is the link between the decision threshold (how high the score needs to be to predict positive) and the resulting distribution of outcomes (true positives, false positives, true negatives and false negatives). As a starting point, I’ve built this interactive tool for exploring that link.
The idea is to take a validation sample of predictions from a model and experiment with the consequences of varying the decision threshold. The hope is that the user will be able to develop an intuition around the tradeoffs involved by seeing the link to the individual data points involved.
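To make the threshold idea concrete without the interactive tool, here is a minimal sketch (not the tool's code). The scores and labels are made up for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when we predict positive for score >= threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            counts["tp"] += 1
        elif predicted_positive and not label:
            counts["fp"] += 1
        elif label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Made-up validation sample: model scores and true labels (1 = positive).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

# Sweep a few thresholds and watch the outcome counts shift.
for threshold in (0.25, 0.50, 0.75):
    print(threshold, confusion_counts(scores, labels, threshold))
```

Lowering the threshold trades false negatives for false positives, and raising it does the reverse; which direction is preferable depends on the relative cost of each kind of error.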
Code for this experiment is available here. I hope to continue to build on this with other interactive, visual tools aimed at demystifying the concepts at the interface between predictions and decisions.
If you build and/or use classifiers in your life, feel free to print this out and keep it above your desk.
Yesterday, I wrote about Generative Adversarial Networks being all the rage at NIPS this year. I created a toy model using TensorFlow to wrap my head around how the idea works. Building on that example, I created a video to visualize the adversarial training process.
The top left panel shows samples from both the training and generated (e.g., counterfeit) data. Remember that the goal is to have the generator produce samples that the discriminator cannot distinguish from the real (training) data. Top right shows the predicted energy function from the discriminator. The bottom row shows the loss function for the discriminator (D) and generator (G).
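For readers wondering what the D and G loss curves measure, here is a rough sketch of the cross-entropy losses from the standard GAN formulation, using the common non-saturating generator loss. This is an assumption for illustration: the toy model in the video may define its losses differently. The inputs `d_real` and `d_fake` are hypothetical lists of discriminator outputs (probabilities that a sample is real).

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy with real samples labeled 1 and generated samples labeled 0."""
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when D scores the fakes as real."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

# Made-up discriminator outputs early in training: D separates the two easily.
print(discriminator_loss([0.9, 0.8], [0.2, 0.1]), generator_loss([0.2, 0.1]))

# Made-up outputs when D cannot tell the difference (0.5 everywhere).
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
```

The two losses pull against each other: anything that makes the generated samples score higher lowers the generator loss and raises the discriminator's.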
I don’t fully understand why the dynamics of the adversarial training process are transiently unstable, but it seems to work overall. Another interesting observation is that the loss seems to continue to fall overall, even as it goes through the transient phases of instability when the fit of the generated data is qualitatively poor.
Another great turnout at the DataPhilly meetup last night. Was great to see all you random data nerds!
Code snippets to generate animated examples here.
The Annual Conference on Neural Information Processing Systems (NIPS) has recently listed this year’s accepted papers. There are 403 paper titles listed, which made for great morning coffee reading, trying to pick out the ones that most interest me.
Being a machine learning conference, it’s only reasonable that we apply a little machine learning to this (decidedly _small_) data.
Building off of the great example code in a post by Jordan Barber on Latent Dirichlet Allocation (LDA) with Python, I scraped the paper titles and built an LDA topic model with 5 topics. All of the code to reproduce this post is available on GitHub. Here are the top 10 most probable words from each of the derived topics:
Normally, we might try to attach some kind of label to each topic using our beefy human brains and subject matter expertise, but I didn’t bother with this, since nothing too obvious stuck out at me. If you think that you have appropriate names for them, feel free to let me know. Given that we are only working with the titles (no abstracts or full paper text), it is perhaps not surprising that no human-interpretable topics jump out. But let’s not let that stop us from proceeding.
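To show roughly what fitting a topic model like this involves, here is a tiny collapsed Gibbs sampler for LDA in plain Python. It is a sketch and not the code behind this post (that code is in the repository mentioned above). The function names, hyperparameters `alpha` and `beta`, and the iteration count are all assumptions, and it skips preprocessing steps such as stop-word removal.

```python
import random

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]  # words in doc d assigned to topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # total words assigned to each topic
    assignments = []
    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    topics = list(range(num_topics))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it given all the others.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n):
    """The n most frequently assigned words for each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]

# A handful of the accepted paper titles, lowercased and split on whitespace.
titles = [
    "Robust Portfolio Optimization",
    "Parallel Correlation Clustering on Big Graphs",
    "Algorithmic Stability and Uniform Generalization",
    "Planar Ultrametric Rounding for Image Segmentation",
]
docs = [title.lower().split() for title in titles]
doc_topic, topic_word = fit_lda(docs, num_topics=2, iterations=50)
for k, words in enumerate(top_words(topic_word, 3)):
    print(k, words)
```

With only four titles the resulting topics are meaningless; the point is just the mechanics of assigning each word to a topic and reading off the most common words per topic.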
Since we are modeling the paper title generating process as a probability distribution over topics, each of which is a probability distribution over words, we can use this generating process to suggest keywords for each title. These keywords may or may not show up in the title itself. Here are some from the first 10 titles:
================
Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
Generated Keywords: [u'iteration', u'inference', u'theory']
================
Learning with Symmetric Label Noise: The Importance of Being Unhinged
Generated Keywords: [u'uncertainty', u'randomized', u'neural']
================
Algorithmic Stability and Uniform Generalization
Generated Keywords: [u'spatial', u'robust', u'dimensional']
================
Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models
Generated Keywords: [u'rates', u'fast', u'based']
================
Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling
Generated Keywords: [u'monte', u'neural', u'stochastic']
================
Robust Portfolio Optimization
Generated Keywords: [u'learning', u'online', u'matrix']
================
Logarithmic Time Online Multiclass prediction
Generated Keywords: [u'complexity', u'problems', u'stein']
================
Planar Ultrametric Rounding for Image Segmentation
Generated Keywords: [u'deep', u'graphs', u'neural']
================
Expressing an Image Stream with a Sequence of Natural Sentences
Generated Keywords: [u'latent', u'process', u'stochastic']
================
Parallel Correlation Clustering on Big Graphs
Generated Keywords: [u'robust', u'learning', u'learning']
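The keyword generation above follows the LDA generative story: pick a topic from the title's topic distribution, then pick a word from that topic's word distribution. Here is a minimal sketch of that two-step sampling. The distributions are made up for illustration, and `generate_keywords` is a hypothetical name, not necessarily how the post's code is organized.

```python
import random

def generate_keywords(doc_topic_probs, topic_word_probs, n_keywords, rng):
    """Sample keywords by drawing a topic, then a word from that topic."""
    topics = list(range(len(doc_topic_probs)))
    keywords = []
    for _ in range(n_keywords):
        topic = rng.choices(topics, weights=doc_topic_probs)[0]
        words = list(topic_word_probs[topic])
        weights = [topic_word_probs[topic][word] for word in words]
        keywords.append(rng.choices(words, weights=weights)[0])
    return keywords

# Made-up word distributions for two topics.
topic_word_probs = [
    {"learning": 0.5, "online": 0.3, "matrix": 0.2},
    {"neural": 0.6, "deep": 0.3, "graphs": 0.1},
]
# Made-up topic distribution for a single title.
doc_topic_probs = [0.7, 0.3]

print(generate_keywords(doc_topic_probs, topic_word_probs, 3, random.Random(0)))
```

Because each keyword is drawn independently, the same word can come up more than once, which would be consistent with the repeated 'learning' in the last example above.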
While some titles are strongly associated with a single topic, others seem to be generated from more even distributions over topics. Paper titles with more equal representation over topics could be considered to be, in some way, more interdisciplinary, or at least, intertopicular (yes, I just made that word up). To find these papers, we’ll look for the paper titles with the highest information entropy in their inferred topic distribution.
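As a quick illustration of the entropy calculation, here is a small sketch. The topic distributions are made up, and the natural logarithm is an assumption (with 5 topics it caps the entropy at ln 5, about 1.61).

```python
import math

def topic_entropy(probs):
    """Shannon entropy (in nats) of a topic distribution; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Made-up inferred topic distributions over 5 topics.
topic_dists = {
    "mostly one topic": [0.92, 0.02, 0.02, 0.02, 0.02],
    "two topics": [0.48, 0.48, 0.02, 0.01, 0.01],
    "evenly spread": [0.2, 0.2, 0.2, 0.2, 0.2],
}

# Rank from highest to lowest entropy.
for name, probs in sorted(topic_dists.items(), key=lambda kv: topic_entropy(kv[1]), reverse=True):
    print(round(topic_entropy(probs), 3), name)
```

A distribution concentrated on one topic scores near zero, while a perfectly even one reaches the maximum.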
Here are the top 10 along with their associated entropies:
1.22769364291 Where are they looking?
1.1794725784 Bayesian dark knowledge
1.11261338284 Stochastic Variational Information Maximisation
1.06836891546 Variational inference with copula augmentation
1.06224431711 Adaptive Stochastic Optimization: From Sets to Paths
1.04994413148 The Population Posterior and Bayesian Inference on Streams
1.01801236048 Revenue Optimization against Strategic Buyers
1.01652797194 Fast Convergence of Regularized Learning in Games
0.993789478925 Communication Complexity of Distributed Convex Learning and Optimization
0.990764728084 Local Expectation Gradients for Doubly Stochastic Variational Inference
So it looks like, by this method, ‘Where are they looking?’ has the highest entropy, more as a result of topic uncertainty than of any real multi-topic content.