I was recently fortunate to be invited to speak with an impressive group of high-school students as a part of the Germination Project. They came to Penn to learn about innovation in health care and I spoke with them about how we’re using Data Science to improve patient outcomes.
The theme for this year’s workshop will be “Moving beyond supervised learning in healthcare”. This will be a great forum for those who work on computational solutions to the challenges facing clinical medicine. The submission deadline is Friday Oct 26, 2018. Hope to see you there!
Lately I’ve been thinking a lot about the connection between prediction models and the decisions that they influence. There is a lot of theory around this, but communicating how the various pieces all fit together with the folks who will use and be impacted by these decisions can be challenging.
One of the important conceptual pieces is the link between the decision threshold (how high does the score need to be to predict positive) and the resulting distribution of outcomes (true positives, false positives, true negatives and false negatives). As a starting point, I’ve built this interactive tool for exploring this.
The idea is to take a validation sample of predictions from a model and experiment with the consequences of varying the decision threshold. The hope is that the user will be able to develop an intuition around the tradeoffs involved by seeing the link to the individual data points involved.
Code for this experiment is available here. I hope to continue to build on this with other interactive, visual tools aimed at demystifying the concepts at the interface between predictions and decisions.
If you build and/or use classifiers in your life, feel free to print this out and keep it above you desk.