Speaking at DataPhilly February 2016

The next DataPhilly meetup will feature a medley of machine-learning talks, including an Intro to ML from yours truly. Check out the speakers list and be sure to RSVP. Hope to see you there!

Thursday, February 18, 2016

6:00 PM to 9:00 PM

Speakers:

  • Corey Chivers
  • Randy Olson
  • Austin Rochford

Corey Chivers (Penn Medicine)

Abstract: Corey will present a brief introduction to machine learning. In his talk he will demystify what is often seen as a dark art. Corey will describe how we “teach” machines to learn patterns from examples by breaking the process into its easy-to-understand component parts. By using examples from fields as diverse as biology, health-care, astrophysics, and NBA basketball, Corey will show how data (both big and small) is used to teach machines to predict the future so we can make better decisions.

Bio: Corey Chivers is a Senior Data Scientist at Penn Medicine where he is building machine learning systems to improve patient outcomes by providing real-time predictive applications that empower clinicians to identify at risk individuals. When he’s not pouring over data, he’s likely to be found cycling around his adoptive city of Philadelphia or blogging about all things probability and data at bayesianbiologist.com.

Randy Olson (University of Pennsylvania Institute for Biomedical Informatics):

Automating data science through tree-based pipeline optimization

Abstract: Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in business, academia, and government. In this talk, I’m going to introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning — pipeline design. All of the work presented in this talk is based on the open source Tree-based Pipeline Optimization Tool (TPOT), which is available on GitHub at https://github.com/rhiever/tpot.

Bio: Randy Olson is an artificial intelligence researcher at the University of Pennsylvania Institute for Biomedical Informatics, where he develops state-of-the-art machine learning algorithms to solve biomedical problems. He regularly writes about his latest adventures in data science at RandalOlson.com/blog, and tweets about the latest data science news at http://twitter.com/randal_olson.

Austin Rochford (Monetate):

Abstract: Bayesian optimization is a technique for finding the extrema of functions which are expensive, difficult, or time-consuming to evaluate. It has many applications to optimizing the hyperparameters of machine learning models, optimizing the inputs to real-world experiments and processes, etc. This talk will introduce the Gaussian process approach to Bayesian optimization, with sample code in Python.

Bio: Austin Rochford is a Data Scientist at Monetate. He is a former mathematician who is interested in Bayesian nonparametrics, multilevel models, probabilistic programming, and efficient Bayesian computation.

A probabilistic justification to carpe diem

There’s a curious thing about unlikely independent events: no matter how rare, they’re most likely to happen right away.

Let’s get hypothetical

You’ve taken a bet that pays off if you guess the exact date of the next occurrence of a rare event (p = 0.0001 on any given day i.i.d). What day do you choose? In other words, what is the most likely day for this rare event to occur?

Setting aside for now why in the world you’ve taken such a silly sounding bet, it would seem as though a reasonable way to think about it would be to ask: what is the expected number of days until the event? That must be the best bet, right?

We can work out the expected number of days quite easily as 1/p = 10000. So using the logic of expectation, we would choose day 10000 as our bet.

Let’s simulate to see how often we would win with this strategy. We’ll simulate the outcomes by flipping a weighted coin until it comes out heads. We’ll do this 100,000 times and record how many flips it took each time.

p0001

The event occurred on day 10,000 exactly 35 times. However, if we look at a histogram of our simulation experiment, we can see that the time it took for the rare event to happen was more often short, than long. In fact, the event occurred 103 times on the very first flip (the most common Time to Event in our set)!

So from the experiment it would seem that the most likely amount of time to pass until the rare event occurs is 0. Maybe our hypothetical event was just not rare enough. Let’s try it again with p=0.0000001, or an event with a 1 in 1million chance of occurring each day.

p0000001

While now our event is extremely unlikely to occur, it’s still most likely to occur right away.

Existential Risk

What does this all have to do with seizing the day? Everything we do in a given day comes with some degree of risk. The Stanford professor Ronald A. Howard conceived of a way of measuring the riskiness of various day-to-day activities, which he termed the micromort. One micromort is a unit of risk equal to p = 0.000001 (1 in a million chance) of death. We are all subject to a baseline level of risk in micromorts, and additional activities may add or subtract from that level (skiing, for instance adds 0.7 micromorts per day).

While minimizing the risks we assume in our day-to-day lives can increase our expected life span, the most likely exact day of our demise is always our next one. So carpe diem!!

Post Script:

Don’t get too freaked out by all of this. It’s just a bit of fun that comes from viewing the problem in a very specific way. That is, as a question of which exact day is most likely. The much more natural way to view it is to ask, what is the relative probability of the unlikely event occurring tomorrow vs any other day but tomorrow. I leave it to the reader to confirm that for events with p < 0.5, the latter is always more likely.

Categorizing NIPS papers using LDA topic modeling

The Annual Conference on Neural Information Processing Systems (NIPS) has recently listed this year’s accepted papers. There are 403 paper titles listed, which made for great morning coffee reading, trying to pick out the ones that most interest me.

Being a machine learning conference, it’s only reasonable that we apply a little machine learning to this (decidedly _small_) data.

Building off of the great example code in a post by Jordan Barber on Latent Dirichlet Allocation (LDA) with Python, I scraped the paper titles and built an LDA topic model with 5 topics. All of the code to reproduce this post is available on github. Here are the top 10 most probable words from each of the derived topics:

0 1 2 3 4
0 learning learning optimization learning via
1 models inference networks bayesian models
2 neural sparse time sample inference
3 high models stochastic analysis networks
4 stochastic non model data deep
5 dimensional optimization convex inference learning
6 networks algorithms monte spectral fast
7 graphs multi carlo networks variational
8 optimal linear neural bandits neural
9 sampling convergence information methods convolutional

Normally, we might try to attach some kind of label to each topic using our beefy human brains and subject matter expertise, but I didn’t bother with this — nothing too obvious stuck out at me. If you think that you have appropriate names for them feel free to let me know. Given that we are only working with the titles (no abstracts or full paper text), I think that there aren’t obvious human-interpretable topics jumping out. But let’s not let that stop us from proceeding.

We can also represent the inferred topics with the much maligned, but handy-dandy wordcloud visualization:

Topic: 0
 topic_0
Topic: 1
 topic_1
Topic: 2
 topic_2
Topic: 3
 topic_3
Topic: 4
topic_4

Since we are modeling the paper title generating process as a probability distribution of topics, each of which is a probability distribution of words, we can use this generating process to suggest keywords for each title. These keywords may or may not show up in the title itself. Here are some from the first 10 titles:

================

Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing
Generated Keywords: [u'iteration', u'inference', u'theory']

================

Learning with Symmetric Label Noise: The Importance of Being Unhinged
Generated Keywords: [u'uncertainty', u'randomized', u'neural']

================

Algorithmic Stability and Uniform Generalization
Generated Keywords: [u'spatial', u'robust', u'dimensional']

================

Adaptive Low-Complexity Sequential Inference for Dirichlet Process Mixture Models
Generated Keywords: [u'rates', u'fast', u'based']

================

Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling
Generated Keywords: [u'monte', u'neural', u'stochastic']

================

Robust Portfolio Optimization
Generated Keywords: [u'learning', u'online', u'matrix']

================

Logarithmic Time Online Multiclass prediction
Generated Keywords: [u'complexity', u'problems', u'stein']

================

Planar Ultrametric Rounding for Image Segmentation
Generated Keywords: [u'deep', u'graphs', u'neural']

================

Expressing an Image Stream with a Sequence of Natural Sentences
Generated Keywords: [u'latent', u'process', u'stochastic']

================

Parallel Correlation Clustering on Big Graphs
Generated Keywords: [u'robust', u'learning', u'learning']

Entropy and the most “interdisciplinary” paper title

While some titles are strongly associated with a single topic, others seem to be generated from more even distributions over topics than others. Paper titles with more equal representation over topics could be considered to be, in some way, more interdisciplinary, or at least, intertopicular (yes, I just made that word up). To find these papers, we’ll find which paper titles have the highest information entropy in their inferred topic distribution.

Here are the top 10 along with their associated entropies:

1.22769364291 Where are they looking?
1.1794725784 Bayesian dark knowledge
1.11261338284 Stochastic Variational Information Maximisation
1.06836891546 Variational inference with copula augmentation
1.06224431711 Adaptive Stochastic Optimization: From Sets to Paths
1.04994413148 The Population Posterior and Bayesian Inference on Streams
1.01801236048 Revenue Optimization against Strategic Buyers
1.01652797194 Fast Convergence of Regularized Learning in Games
0.993789478925 Communication Complexity of Distributed Convex Learning and Optimization
0.990764728084 Local Expectation Gradients for Doubly Stochastic Variational Inference

So it looks like by this method, the ‘Where are they looking’ has the highest entropy as a result of topic uncertainty, more than any real multi-topic content.

What’s Warren Buffett’s $1 Billion Basketball Bet Worth?

A friend of mine just alerted me to a story on NPR describing a prize on offer from Warren Buffett and Quicken Loans. The prize is a billion dollars (1B USD) for correctly predicting all 63 games in the men’s Division I college basketball tournament this March. The facebook page announcing the contest puts the odds at 1:9,223,372,036,854,775,808, which they note “may vary depending upon the knowledge and skill of entrant”.

Being curious, I thought I’d see what the assumptions were that went into that number. It would make sense to start with the assumption that you don’t know a lick about college basketball and you just guess using a coin flip for every match-up. In this scenario you’re pretty bad, but you are no worse than random. If we take this assumption, we can calculate the odds as 1/(0.5)^63.  To get precision down to a whole integer I pulled out trusty bc for the heavy lifting:

$ echo "scale=50;  1/(0.5^63)" | bc
9223372036854775808.000000

Well, that was easy. So if you were to just guess randomly, your odds of winning the big prize would be those published on the contest page. We can easily calculate the expected value of entering the contest as P(win)*prize, or 9,223,372,036ths of a dollar (that’s 9 nano dollars, if you’re paying attention). You’ve literally already spent that (and then some) in opportunity cost sunk into the time you are spending thinking about this contest and reading this post (but read on, ’cause it’s fun!).

But of course, you’re cleverer than that. You know everything about college basketball – or more likely if you are reading this blog – you have a kickass predictive model that is going to up your game and get your hands into the pocket of the Oracle from Omaha.

What level of predictiveness would you need to make this bet worth while? Let’s have a look at the expected value as a function of our individual game probability of being correct.

buffet1

And if you think that you’re really good, we can look at the 0.75 to 0.85 range:

buffet2

So it’s starting to look enticing, you might even be willing to take off work for a while if you thought you could get your model up to a consistent 85% correct game predictions, giving you an expected return of ~$35,000. A recent paper found that even after observing the first 40 scoring events, the outcome of NBA games is only predictable at 80%. In order to be eligible to win, you’ve obviously got to submit your picks before the playoff games begin, but even at this herculean level of accuracy, the expected value of an entry in the contest plummets down to $785.

Those are the odds for an individual entrant, but what are the chances that Buffet and co will have to pay out? That, of course, depends on the number of entrants. Lets assume that the skill of all entrants is the same, though they all have unique models which make different predictions. In this case we can get the probability of at least one of them hitting it big. It will be the complement of no one winning. We already know the odds for a single entrant with a given level of accuracy, so we can just take the probability that each one doesn’t win, then take 1 minus that value.

buffet3

Just as we saw that the expected value is very sensitive to the predictive accuracy of the participant, so too is the probability that the prize will be awarded at all. If 1 million super talented sporting sages with 80%  game-level accuracy enter the contest, there will only be a slightly greater than 50% chance of anyone actually winning. If we substitute in a more reasonable (but let’s face it, still wildly high) figure for participants’ accuracy of 70%, the chance becomes only 1 in 5739  (0.017%) that the top prize will even be awarded even with a 1 million strong entrant pool.

tl;dr You’re not going to win, but you’re still going to play.

If you want to reproduce the numbers and plots in this post, check out this gist.

Simudidactic

auto·di·dact n.
A self-taught person.
From Greek autodidaktosself-taught : auto-auto- + didaktostaught;

+

sim·u·late v.
To create a representation or model of (a physical system or particular situation, for example).
From Latin simulre, simult-, from similislike;

=
(If you can get past the mixing of Latin and Greek roots)

sim·u·di·dactic adj.
To learn by creating a representation or model of a physical system or particular situation. Particularly, using in silico computation to understand complex systems and phenomena.

———————————————————————

This concept has been floating around in my head for a little while. I’ve written before on how I believe that simulation can be used to improve one’s understanding of just about anything, but have never had a nice shorthand for this process.

Simudidactic inquiry is the process of understanding aspects of the world by abstracting them into a computational model, then conducting experiments in this model world by changing the underlying properties and parameters. In this way, one can ask questions like:

  1. What type of observations might we make if x were true?
  2. If my model of the process is accurate, can I recapture the underlying parameters given the type of observations I can make in the real world? How often will I be wrong?
  3. Will I be able to distinguish between competing models given the observations I can make in the real world?

In addition to being able to ask these types of questions, the simudidact solidifies their understanding of the model by actually building it.

So go on, get simudidactic and learn via simulation!

simudidactic

Uncertainty matters

In a post I wrote earlier this year, I noted a sentiment expressed in The Economist about understanding and embracing uncertainty.

…recent reforms to the IPCC’s procedures will do little to change its tendency to focus on the areas where there is greater consensus, avoiding the uncertainties which, though unpalatable for scientists, are important to policy. (link)

Which I felt was contrary to the way we, as scientists, speak among ourselves about policy makers. Specifically, that it is they who fear and misunderstand the implications of uncertainty.

This is the same perception which has led to the launch today by the group Sense About Science of a publication titled Making Sense of Uncertainty: Why uncertainty is part of science.

Launching a guide to Making Sense of Uncertainty at the World Conference of Science Journalists today, researchers working in some of the most significant, cutting edge fields say that if policy makers and the public are discouraged by the existence of uncertainty, we miss out on important discussions about the development of new drugs, taking action to mitigate the impact of natural hazards, how to respond to the changing climate and to pandemic threats.

Interrogated with the question ‘But are you certain?’, they say, they have ended up sounding defensive or as though their results are not meaningful. Instead we need to embrace uncertainty, especially when trying to understand more about complex systems, and ask about operational knowledge: ‘What do we need to know to make a decision? And do we know it?’

The report seems to be in line with arguments I have made about uncertainty and decision making as they pertain to ecological research, management, and policy.

Among the contributors to the report is someone who I consider to be among the best when it comes to understanding and communicating uncertainty, David Spiegelhalter. While I haven’t made my way all the way through it yet, it looks like this report will be an informative read for both scientists and policy makers (oh ya, and journalists — can’t forget about them).

Who knows, we might be able to stop the finger pointing and work together in mutual understanding of the importance of uncertainty.

Mathematical abstraction and the robustness to assumptions

I’ve been showing my new favourite toys to just about anyone foolish enough to actually engage me in conversation. I described how my shiny new set of non-transitive dice work here, complete with a map showing all the relevant probabilities.

All was neat and tidy and wonderful until fellow ecologist, Aaron Ball, tried to burst my bubble.

Nope. I couldn’t find the error. Fortunately, he works across the hall so I just went and asked him.

The problem he found, it turns out, was not with my calculations but with my assumptions. Aaron told me that dice constructed with rounded corners and hollowed out pips for the numbers on the faces tend to be biased in the frequency at which each face rolls up. I had assumed, of course, that each side of each of the five dice would roll with the same probability (ie. 1 in 6).

As with any model of a real world system, the mathematics were carried out on a simplified abstraction of the system being modelled. There are always, by necessity, assumptions being made. The important thing is to make these assumptions as explicit as possible and, where possible, to test the robustness of the model predictions to violations of the assumptions. Implicit to my calculations of the odds of the non-transitive Grime dice was the assumption that the dice are fair.

To check the model for robustness to this assumption, we can relax it and find out if we still get the same behaviour. Specifically, we can ask here whether some sort of pip-and-rounded-corner-induced bias can lead to a change in the Grime dice non-transitive cycles.

It seems a natural place to look would be between the dice pairings which have the closest to even odds. We can find out what level of bias would be required to switch the directionality of the odds (or at least erase the tendency for one die to roll higher than the other). Lets try looking at Magenta and Red, which under the fair dice assumption have odds p(Magenta > Red)=5/9. What kind of bias will change this relationship? The odds can be evened out by either Magenta rolling ones more often, or red rolling nine more often. The question is then, how much bias would there need to in the dice in order to even out the odds between Magenta and Red?

Lets start with Red biasing toward rolling nine more often (recall that nine appears on only one face). Under the fair dice hypothesis, Red can roll nine (1/6 of the time) and win no matter what Magenta rolls, or by rolling four (5/6 of the time) and win when Magenta rolls one (1/3 of the time).

P(Red > Magenta) = 1/6 + 5/6 * 1/3, which is 4/9.

If we set this probability equal to 1/2, and replace the fraction of times that Red rolls nine with x, we can solve for the frequency needed to even the odds.

x + 5/6 * 1/3 = 1/2

x = 2/9

Meaning that the Red die would have to be biased toward rolling nine with 2/9 odds. That’s equivalent to rolling a nine 1 and 1/3 times (33%) more often than you would expect if the die were fair!

Alternatively, the other way the odds between Red and Magenta could be evened is if Magenta biased towards rolling ones more often. We can do the same kind of calculation as above to figure out how much bias would be needed.

1/6 + 5/6 * x = 1/2

x = 2/5

Which corresponds to Magenta having  a 20% bias toward rolling ones. Of course, some combination of these biases could also be possible.

I leave it to the reader to work out the other pairings, but from the Red-Magenta analysis we can see that even if the dice deviated quite a bit from the expected 1/6 probability for each side, the edge afforded to Magenta is retained. I couldn’t find any convincing  evidence for the extent of bias caused by pipping and rounded corners but it seems unlikely that it would be strong enough to change the structure of the game.

A quick guide to non-transitive Grime Dice

A very special package that I am rather excited about arrived in the mail recently. The package contained a set of 6-sided dice. These dice, however, don’t have the standard numbers one to six on their faces. Instead, they have assorted numbers between zero and nine. Here’s the exact configuration:

red<-c(4,4,4,4,4,9)
blue<-c(2,2,2,7,7,7)
olive<-c(0,5,5,5,5,5)
yellow<-c(3,3,3,3,8,8)
magenta<-c(1,1,6,6,6,6)

Aside from maybe making for a more interesting version of snakes and ladders, why the heck am I so excited about these wacky dice? To find out what makes them so interesting, lets start by just rolling one against another and seeing which one rolls the higher number. Simple enough. Lets roll Red against Blue. Until you get your own set, you can roll in silico.

That was fun. We can do it over and over again and we’ll find that Red beats Blue more often than not. So it seems like Red is a pretty good bet. Now lets try rolling Olive against Red. I’ll wait.

Hey, look at that, the mighty Red has fallen. Olive tends to roll a higher number than Red more often than it doesn’t. So far, we have discovered this relationship:

Olive > Red > Blue

All hail the dominant Olive! Out of these three dice, if we want the best chance of winning, we should always pick Olive right? No dice, as they say. When we roll Olive against Blue, we find that Blue wins more often!

For any one of these three dice, there is another that will roll a higher number more often than not.

Olive > Red > Blue > Olive > Red > Blue > Olive > Red > Blue..

This forms a chain of dominance relationships that is a closed cycle. This property is called intransivity, and you can use it to win riches beyond your wildest dreams, er, well, at least to impress your friends.

Neat, right? But there’s more! We can do the same trick with Yellow, Magenta, and Red (Red > Magenta > Yellow > Red > …). With all five dice, there is a chain for which the order is given by that length of the word for each colour.

Red > Blue > Olive > Yellow > Magenta > …

Awesome. But that’s not it, either! You may have noticed from our three way comparisons that there is another five way chain. This time, the chain order is given by the alphabetical order of the words for each of the colours.

Blue > Magenta > Olive > Red > Yellow > …

What are the odds?

So far I’ve just asked you to take my word for it that the dominance relationships are as I described. Working out the odds of winning for any given pairing of dice as actually quite straightforward. Start by looking at the number on each side of the first die, one at a time. Count how many sides on the opposing die are less than the current number and divide by six. Since each side on the first die has a 1/6 chance of appearing, divide by 6 again. Sum these values for all six sides and you will have the probability that the first die will roll a higher number than the second.

For example, P(Red > Blue) = 5/6 x 1/2 + 1/6, which is 7/12.

Here I’ve worked out all of the pairwise odds:

Grime_dice

So, you can always win in this game as long as you get to be second to choose a colour. The odds are strongest in your favour when your opponent either chooses Magenta or Red, and you choose Olive or Yellow, respectively. Isn’t probability wonderful!

And if you still want more, it turns out that if you roll the Grime dice in pairs, the order of the word length chain reverses!

As a follow up to my simulation based approximate solution to the Gambling Machine Puzzle, here is the exact solution from mathematician Michael Lugo with a nice explaination.

God plays dice

From the New York Times “Numberplay” blog:

An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him whether y > x or x > y.

If the customer guesses correctly, he is given y dollars. If x = y, he’s given y/2 dollars. And if he’s wrong about which is larger, he’s given nothing.

The entrepreneur plans to charge his customers $40 for the privilege of playing the game. Would you play?

Clearly the strategy is to guess that y > x if x is small, and to guess that y < x if x is large. Say you’re told x = 60. If you guess x is the larger variable, then conditional on your guess being correct (which…

View original post 459 more words

The Gambling Machine Puzzle

This puzzle came up in the New York Times Number Play blog. It goes like this:

An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him whether y > x or x > y.

If the customer guesses correctly, he is given y dollars. If x = y, he’s given y/2 dollars. And if he’s wrong about which is larger, he’s given nothing.

The entrepreneur plans to charge his customers $40 for the privilege of playing the game. Would you play?

I figured I’d give it a go.  Since I was feeling lazy, and already had my computer in front of me, I thought that I’d do it via simulation rather than working out the exact maths. I tried playing the game with the first strategy that came to mind. If x<50, I would choose y>x, and if x>50, I’d choose y<x, figuring I’d maximize my probability of winning something rather than nothing. This was probably due to the inherent risk aversion of system one. Let’s see how that works out:

N<-100000
x<-sample.int(100,N,replace=TRUE)
y<-sample.int(100,N,replace=TRUE)
dec_rule=50
payout<-numeric(N)
for(i in 1:N)
{
## Correct Guess (playing simple max p(!0) strategy)
if( (x[i]>dec_rule & y[i]<x[i]) | (x[i]<=dec_rule & y[i]>x[i]) )
payout[i]<-y[i]

## Incorrect Guess (playing simple max p(!0) strategy)
if( (x[i]>dec_rule & y[i]>x[i]) | (x[i]<=dec_rule & y[i]<x[i]) )
payout[i]<-0

## Tie pays out y/2
if(x[i] == y[i])
payout[i]<-y[i]/2
}
## Expected Payout ##
print(paste(dec_rule,mean(payout)))

Which leads to an expected payout of $37.75. Playing the risk averse strategy leads to an expected value less than the cost of admission, loosing on average 25 cents per play. No deal, Mr entrepreneur, I had something else in mind for my forty bucks anyway.

Lets try alternate strategies, and see if we can’t play in such a way as to improve our outlook.

## Gambling Machine Puzzle ##
## Puzzle presented in http://wordplay.blogs.nytimes.com/2013/03/04/machine/

result<-numeric(100)
for(dec_rule in 1:100)
{
N<-10000
x<-sample.int(100,N,replace=TRUE)
y<-sample.int(100,N,replace=TRUE)

payout<-numeric(N)

for(i in 1:N)
{
## Correct Guess (playing dec_rule strategy)
if( (x[i]>dec_rule & y[i]<x[i]) | (x[i]<=dec_rule & y[i]>x[i]) )
payout[i]<-y[i]

## Incorrect Guess (playing dec_rule strategy)
if( (x[i]>dec_rule & y[i]>x[i]) | (x[i]<=dec_rule & y[i]<x[i]) )
payout[i]<-0

## Tie pays out y/2
if(x[i] == y[i])
payout[i]<-y[i]/2
}

## Expected Payout ##
print(paste(dec_rule,mean(payout)))
result[dec_rule]<-mean(payout)
}
par(cex=1.5)
plot(result,xlab='Decision rule',ylab='E(payout)',pch=20)

abline(v=which.max(result))
abline(h=max(result))
abline(h=40,lty=3)

gmachine

According to which, the best case scenario is an expected payout of $40.66, or an expected net of 66 cents per bet, if you were to play the strategy of choosing y>x for any x<73 and y<x for any x>73. You’re on, Mr entrepreneur!

To calculate the exactly optimal strategy and expected payout, we would need to compute the derivative of the expected payout function with respect to the within game decision threshold. I leave this fun stuff to the reader 😉