mathematics – Ben Lowery @ STOR-i (STOR-i CDT Student)

The Monty Hall problem and its generalisations: Part 2

Fri, 01 Apr 2022

In the previous blog post we looked at the infamous Monty Hall problem and its controversial (but correct) solution. The problem has been retold in countless articles, videos and puzzle columns, providing a nice introduction to probability and Bayes' theorem. And while it is fun to rehash the same story, it is worth broadening the problem and seeing how the same core principles can be applied in a more obtuse setting. With this in mind we can explore some of these reformulations, starting with expanding the number of doors in our game.

Monty Hall: Live from the new studio!

Let’s envision the following fictitious scenario: after the success of his three-door final showdown, Monty and his team have been gifted a bigger budget to make a more elaborate show, and a new, possibly infinitely big studio. Monty uses this increased budget to order his producers to purchase more doors. Now that he possesses a studio filled to the brim with disused doors, Monty again places one car behind a door and keeps note of where it is placed. While a flood of goats trundles in and hides behind the rest, he asks a contestant, now slightly more intimidated than their predecessors (see the gif below), to pick a door. The contestant hesitantly chooses, after which our host opens every door but the contestant’s and one final door.

A contestant on the reboot of Let’s Make a Deal whose confidence clearly indicates that they read the last blog post on how to win.

Now we again pose the question: stick or switch? Given the intuition gained from the original Monty Hall situation, it makes sense to guess that switching will be in the contestant’s best interest. And we can test this again by using Bayes’ theorem and some basic probability theory.

As in the original incarnation, we define events and variables for the new version. Instead of 3 doors, we now possess d doors, with the contestant picking door 1. To each door we assign the event that the prize sits behind it, giving events D_1,...,D_d. We also let G be the event that we open all doors except doors 1 and d, each revealing a goat. With this in mind, Bayes’ theorem for any particular door, say i, reads:

\mathbb{P}[D_i|G]=\frac{\mathbb{P}[D_i]\mathbb{P}[G|D_i]}{\mathbb{P}[G]}.

The individual probabilities for the right hand side are calculated as:

\mathbb{P}[D_1]=...=\mathbb{P}[D_d]=1/d \\ \mathbb{P}[G|D_1]=\frac{1}{d-1} \\ \textrm{(If the prize is behind the contestant’s door 1, the host is free to leave any one of the other } d-1 \textrm{ doors closed)} \\ \mathbb{P}[G|D_d]=1 \\ \textrm{(If the prize is behind door } d \textrm{, the host has no choice: he must open every door except 1 and } d \textrm{)} \\ \mathbb{P}[G|D_2]=...=\mathbb{P}[G|D_{d-1}]=0 \\ \textrm{(If the prize lies behind a door we want to open, we clearly can’t open it).}

Since D_1 to D_d cannot occur simultaneously, we can use some simple statistical properties, specifically the law of total probability, to express the probability of a goat behind all but doors 1 and d in this slightly long, but hopefully intuitive, derivation:

\mathbb{P}[G]=\sum_{i=1}^d \mathbb{P}[D_i]\mathbb{P}[G|D_i]\\= \mathbb{P}[D_1]\mathbb{P}[G|D_1]+ \mathbb{P}[D_d]\mathbb{P}[G|D_d]+\sum_{i=2}^{d-1}\mathbb{P}[D_i]\mathbb{P}[G|D_i]\\ =\frac{1}{d}\cdot \frac{1}{d-1}+\frac{1}{d}\cdot 1=\frac{1}{d}\left(\frac{1}{d-1}+1\right)=\frac{1}{d}\left(\frac{d}{d-1}\right).

Remember we opened all but doors 1 and d, so all doors in between have probability 0 of hiding the car. Substituting the above derivations back into Bayes’ theorem for doors 1 and d gives

\mathbb{P}[D_1|G]=\frac{1/d\cdot 1/(d-1)}{1/d\cdot d/(d-1)}=\frac{1}{d} \\ \mathbb{P}[D_d|G]=\frac{1/d\cdot 1}{1/d\cdot d/(d-1)}=\frac{d-1}{d}

While tedious, this derivation is pivotal in allowing a generalisation of the problem. Generalisations are crucial in mathematics, letting us expand a problem from an initial set of constrained numbers (like only 3 available doors) to as many doors as we want; all we need to do is plug that number into d.

To test this – and provide a little sanity check – we can hark back to the original Monty Hall problem and see that substituting d=3 gives probabilities of 2/3 for switching and 1/3 for staying. So we now have a rather straightforward way to show that, no matter how many doors we have, if we open all but the original door and one other, it is in our best interest to switch. In addition, although trivial to point out, we see that with more doors our likelihood of winning when switching increases. For example, with d=7 doors, plugging our values in gives a staying probability (i.e. door 1) of 1/7 and a switching probability (to door d) of 6/7.

To see this in practice, let’s run some simulations for 3, 5, 7 and 9 doors, carrying out Monty Hall’s deal 3000 times for each.
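A minimal sketch of such a simulation, assuming (as in the set-up above) that the contestant always starts on door 1 and the host leaves exactly one other door closed; the function names here are my own, not from the original code behind the plots:

```python
import random

def play(d, switch, rng):
    """One round with d doors: prize placed uniformly at random, contestant
    picks door 0, host opens every door except the pick and one other (the
    prize door if the contestant missed, otherwise a random remaining door)."""
    prize = rng.randrange(d)
    pick = 0
    if prize == pick:
        remaining = rng.choice([i for i in range(d) if i != pick])
    else:
        remaining = prize
    final = remaining if switch else pick
    return final == prize

def win_rate(d, switch, trials=3000, seed=1):
    """Fraction of wins over repeated plays."""
    rng = random.Random(seed)
    return sum(play(d, switch, rng) for _ in range(trials)) / trials

for d in (3, 5, 7, 9):
    print(d, round(win_rate(d, True), 3), round(win_rate(d, False), 3))
```

The switching rate should hover around (d-1)/d and the staying rate around 1/d, matching the derivation above.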

We see the winning chance from switching increases as the number of doors increases, which is pretty good.

Winning isn’t everything

Our second generalisation of Monty Hall’s problem is one in which we try to flip the odds back in favour of the host. Given how generous winning becomes when we expand the problem to d doors, it is worth seeing whether limiting the number of doors Monty opens makes it more – or less – likely for the contestant to win when switching. Thinking about it logically, now that we have a choice of which door to switch to, we should be less likely to get the prize than in the previous scenario. But it is worth calculating how much of a detriment this new rule is to our contestant, and analysing how drastically our odds change as fewer and fewer doors are opened.

This time we will take a scheduled commercial break out of laziness, relieving ourselves of any more probability equations and focusing solely on numerical computations. Consider an example where, given d doors, we open k of them. More specifically, let’s look at the case of 10 doors, opening a subset of them and counting the number of times we win if we switch, win if we stay, and the new third case: we lose whether we stay or switch. This is seen in the following graph.

As we can see with 10 doors, between opening 8 (the maximum number of doors we can open) and opening 6 (where we have a choice of what to switch to), there is a significant dip in the probability of winning when switching, to the point where losing the game becomes the most likely outcome whatever we do. This is due to the truly random choice we now have over which door to switch to. Despite all these changes, switching consistently gives better odds than staying. We can see this in the generalisation of opening k from a set of d doors, given by the following equation:

\mathbb{P}[\textrm{Winning when switching}]=\left(\frac{d-1}{d}\right)\cdot\left(\frac{1}{d-k-1}\right)

For those in dire need of a Bayesian derivation (as I normally am), one can be produced along the same lines as before. As a little test, we can take d=10 and k=2 and see that the probability of winning when switching is \approx 0.129, which leaves our simulation pretty damn close to what we want.
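That number can be checked with a quick simulation. This sketch assumes the host opens k goat doors uniformly at random (never the pick, never the prize) and the contestant then switches to a uniformly random unopened door; the helper names are illustrative:

```python
import random

def play_k(d, k, rng):
    """One round: d doors, host opens k goat doors, contestant switches
    to a uniformly random unopened door other than their pick."""
    prize = rng.randrange(d)
    pick = 0
    # Doors the host may open: never the pick, never the prize.
    openable = [i for i in range(d) if i != pick and i != prize]
    opened = set(rng.sample(openable, k))
    # Switch to a random door among the d - k - 1 unopened alternatives.
    choices = [i for i in range(d) if i != pick and i not in opened]
    return rng.choice(choices) == prize

def switch_rate(d, k, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play_k(d, k, rng) for _ in range(trials)) / trials

print(switch_rate(10, 2))                   # simulated estimate
print((10 - 1) / 10 * 1 / (10 - 2 - 1))     # formula: 9/70, about 0.129
```

With k = d - 2 this collapses back to the "open everything" game, so the same function reproduces the (d-1)/d switching probability as a sanity check.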

And with this we can conclude our investigation into suspected goat farmer Monty Hall and his mystery doors. But was this investigation as concrete as the numbers suggest?

Statistical stage fright

In these two blogs we’ve seen how applying some mathematical rigour allows us to understand, dissect and create advantages in a game of seemingly random luck. With that said, as is often the case when applying mathematics to the real world, our logical reasoning may still not be perfect, nor reveal the true solution to the problem. Arguments could be made that randomising the contestants’ choices in the simulation and using conditional probability detracts from the human element of the game: the host, the atmosphere and the audience play a crucial role in the dilemma posed to the contestant, perhaps biasing the options available. This is something that probability and random simulations simply cannot account for. Hence it could even be contested that in reality, based on the host’s hints and approach towards the contestant, the probability of finding the winning car can range from 1/2 to 1, a dilemma of the human element that has been examined in the literature.

Linear Programming and the birth of the Simplex Algorithm

Fri, 11 Mar 2022

Historical insights into the birth of a crucial subfield of Operational Research.

“It happened because during my first year at Berkeley I arrived late one day at one of Neyman’s classes.”

George B. Dantzig, a first-year doctoral student at UC Berkeley, mistook a set of unsolved problems in statistics for a homework question. Scribbling them down and solving them over the next few days, Dantzig found these problems slightly harder than a normal homework assignment. He threw his solutions onto the desk of his professor, expecting them to get lost within the clutter of Neyman’s office.

“About six weeks later, one Sunday morning about eight o’clock, [my wife] Anne and I were awakened by someone banging on our front door. It was Neyman.” What the young student had done, initially unbeknownst to him, was solve these open statistical problems, and he now had a giddy professor already writing his paper’s introduction to send for publication. From this, Dantzig began a journey into mathematical stardom.

Eight years after this tale, forever ingrained in the minds of wannabe mathematicians, Dantzig was working as a mathematical advisor for the Pentagon. Tasked by his department with computationally speeding up logistical planning faced by the US Air Force, he developed techniques from the infant field of Linear Programming. The method he devised would come to be known as the Simplex method, but where does it come from and who are the significant players in Linear Programming?

Two more key figures

The history of Linear Programming often starts with Dantzig’s contributions, but its origins can be dated back a few years earlier, during World War II, to the field of economics and the Soviet economist Leonid Kantorovich. In 1939, he developed the first forms of the Linear Programming problem for organising and planning production. Cited as a founder of the field, Kantorovich’s method revolved around finding dual variables and a corresponding primal solution, linking how the results of one directly impact the other. The ideas of primal and dual programs are key components of linear programming, though they now exist in a slightly adapted form from what Kantorovich designed; they are not explicitly covered in this post, but the eager can venture further into the topic. Kantorovich would later go on to win the Nobel Memorial Prize in Economic Sciences for his work in resource allocation, stemming from ideas he developed in the operations research field.

Three big dogs of Linear Programming

Another important character in this field, amongst many others, was John von Neumann. He was a proverbial rockstar of the mathematical world, dabbling in a variety of topics from quantum mechanics to game theory to the early days of computer science and, most importantly for us, Linear Programming. He contributed an important aspect of the field, Duality Theory, involving the ideas of primal and dual Linear Programs touched upon above. Even without explicit knowledge of what this entails, it is important to understand that the concept is pivotal to expanding and solving more complicated Linear Programs, highlighting a connection between optimisation by maximising and by minimising.

Solving Linear Programs

While these aforementioned figures crafted a field in which decisions of optimality can be expressed simply as a set of linear inequalities, a key issue still remains… how do we solve these?

Recall Dantzig’s work with the Pentagon: his eventual solution to these optimal-planning conundrums didn’t arise as quickly as in his infamous Berkeley story (it may be of interest to know that this story is often linked to the inspiration for the movie Good Will Hunting). Instead, the acclaimed Simplex Algorithm was the result of an evolution of his PhD work six years prior. Here, Dantzig developed an algorithm that could solve sets of linear inequalities, with the aim of maximising or minimising some objective.

A quick example of what this objective could be: think of a fruit seller figuring out how to maximise profit, with different fruits having certain purchase restraints, cost requirements, shelf lives and so on. While these problems may not have been of the utmost importance to Dantzig at the time, such motivating ideas at least warrant some kind of optimal solution. His Simplex Method of 1947 did just that and, at the time, had an incredible track record for being an effective method.

A way-too-quick gif showing the objective function (purple line) speeding towards its optimal solution. Simplex works by navigating the vertices of the feasible region, which contains the solutions that exist.
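To make the vertex-hopping idea concrete, here is a small educational sketch of a tableau simplex for problems of the form "maximise c·x subject to Ax ≤ b, x ≥ 0 with b ≥ 0", applied to a toy fruit-seller problem with made-up numbers. It is a teaching sketch, not a production solver:

```python
def simplex(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, assuming b >= 0 so the
    slack variables give an immediate basic feasible solution."""
    m, n = len(A), len(c)
    # Tableau rows: [A | I | b]; objective row: [-c | 0 | 0].
    T = [list(map(float, row)) +
         [1.0 if i == j else 0.0 for j in range(m)] + [float(b[i])]
         for i, row in enumerate(A)]
    T.append([-float(ci) for ci in c] + [0.0] * m + [0.0])
    basis = list(range(n, n + m))           # slacks start in the basis
    while True:
        # Entering variable: most negative reduced cost.
        piv_col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][piv_col] >= -1e-9:
            break                           # optimal: no improving direction
        # Leaving variable: minimum ratio test keeps us feasible.
        ratios = [(T[i][-1] / T[i][piv_col], i)
                  for i in range(m) if T[i][piv_col] > 1e-9]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, piv_row = min(ratios)
        basis[piv_row] = piv_col
        # Pivot: normalise the row, then eliminate the column elsewhere.
        p = T[piv_row][piv_col]
        T[piv_row] = [v / p for v in T[piv_row]]
        for i in range(m + 1):
            if i != piv_row:
                f = T[i][piv_col]
                T[i] = [v - f * w for v, w in zip(T[i], T[piv_row])]
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, T[-1][-1]

# Hypothetical fruit seller: profit 0.5 per apple and 0.8 per orange,
# at most 100 pieces of fruit, and a wholesale budget of 30 at unit
# costs 0.2 and 0.4 respectively.
x, profit = simplex([0.5, 0.8], [[1, 1], [0.2, 0.4]], [100, 30])
print(x, profit)
```

Checking the vertices of the feasible region by hand, the optimum is 50 apples and 50 oranges for a profit of 65, which the tableau reaches in two pivots.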

Decades later, in a world where it is commonplace for new methods to come in and improve on the old, finding unique and novel ways to push the boundaries, the Simplex is still regarded as a strong contender for the best method for solving Linear Programs, with many modern solvers still employing the 75-year-old method.

This blog covered a purely historical aspect of the method; further reading can be found in interviews with, and overviews of, Dantzig, while a wider history of linear programming and its broader family of Operational Research can be uncovered in various compendia of resources.

Stochastic Simulation of Diseases: Introduction and models

Fri, 28 Jan 2022

“I simply wish that, in a matter which so closely concerns the well-being of the human race, no decision shall be made without all the knowledge which a little analysis and calculation can provide”

Daniel Bernoulli, 1760

A brief history of Infectious Disease Modelling

In 1766, the Swiss mathematician Daniel Bernoulli published an article in the French literary magazine Mercure de France concerning the effect smallpox had on life expectancy and the improvements which could be made with the introduction of inoculation. Bernoulli’s concluding argument, from which the aforementioned quote is derived, led to the creation of some of the first epidemiological models.

This does not mean that Bernoulli’s paper was immediately lauded by his peers. Jean le Rond d’Alembert was a prominent mathematician and intellectual at the time, albeit slightly behind the curve in probability theory, having published his famously flawed reasoning about coin tosses in “Croix ou Pile”. D’Alembert clashed with Bernoulli on the issue, authoring a rebuttal in a 1760 paper, “On the application of probability theory to the inoculation of smallpox”.

Bernoulli had initially communicated his work in a 1760 presentation in Paris. However, publication issues meant the paper was not released until 1766, giving d’Alembert a head start on his critique. He argued against some of the assumptions Bernoulli had made with respect to the probability of infection and the independence between age and dying of smallpox. D’Alembert’s alternative formulation also resembles modern modelling and, despite their differing opinions, both agreed that inoculation of the population was the way forward.

Bernoulli’s submission is often regarded as the foundation of what would eventually become mathematical epidemiology, although the field as it is known today did not develop further until the early 20th century. This arrived in the form of work by the polymath Ronald Ross, who wrote on malaria prevention by crafting a model of its transmission. More generalised models were also produced by William Kermack and Anderson McKendrick, whose SIR model provided early forms of compartmental modelling, a class of models that forms the cornerstone of much subsequent research. These models took a deterministic approach, where no randomness is involved in the system and the output will always be replicated given the same set of initial parameters.

What we are more interested in is another class of compartmental models that arrived not long after the deterministic ones: stochastic models. These models include the effects of randomness commonly found in real-life scenarios. There have been many approaches, each applicable to certain areas. We consider a few of these models in this blog post, all of which revolve around compartmentalising the population into key components: Susceptible, Infected and Recovered individuals. The idea is that you leave one state and filter into the next. Under these basic assumptions, recovery can also mean recovering 6 feet under in a casket, but let’s be positive and imagine all contractors of the hypothetical disease we discuss recover to live long and fruitful lives.

Chain Binomial Models

We can try to garner an understanding of stochastic models through the introduction of a simple, probability-based method: the chain binomial.

These models are discrete time (updates happen in incremental steps) and track where each fraction of the population is at the next time step. Some general, yet quite restrictive, assumptions are placed on the model. Namely:

  1. The population is fixed.
  2. Disease will always transfer whenever contact is made.
  3. Contacts are independent (two people cannot infect one individual).
  4. Infected people recover one time step after infection.

Obviously, in reality these are not likely to hold for large-scale populations; however, in small, enclosed environments such as hospitals, schools and households, the model becomes more relevant.

So how does one model the movements between compartments? Say we infect I_t people at time t, and at the next time point we infect I_{t+1} people. These newly infected individuals leave the susceptible state by assumption (2), and by assumption (4) the infectious people from the previous time step t move into the recovered population. Mathematically we express this as:

S_{t+1}=S_t-I_{t+1} \\ R_{t+1}=R_t+I_{t}

Now the most pertinent question: how do we govern the number of infections? Some pretty elementary maths leads to the binomial distribution. From a pool of S_t susceptible individuals, we find the probability of infecting x people, given that the probability of an infective failing to infect a given individual is q. You will see shortly why we phrase this last part in such a counterintuitive way.

Deterministic, or fixed, updates can be done by taking the expectation to find the number of infections. Conversely, if you want to incorporate an air of chaos and randomness (and you actually read the title of this post), updates can also be done stochastically through a set of random binomial draws.

The last step of the model is determining the average number of infections one infectious person is expected to cause in this set-up, more commonly referred to as the basic reproduction number (\mathcal{R}_{0}). Say we have a total population of N; then,

\mathcal{R}_0=(N-1)(1-q).

The value q, which is used as a parameter in the binomial distribution to inform the number of infections, can be recovered from this reproduction number by rearranging: q=1-\mathcal{R}_0/(N-1). To see the chain binomial in action we can simulate an epidemic with N=500 people and \mathcal{R}_0=1.5.

Given the stochastic nature, it is best to run multiple simulations so as not to let anomalous results mislead us about what is likely to happen. Here, 7 iterations are chosen.

Simulation of the chain binomial model… less said about this the better

Here, given our large initial population, it can be seen that the chain binomial model dies out quickly, and the discrete time steps lead to chunky graphs from which conclusions are hard to draw. This model is therefore fairly weak and outdated for the task at hand, and it should come as no surprise that there are models that perform better with larger populations; a reaction-based approach is considered next.
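For completeness, the chain binomial updates above can be sketched in code. This assumes a Reed–Frost-style formulation, where each susceptible escapes each of the I_t infectives independently with probability q, so the number of new infections is Binomial(S_t, 1 - q^{I_t}); treat it as an illustrative sketch rather than the exact model behind the plots:

```python
import random

def chain_binomial(N=500, R0=1.5, I0=1, seed=None):
    """Stochastic chain binomial epidemic.  q is recovered from
    R0 = (N - 1)(1 - q); new infections at each discrete step are a
    binomial draw I_{t+1} ~ Binomial(S_t, 1 - q**I_t)."""
    rng = random.Random(seed)
    q = 1 - R0 / (N - 1)
    S, I, R = N - I0, I0, 0
    history = [(S, I, R)]
    while I > 0:
        p_inf = 1 - q ** I                 # chance a susceptible is infected
        new_I = sum(rng.random() < p_inf for _ in range(S))
        S, I, R = S - new_I, new_I, R + I  # assumptions (2) and (4)
        history.append((S, I, R))
    return history

for run in range(7):
    hist = chain_binomial(seed=run)
    print(f"run {run}: {len(hist)} steps, final size {hist[-1][2]}")
```

Running several seeds shows exactly the behaviour in the figure: with a large population and modest R0, many chains fizzle out within a handful of chunky discrete steps.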

Gillespie’s Algorithm

Despite being initially formulated by the much cooler-named Joseph Doob, the method was presented to the public forum by Dan Gillespie in 1976 and showcased a stochastic way to simulate the time evolution of a chemical system through chemical reactions. This method can be borrowed and applied to the epidemic setting through a re-evaluation of what these reactions represent.

We can think of the compartments defined earlier, with the movement between them regarded as reactions between states. More specifically, two ‘reactions’ take place: an infection and a recovery. The former is a combination of a susceptible and an infectious individual; the latter is an infectious individual transitioning to recovery, which in this sense can be thought of as a healing ailment.

An interesting feature of this algorithm is its use of random sampling to dictate which reaction takes place, updating the state in a fraction of a time step. This provides both the stochastic nature we are looking for and a more advantageous update strategy compared to the often large discrete time updates of the previous chain binomial model.

It should be noted that with such fine-grained, event-by-event time updates, this method can be computationally expensive to implement, so a way to calculate more efficient time steps, such as tau-leaping, is preferred. An example using the same set-up of 500 individuals, but with a slightly higher reproduction number of \mathcal{R}_0=3, to compare to the chain binomial is seen below:

A slightly better, if still very variable, simulation of an epidemic.

Here, red, blue and green represent susceptible, infected and recovered individuals respectively, and the thick black lines give the trajectory of the epidemic if no randomness was involved.

You can see the different trajectories (included for the same reason as in the chain binomial case: to account for the effects of randomness and give a fuller picture) vary greatly around the time of peak infection, but all balance out towards the end. This method is fairly good at simulating epidemics on large scales, and the randomness feeds into why we’d want to use stochastic models in the first place…
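The two SIR reactions can be simulated with Gillespie’s direct method as follows. This is a sketch with assumed rates (infection at beta·S·I/N and recovery at gamma·I, with beta = R0·gamma), not the exact code behind the figure:

```python
import random

def gillespie_sir(N=500, R0=3.0, gamma=1.0, I0=1, seed=None):
    """Gillespie's direct method for a stochastic SIR model.  At each
    event we draw an exponential waiting time from the total rate, then
    pick which reaction fires in proportion to its rate."""
    rng = random.Random(seed)
    beta = R0 * gamma
    t, S, I, R = 0.0, N - I0, I0, 0
    traj = [(t, S, I, R)]
    while I > 0:
        rate_inf = beta * S * I / N          # S + I -> 2I
        rate_rec = gamma * I                 # I -> R
        total = rate_inf + rate_rec
        t += rng.expovariate(total)          # time to the next reaction
        if rng.random() < rate_inf / total:  # infection fires
            S, I = S - 1, I + 1
        else:                                # recovery fires
            I, R = I - 1, R + 1
        traj.append((t, S, I, R))
    return traj

traj = gillespie_sir(seed=42)
print(f"events: {len(traj) - 1}, final size: {traj[-1][3]}")
```

Because every event advances time by a random exponential amount, the trajectories are piecewise constant in continuous time rather than jumping in fixed discrete steps, which is exactly the smoothness advantage over the chain binomial noted above.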

Takeaways and further investigations

The use of stochastic models might not be immediately obvious: from a naive point of view, why would we want models that can deviate from what we expect under fixed, deterministic models? In an idealised scenario where we play god and have a clear view of how something will progress, models without randomness are a clear and obvious avenue. However, this is never the case, and the last few years have shown extensively that there is no way to truly model and predict the behaviour of humans and the rationale behind their decisions. Modelling contacts between groups with added unpredictability shows, even on a small simplified scale, the range of possibilities that could still occur. Where deterministic models give an idea of what should happen under a set of assumptions, stochastic models show what could happen when those assumptions fail or are affected by chance.

This blog just scratches the surface of possible avenues in stochastic modelling of epidemics. Another big idea in this field is Stochastic Differential Equation based approaches, which use ideas from stochastic calculus and financial mathematics: a potential future blog post in the making. More on this, on simulation methods, on sensitivity analysis and on a use case relating to malaria can be found in my MSc dissertation.

References

Linda J. S. Allen is a rockstar of stochastic mathematical epidemiology. She has produced the following concise tutorials, as well as lengthy books, on areas of stochastic epidemic modelling:

  • An Introduction to Stochastic Processes with Applications to Biology, CRC Press.
  • An Introduction to Stochastic Epidemic Models, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 81–130.
  • “A primer on stochastic epidemic models: Formulation, numerical simulation, and analysis”, Infectious Disease Modelling 2(2), 128–142.

For more on the history of epidemic models, and directions to the plethora of literature on the matter, the review article “How mathematical epidemiology became a field of biology: a commentary on Anderson and May (1981) ‘The population dynamics of microparasites and their invertebrate hosts’” is an excellent read and is available open access.
