This Week on the STOR-i Programme: Bayesian Optimization

Sun, 18 Apr 2021

At STOR-i we are currently looking through our potential PhD projects. For each project we have been given a page summarizing the topic with some papers listed at the bottom. As I was looking through these papers I came across one which particularly interested me, in an area I knew very little about. I decided a good way for me to form a deep understanding of this paper would be to write a blog post on it. This way you can learn something too.

The paper in question is called . You may remember that in a previous blog post I explored a Bayesian approach to a multi-armed bandit problem, and in another I looked at a heuristic approach to an optimization problem. Well, today (or whichever day you decide to read this) we are looking at a Bayesian approach to an optimization problem.

So let's outline the situation. We have a function f(x) whose maximum we would like to find. However, we do not know the structure of the function, and it is expensive to evaluate at any given point. Essentially, we have a black box to which we give inputs (our x values) and from which we receive outputs (f(x) values). Since evaluating the function is expensive, we can only look at a limited number of points, so we have to decide which points to evaluate in order to find the maximum.

So how do we do this? First, we fit a Bayesian model to our function. We can then use this model to formulate something called an acquisition function. The point at which the acquisition function is highest is the one we evaluate next.

Gaussian process regression is used to fit the Bayesian model. We suppose that the f values at some x points are drawn at random from some prior probability distribution. This prior is taken to be a multivariate normal with a particular mean vector and covariance matrix. The mean vector is constructed by evaluating a mean function at each x. The covariance matrix is constructed using a kernel, which is formulated so that two x's close together have a large positive correlation; this reflects the belief that closer x's will have similar function values. The posterior mean is then a weighted average of the prior mean and an estimate made from the data, with weights depending on the kernel. The posterior variance is the prior covariance at that point minus a term corresponding to the variance removed by the observed values. The posterior distribution is again multivariate normal.
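As a rough illustration of the posterior formulas described above, here is a minimal Python sketch. This is my own code, not the paper's: it assumes a zero prior mean, a squared-exponential kernel, and made-up observation points.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel: x's close together get correlation near 1."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, f_obs, x_new, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP at x_new, conditioned
    on (effectively noise-free) observations (x_obs, f_obs)."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_new, x_obs)
    K_ss = rbf_kernel(x_new, x_new)
    K_inv = np.linalg.inv(K)
    # Prior mean (zero here) plus a kernel-weighted correction from the data.
    mean = K_s @ K_inv @ f_obs
    # Prior variance minus the variance explained by the observed values.
    cov = K_ss - K_s @ K_inv @ K_s.T
    return mean, np.diag(cov)

x_obs = np.array([0.0, 1.0, 2.0])
f_obs = np.sin(x_obs)  # pretend these are our expensive evaluations
mean, var = gp_posterior(x_obs, f_obs, np.array([0.0, 0.5]))
```

At an observed point the posterior mean matches the observation and the variance collapses towards zero; between observations the variance grows, which is exactly the uncertainty the acquisition function exploits.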

Taken from the paper

Illustrated above is an estimate of the function f(x) (solid line); the dashed lines show Bayesian credible intervals (these are similar to confidence intervals), with observed points in blue. Using this we can form an acquisition function such as:

Taken from the paper

This function tells us which point of the function to evaluate next: the point at which the acquisition function is maximized. There will be a balance between choosing points where we believe the global optimum lies and points with large amounts of variance.

There are many different types of acquisition function. The most commonly used is known as the expected improvement. In this case we assume we can only provide one solution as the maximum of f(x), so if we had no more evaluations left we would provide the largest value we have evaluated. However, if we had just one more evaluation, our solution would remain the same if the new evaluation was no larger than the largest so far, but if the new evaluation was larger it would become our solution. The improvement of the solution is therefore the new evaluation minus the previous maximum if this difference is positive, and zero otherwise. When we choose our next evaluation we would like to choose the one which maximizes improvement. It is not quite that simple, as by the nature of the problem we do not know the value of an evaluation until we have chosen and evaluated that point. This is where the Bayesian model comes in: we can use it to obtain the expected improvement at each point and then choose the point with the largest expected improvement. This favours points with high posterior standard deviation and points with large posterior means, so there is a balance between choosing points with high promise and points with large amounts of uncertainty.
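Under a normal posterior, the expected improvement has a well-known closed form, which can be sketched as below. This is an illustrative snippet, not code from the paper; `mu` and `sigma` stand for the posterior mean and standard deviation at a candidate point, and `f_best` is the largest value evaluated so far.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """EI = E[max(f(x) - f_best, 0)] when f(x) ~ Normal(mu, sigma^2).
    Closed form: (mu - f_best) * Phi(z) + sigma * phi(z), z = (mu - f_best) / sigma."""
    if sigma <= 0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf
    return (mu - f_best) * Phi + sigma * phi
```

Both terms of the formula show the balance described above: the first rewards a large posterior mean, while the second rewards a large posterior standard deviation, so even a point with a mean below the current best can win if it is uncertain enough.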

This method can be applied to many problems, including robotics, experimental particle physics and material design. The paper also explains the application of Bayesian optimization to the development of pharmaceutical products. To learn more, I'd advise reading the paper I used for this blog, as well as the paper which discusses constrained Bayesian optimization and its applications.

This Week on the STOR-i Programme: Fast Fashion and Multi-armed Bandits

Sun, 21 Feb 2021

This week, as part of the MRes course, we had to pick our next topic to write a report on. I was really stuck between two options but in the end had to choose one. I thought if I'm not going to write a report on the other option I can at least write a blog on it, and here we are. So, as you may or may not have guessed from the title of this post, the option I didn't choose was multi-armed bandits. At the end of the talk on this area the lecturer, Kevin Glazebrook, mentioned some areas of particular study. One that particularly caught my eye was fast fashion. Before deciding to do a Maths degree I wanted to be a fashion designer, or really any job in fashion. While I gave up that dream for my love of Maths, it is still an interest of mine. Hence, I was very excited by the combination of the two areas.

Previously, clothing companies had to make decisions on which products to sell that season with very little information on where demand might lie. As you can imagine, this led to them missing opportunities to sell popular goods, as well as having excess supply of unwanted products. As technology has improved, especially manufacturing schemes and means of transport, companies have been able to delay some of the production for the season. This means they have more information on what is in demand that season, and can then produce and sell goods accordingly.

Now you may be thinking that's very nice, but what's that got to do with maths? Well, I'll tell you. You may remember that one of the focuses of this blog is the multi-armed bandit problem. In these problems we have a series of time steps, and at each time step we have to make a decision (pull an arm). Before we pull an arm we are unsure whether it will help us achieve whatever it is we wish to achieve, but by pulling that arm we gain more information about it. The aim is to minimize the regret we have for pulling an arm. To do this we have to balance two things: exploitation and exploration. We want to exploit any information we have from pulling arms previously, in order to pull arms which give us successful results. However, we also want to explore all our options to ensure we have found the best arm.
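To make the exploration-exploitation trade-off concrete, here is a toy bandit in Python using the simple epsilon-greedy strategy. This is just one illustrative strategy, not the approach used in the fast fashion paper (which is Bayesian); the reward distributions, arm means and parameters are all made up.

```python
import random

def epsilon_greedy(true_means, steps=5000, eps=0.1, seed=0):
    """With probability eps explore a random arm; otherwise exploit the arm
    with the best observed average reward. Rewards are noisy draws around
    each arm's hidden true mean."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms     # times each arm was pulled
    sums = [0.0] * n_arms     # total reward from each arm
    for _ in range(steps):
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = epsilon_greedy([0.1, 0.5, 0.9])
```

After enough steps, the arm with the highest hidden mean ends up pulled far more often than the rest, while the occasional random pulls guarantee the other arms are never written off entirely.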

So how does this relate to fast fashion? Suppose the company delays production so that it releases a new selection of goods at each of T time steps. Then at each time step, t, it has to choose which products to release in that selection. Picking a product to go in a selection is pulling an arm. By picking a product, the company can see how well it sells and hence its demand, and it can use this information to help make its decision at the next time step. To ensure that we are making the best decisions with the information we have at each time step, we build a model.

So let's look at this model. To start with, we have a set of S different products to choose from. As there is limited space within a shop, we can only choose N of these products at each time step t. For this model we assume that customers buy units of a product s at an unknown constant rate ds. This rate is assumed to remain constant, but the actual demand for the product will only be observed at times when the product is in the selection.

To formulate this model we use some Bayesian statistics. (If you are not clued up on Bayesian statistics, I suggest you read up on it first.) In Bayesian statistics we can incorporate prior beliefs or information about the parameters of our model. In this case our prior beliefs are represented as a Gamma distribution with shape parameter ms and scale parameter as. Both are assumed to be positive, and ms is assumed to be an integer. We use a Poisson likelihood for the sales observed in any selection period. As the Gamma distribution is a conjugate prior for the Poisson, the resulting posterior distribution is also a Gamma distribution, with shape parameter (ms + ns) and scale parameter (as + 1), where ns is the number of units of product s sold in a selection period. So, each time a product is selected, its posterior distribution is updated by adding that period's sales ns to the shape parameter and 1 to the scale parameter.

The intuition is that the shape parameter is the number of units of the product that would be sold in a number of periods equal to the scale parameter, so the expected number of sales of a product in one period is the shape parameter divided by the scale parameter. This can be used to make decisions by choosing the options with the largest expected sales. The model balances exploration and exploitation: if a product has a lot of sales, ns will be larger, so the shape parameter and hence the expectation will be larger too, and the product will likely be picked again. However, the more times a product is picked, the larger the scale parameter gets. A larger scale parameter reduces the expectation, lowering the chances of frequently picked options being chosen again and increasing opportunities to explore other options.
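The update and selection rules can be sketched in a few lines of Python. This is my own illustrative code, not the paper's implementation, and the product names and parameter values are entirely hypothetical.

```python
def update(shape, scale, sales):
    """Conjugate Gamma-Poisson update from the post: after a selection period
    with `sales` units sold, the shape gains the sales and the scale gains 1."""
    return shape + sales, scale + 1

def pick_products(params, n):
    """Choose the n products with the largest expected demand, shape/scale."""
    ranked = sorted(params, key=lambda s: params[s][0] / params[s][1], reverse=True)
    return ranked[:n]

# Hypothetical products with (shape, scale) prior parameters.
params = {"shorts": (3, 2), "skirt": (4, 2), "skort": (1, 2)}

chosen = pick_products(params, 2)            # shop has room for 2 products
for s in chosen:
    params[s] = update(*params[s], sales=5)  # suppose each sold 5 units
```

Only the products that were actually put in the selection get their posteriors updated; the unselected product's parameters stay as they were, exactly as the model describes demand only being observed when a product is on offer.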

If we simplify the problem so that we have to choose one of a pair of shorts, a skirt or a skort at each time step, our choices may go something like this:

The starting scale parameters are 16, 17 and 13 for the shorts, skirt and skort respectively, and the shape parameter is 1 for all three.
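As a sketch of how such a run might look, here is some illustrative Python using the starting parameters above. The true demand rates are entirely hypothetical and this is not the simulation from the paper; it simply shows the pick-observe-update loop in action.

```python
import random

def poisson(rng, lam):
    """Knuth's method for drawing Poisson samples (fine for small rates)."""
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(1)
# Starting parameters from the example above; true rates are made up.
shape = {"shorts": 1, "skirt": 1, "skort": 1}
scale = {"shorts": 16, "skirt": 17, "skort": 13}
true_rate = {"shorts": 2.0, "skirt": 0.5, "skort": 1.0}

history = []
for t in range(30):
    # Pick the product with the largest expected demand, shape/scale.
    s = max(shape, key=lambda p: shape[p] / scale[p])
    sales = poisson(rng, true_rate[s])
    shape[s] += sales   # posterior update: add the observed sales...
    scale[s] += 1       # ...and one selection period.
    history.append(s)
```

Each pick nudges the chosen product's scale parameter up, so even a product that sells well is eventually rested in favour of the less-explored options, which is the balancing behaviour described above.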

To read the paper that formulated this model, and to learn more about how maths is used to learn about demand within fast fashion, see the paper this post is based on. I hope this blog post gave you a little insight into maths being used in our everyday lives in ways you may not have previously thought about.
