On the Record: How to Use Gurobi to Solve Linear Programming Problems!

Sat, 20 Mar 2021

I have recently learnt how to solve some mathematical programming problems using Gurobi, an optimisation solver, in Python. Before joining the STOR-i programme, I was a complete novice at programming in Python so getting to grips with it is really exciting! One of the problems Gurobi is especially useful for is a standard linear programme. In this blog post, I’m going to introduce how we can use Gurobi in Python to solve linear programming problems.

A trope of a linear programming class is the furniture factory example. No doubt, if you’ve learnt anything about linear programming before, you’ll have heard of it. It concerns a factory that makes chairs and tables, with specifications of the materials and working hours needed for production, as well as the profit generated by each table and chair. The question posed is: given constraints on materials and working hours, how many chairs and how many tables should be made in order to maximise profit? Not that I’m bored of seeing this problem (I really, really am), but I thought I’d illustrate how we can use Gurobi with an alternative example, one inspired by my family’s love of music and my Dad’s very real problem of buying the right boxes to store his extensive record collection.

My family’s jukebox. This one features all the classics including my favourite song to dance to as a kid: Tutti Frutti by Little Richard.

I was brought up surrounded by music, in a house with multiple jukeboxes, so my family has a lot of vinyl! The records are our pride and joy, so we want to store them safely in a way that lets us access them easily (because, let’s be honest, you never know when you’ll just need to find Spice by the Spice Girls). We have done extensive research across the storage market (it makes for a thrilling weekend activity, I must say) and we have two box contenders. The differences between the boxes are displayed in the table below:

We need to make sure that all seven hundred of our records are boxed, and my Dad wants to buy at least eight boxes so that we have enough space for the records he keeps treating himself to in order to stay sane during the lockdown. As you may have realised, storage is big business and these boxes are expensive, so we want to minimise our cost as much as we can.

Let’s solve this problem and figure out how many of each box to buy. Before you use Gurobi, you’ll have to install it and get a license; more information on that can be found on the Gurobi website. When you’ve got Gurobi working fine and dandy on your computer, the first thing we want to do is set up our model. Here we’ve called our model f and added a label “Records”.
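In Python that set-up might look like this (a sketch assuming the gurobipy package; the code from the original post isn’t reproduced here):

```python
import gurobipy as gp

# Create an empty optimisation model and give it the label "Records"
f = gp.Model("Records")
```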

We then create our variables: the numbers of each of the two different boxes we can buy. As we can’t buy a fraction of a box, we specify that our variables are integers by using vtype=GRB.INTEGER. Other variable types are listed in the Gurobi documentation.
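A sketch of that step (the model set-up is repeated so the snippet stands on its own, and the variable names box1 and box2 are my own, not from the original post):

```python
import gurobipy as gp
from gurobipy import GRB

f = gp.Model("Records")

# One integer decision variable per box type: how many of each to buy
box1 = f.addVar(vtype=GRB.INTEGER, name="Box1")
box2 = f.addVar(vtype=GRB.INTEGER, name="Box2")
```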

Our objective function is to minimise the cost of buying the boxes. We set the objective of our model f by using .setObjective and include GRB.MINIMIZE to specify that our problem is a minimisation problem. For a maximisation problem, we would instead write GRB.MAXIMIZE. Don’t forget to spell it the American way!
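Since the price table isn’t reproduced in this copy of the post, the sketch below uses stand-in prices of £14 per Box 1 and £17 per Box 2, chosen to be consistent with the £130 answer quoted at the end:

```python
import gurobipy as gp
from gurobipy import GRB

f = gp.Model("Records")
box1 = f.addVar(vtype=GRB.INTEGER, name="Box1")
box2 = f.addVar(vtype=GRB.INTEGER, name="Box2")

# Minimise total spend (hypothetical prices: £14 and £17 per box)
f.setObjective(14 * box1 + 17 * box2, GRB.MINIMIZE)
```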

The constraints of our linear programming problem are that all of our records must be placed in a box and that we order at least 8 boxes, to ensure there’s room for purchases in the near future. We use .addConstr to add these to our model f.
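A sketch of the two constraints, again with stand-in capacities (Box 1 holding 50 records and Box 2 holding 100) since the original table isn’t shown here:

```python
import gurobipy as gp
from gurobipy import GRB

f = gp.Model("Records")
box1 = f.addVar(vtype=GRB.INTEGER, name="Box1")
box2 = f.addVar(vtype=GRB.INTEGER, name="Box2")

# All 700 records must fit (hypothetical capacities: 50 and 100 records)
f.addConstr(50 * box1 + 100 * box2 >= 700, "all_records_boxed")
# Dad wants at least 8 boxes
f.addConstr(box1 + box2 >= 8, "at_least_eight_boxes")
```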

To solve the problem we use the .optimize() command. To print the variable outcomes we use .getVars().
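Putting it all together (the prices, capacities and variable names are stand-ins, as the original table and code aren’t reproduced here):

```python
import gurobipy as gp
from gurobipy import GRB

f = gp.Model("Records")
box1 = f.addVar(vtype=GRB.INTEGER, name="Box1")
box2 = f.addVar(vtype=GRB.INTEGER, name="Box2")
f.setObjective(14 * box1 + 17 * box2, GRB.MINIMIZE)
f.addConstr(50 * box1 + 100 * box2 >= 700, "all_records_boxed")
f.addConstr(box1 + box2 >= 8, "at_least_eight_boxes")

f.optimize()               # solve the model
for v in f.getVars():      # print each variable's value
    print(v.varName, v.x)
print("Total cost:", f.objVal)
```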

If we run all of this code in Python we get our solution!

In order to minimise cost, we should buy two of the smaller box, Box 1, and six of the larger box, Box 2. The total cost will be £130. It’s as easy as that!
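Because the model is so small, we can even sanity-check the answer without a solver, by brute force over small box counts. The prices and capacities below are hypothetical stand-ins (£14 for a 50-record Box 1, £17 for a 100-record Box 2) chosen to match the quoted solution:

```python
# Brute-force check: try every small combination of box counts
best = None
for b1 in range(15):
    for b2 in range(15):
        if 50 * b1 + 100 * b2 >= 700 and b1 + b2 >= 8:
            cost = 14 * b1 + 17 * b2
            if best is None or cost < best[0]:
                best = (cost, b1, b2)

print(best)  # (130, 2, 6): two of Box 1, six of Box 2, £130 in total
```

Only 6 of the big boxes would leave us at 6 boxes total, below Dad’s minimum of 8, which is exactly why two of the cheaper small boxes enter the solution.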

So what do you reckon? Is Gurobi a sound tool? I think so. This problem might be a simple example but it illustrates how we can use Gurobi to model and solve mathematical programming problems effectively. Be sure to check out the Gurobi tutorials in the Want to Know More section below.

Before I go, I thought I’d share my family’s vinyl collection must-haves just in case you fancy starting your own. But, fair warning, before long you might end up having to spend £130 on some storage boxes!

  • Dan Fogelberg – Phoenix
  • David Bowie – Station to Station
  • HAIM – Something to Tell You
  • Buddy Holly – Buddy Holly
  • The Rolling Stones – Sticky Fingers
  • The Animals – The Animals

This week’s tweet of the week is an excellent reminder to step away from the screen once in a while.

Missed my last post? Check it out here.

Want to know more?

Gurobi offer a selection of tutorials to help you get to grips with the software for linear programming problems. You can find them all at the link below:

Tetris, Markov Decision Processes and Approximate Dynamic Programming

Sun, 14 Feb 2021

I recently learnt about Markov Decision Processes (MDPs). These are sequential decision processes, commonly found in the real world, where decisions need to be made in an environment with uncertainty. Solving these problems, especially when they are large, is a big challenge. During my research into MDPs, I learnt that these processes are pretty much everywhere, present in industries like finance, manufacturing and aviation, to name just a few. What I was most excited to discover, though, was that playing a game of Tetris, believe it or not, embodies this mathematical framework. As a 90s baby, becoming freakishly addicted to Tetris felt like a rite of passage. I can remember stealing my sister’s Gameboy and desperately trying to erase her high score before she noticed. I couldn’t wait to write this post, but before we dive straight into the fun of addictive block building, let’s start with the basics.

In a decision process we can have multiple actions, each of which can produce different outcomes with different likelihoods and different resulting benefits. The decision maker wants to make the best choices, so a solution takes the form of a policy, which tells the decision maker which action to take in each state. An MDP is made up of states, actions, transition probabilities and rewards. A transition probability is the probability of moving from one state to another under a given action. The reward is the reinforcement received for taking a specified action. MDPs satisfy the Markov property: the chance of moving to the next state depends only on the current state, not on previous states or actions.

Imagine you’re in a post-COVID-19 world, in a casino and sitting at a game table. In each round, if you choose to play the game you get a reward of £400 and, with probability two-thirds, the chance to play again. However, with probability one-third the game ends and you have to leave the table. You can also choose not to play, in which case you get a reward of £1000 but have to leave the table. This example forms the MDP visualised above. The states are ‘at game table‘ and ‘leave game table‘. The blue arrows represent the action ‘play‘ and the green arrow represents the action ‘don’t play‘. The rewards are either £400 or £1000 depending on the action chosen. The probability of transitioning from state ‘at game table‘ to ‘leave game table‘ is 1/3 for action ‘play‘ (with probability 2/3 you stay at the table) and 1 for action ‘don’t play‘.
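We can even check which action is better with a few lines of value iteration (a sketch; the ‘leave game table‘ state is worth £0, so it drops out):

```python
# Value iteration for the casino example: "play" pays £400 and keeps you
# at the table with probability 2/3; "don't play" pays £1000 and ends
# the game. V is the value of the state "at game table".
V = 0.0
for _ in range(200):
    play = 400 + (2 / 3) * V  # expected return from choosing "play"
    dont_play = 1000          # certain return from choosing "don't play"
    V = max(play, dont_play)

print(round(V))  # prints 1200: playing on beats taking the £1000
```

So the optimal policy is to keep playing: the £400-per-round stream is worth £1200 on average, more than the one-off £1000.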

In Tetris, the states are represented by the configurations of the wall and the current falling piece. Actions are formed by the possible orientation and placing options of the current falling piece. The reward is the number of deleted rows and associated points awarded after the placing of the current falling piece. The terminal state of Tetris is the configuration of the wall at maximum height prompting the game to end. The objective is to maximise the number of deleted rows that generate points. Solving this MDP would aim at generating a policy that maximises the expected number of points won during play. So, is everything falling into place? (Sorry, I couldn’t resist.)

Golden Globe Winner Taron Egerton on the set of the Tetris movie in February 2021, no doubt thinking about Markov Decision Processes.

How do we solve an MDP? Well, for simple examples like the casino game we’d look to Dynamic Programming. This class of algorithms works by breaking the problem down into sub-problems and is effective for relatively small problems. However, for MDPs with a large number of states, like a game of Tetris, these methods aren’t enough.

That’s where Approximate Dynamic Programming comes in. These approaches work by learning from a small part of the process and generalising to the entire decision process. One of these methods is called Value Function Approximation. The aim is to create a function that approximates the value, a measure of how good it is to be in a state, for every state in our MDP. In Value Function Approximation with linear methods, we approximate the value function by writing it as a linear function of features, which give us information about the states. In our Tetris example, they describe configurations of the wall (for example, the height of each column or the number of holes).

We can then estimate the value of the current state, the current configuration of the wall, by taking a weighted sum of the information provided by the features: each weight captures how good it is for the wall to have the property that its feature measures.

We can use a Stochastic Gradient Descent method, which minimises the expected mean squared error between the approximate value function and the true value function. With each iteration we update the weights using a simple rule: we specify a step size, define the error as the difference between the true value and our current approximation, and move each weight a small step in the direction that reduces that error.
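As a toy illustration of that update rule (not Tetris itself: the states, features and numbers below are made up), we can fit a linear value function by stochastic gradient descent in a setting where the true values are known, and watch the weights converge:

```python
import random

# True weights used to generate the "true" value function, so we can
# check that stochastic gradient descent recovers them.
true_w = [2.0, -1.0, 0.5]

def features(s):
    # A made-up 3-dimensional feature map for a 1-dimensional state s
    return [1.0, s, s * s]

def true_value(s):
    return sum(wi * xi for wi, xi in zip(true_w, features(s)))

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

random.seed(0)
w = [0.0, 0.0, 0.0]
alpha = 0.01  # step size

for _ in range(50000):
    s = random.uniform(-1, 1)              # sample a state
    x = features(s)
    error = true_value(s) - predict(w, x)  # true value minus approximation
    # Move each weight a small step in the direction that reduces the error
    w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
```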

This method will converge to a global optimum, so we successfully get an approximation to the value function. Then we know how good a policy is! How great! Though, in real life, things might not be as simple as this, because we don’t know the true value function that gives us the error (if we did, I’d already be killing it at Tetris). In place of the true value, we use the return, the total reward generated following each visit to a state, which is an unbiased estimator of the true value. Our error is then the difference between this return and our current value approximation. This process is known as Monte Carlo Linear Value Function Approximation.
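Sticking with the casino example rather than Tetris, here is a minimal sketch of the Monte Carlo version for the policy “always play”: each round pays £400 and the game ends with probability 1/3, so an episode lasts 3 rounds on average and the true value of sitting at the table is £1200. The ‘at game table‘ state gets a single constant feature, so there is just one weight to learn:

```python
import random

random.seed(42)

w = 0.0  # single weight: our value estimate for "at game table"
for n in range(1, 20001):
    # Simulate one episode under "always play" and record its return G
    G = 0
    while True:
        G += 400                     # reward for playing a round
        if random.random() < 1 / 3:  # game ends, leave the table
            break
    alpha = 1 / n                    # decreasing step size
    w += alpha * (G - w)             # nudge the estimate towards the return

print(round(w))  # close to the true value of 1200
```

With the 1/n step size this update is just a running average of the observed returns, which is why it homes in on the expected return.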

Many industries need to optimise decision making, and these problems take the form of a Markov Decision Process. In most of these real-world scenarios the problems are so big that we need to use approximation methods. This is what makes methods like Value Function Approximation so significant. And who knows, if I’d had it up my sleeve back in the 90s, maybe I could’ve crushed my sister’s high score!

That’s it for now, I’ve just got time for my tweet of the week! Check back soon for more blog posts!

Missed my last post? Check it out here.

Want to know more?

David Silver’s lecture on Value Function Approximation can be found on YouTube.

M. Hadjiyiannis, 2009.
