MRes Year – Edward Mellor
PhD Student at STOR-i CDT

The Patrol Problem – Sun, 26 Apr 2020

In my previous post, I talked about the statistics research project that I did as part of the STOR-i program. Today I will discuss the Operational Research project I worked on with Kevin Glazebrook about Optimal Patrolling.

Consider an art gallery with several rooms. Some of these rooms are connected directly by doorways, but for some pairs of rooms it may be necessary to pass through one or more intermediary rooms in order to travel between them. Each room in the gallery contains various valuable pieces of artwork. At night, when the gallery is closed, a single guard must patrol the area to prevent thievery or vandalism by intruders (attackers). The Patrol Problem is to find a patrol route that minimizes the expected cost of any damage caused by attackers.

To approach this problem we must first create a model and make some modelling assumptions.

We can use the ideas from my post on The Seven Bridges of Königsberg to represent the rooms of the gallery as nodes on a graph, with an edge between two nodes whenever the corresponding rooms are connected directly by a doorway.

We assume that the total value of the artwork in each room is known to both the patroller and any potential attacker. We also assume that the length of time taken to carry out an attack in any given room is random but is sampled from a known distribution.

Our patrol model assumes that the attackers arrive according to a Poisson process with a known rate and then decide which room to attack in one of the following two ways:

  1. The target of the attack is chosen at random with known probabilities.
  2. The target of each attack is chosen strategically, with the presence of a patroller in mind and with the aim of maximising the total expected cost of the attacks.

The patroller is assumed to move between rooms in discrete time-steps. If the patroller interrupts an attack in progress, we assume that no damage is caused.
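As a toy illustration of these modelling assumptions, we could simulate the attack process. Everything numeric here is invented for the example: the Poisson rate, the room values (used to weight the "random" attacker's choice of target), and a uniform attack-duration distribution standing in for the known distribution mentioned above.

```python
import random

random.seed(1)

RATE = 0.2                                     # assumed attacks per time-step
ROOM_VALUES = {"A": 10.0, "B": 4.0, "C": 7.0}  # hypothetical artwork values
MAX_DURATION = 5                               # assumed maximum attack length

def simulate_attacks(horizon):
    """Generate (arrival_time, room, duration) triples over `horizon` steps.

    Arrivals form a Poisson process (exponential inter-arrival times); each
    attacker picks a room at random, here weighted by artwork value, and the
    attack duration is drawn uniformly from 1..MAX_DURATION.
    """
    attacks, t = [], 0.0
    rooms, weights = list(ROOM_VALUES), list(ROOM_VALUES.values())
    while True:
        t += random.expovariate(RATE)   # exponential gaps => Poisson arrivals
        if t > horizon:
            return attacks
        room = random.choices(rooms, weights=weights)[0]
        attacks.append((t, room, random.randint(1, MAX_DURATION)))

print(len(simulate_attacks(50)))
```

With a rate of 0.2 per time-step we would expect around ten attacks over fifty steps; an attack causes damage only if it runs its full duration without the patroller passing through.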

We need a way to tell the patroller which is the best route to take.

If the attackers choose where to attack using the randomised method we have the following:

While visiting a location the patroller either determines that no attacks are underway or apprehends the attacker. Thus, we know that immediately after a visit to a location, no attackers are present. It therefore makes sense to characterize the system by a vector containing the number of time-steps since the patroller last visited each room. We call this the state of the model.

If we assume that the time it takes to carry out an attack has some maximum, we can ensure the number of states is finite. This is because once we have neglected a room long enough, increasing the time since the last visit will not change the probability that an attack is ongoing.

The current room can be determined from the state as it will correspond to the entry with the lowest value. A patrol policy then tells the patroller what to do in any given state: either stay where you are or move to an adjacent room.
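A minimal sketch of this bookkeeping (the room names and the cap are invented for illustration) is the one-step state update:

```python
def next_state(state, action, cap):
    """Advance the times-since-last-visit vector by one time-step.

    `state` maps each room to the number of steps since the patroller's
    last visit, capped at `cap` (the maximum attack duration): once a room
    has been neglected that long, waiting longer no longer changes the
    probability that an attack is ongoing, which keeps the state space
    finite. `action` is the room the patroller occupies next (the current
    room to stay put, or an adjacent room to move into).
    """
    new = {room: min(t + 1, cap) for room, t in state.items()}
    new[action] = 0   # the patroller's presence clears this room
    return new

# Example: three rooms, maximum attack duration of 4 steps.
print(next_state({"A": 0, "B": 2, "C": 4}, action="B", cap=4))
# prints {'A': 1, 'B': 0, 'C': 4}
```

Note that the entry set to zero is always the patroller's current room, matching the observation that the current room is the state entry with the lowest value.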

Since there are a finite number of states and a finite number of rooms, we have a finite number of policies. An optimal policy can be found using linear programming.

If the attackers choose where to attack strategically, we can create a two-person zero-sum game, as discussed in my post on game theory.
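For intuition about what solving such a game involves, a tiny two-by-two zero-sum game can be solved in closed form. This is the standard textbook formula for a 2x2 payoff matrix with no saddle point, not anything specific to the patrol model:

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Game value and the row player's optimal mixed strategy for the
    zero-sum game with payoff matrix [[a, b], [c, d]] (row player's
    payoffs), assuming no saddle point so both players must randomise."""
    denom = (a + d) - (b + c)
    p = (d - c) / denom                  # probability of playing row 1
    value = (a * d - b * c) / denom
    return value, p

# Matching pennies: both players should randomise 50/50 and the value is 0.
print(solve_2x2_zero_sum(1, -1, -1, 1))  # prints (0.0, 0.5)
```

In the patrol game the matrix is far larger (patrol policies against attack plans), which is one reason the exact solution becomes computationally expensive.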

In either case the optimal solution is very computationally expensive to calculate and so approximate methods are often preferred.

Introduction to Extreme Value Theory – Fri, 17 Apr 2020

In my last post I promised an overview of my two research topics. We were encouraged to choose one topic from Statistics and the other from Operational Research. Today we will focus on the more statistical topic which I was introduced to by Emma Eastoe.

In statistics we are often interested in determining the most likely behaviour of a system. The usual way to do this is to fit a model to the observations from the system: we find a family of distributions that approximately describes the shape of the data, then use the observations to estimate the parameter values which maximise the probability of that set of observations occurring. In some situations, however, the normal behaviour of a system is of less concern to us and we are instead interested in the maximum (or minimum) outcome that we would expect to observe over an extended period of time. For example, if a local council is considering investment in flood defences they are not interested in the average height of the river but only in the events where the volume of water would exceed the river’s maximum capacity and cause flooding.

The problem here is that we are considering very unusual events, which any distribution fitted to the entire set of observations would be unable to reliably estimate. We therefore require models that can be fitted to just the extreme events. There are two main approaches to consider: the Block Maxima Model and the Threshold Excess Model. Each can be characterised by its way of classifying an event as extreme.

  • Block Maxima Model: Here we partition the data into equal sections and then take the maximum data-point in each block to be an extreme event. The distribution of these maxima belongs to a specific family of distributions called the Generalised Extreme Value Family.
  • Threshold Excess Model: This approach considers all events that are above a certain threshold to be extreme. It can be shown that for a sufficiently high threshold these values will follow a Generalised Pareto Distribution.

In both models we have an important decision to make. For the Block Maxima Model we must choose a block length and in the Threshold Excess Model we must set a threshold. These decisions play a very similar role in that they determine the number of points we have to fit our model to. If the block size is set too large or the threshold too high we will not have enough points to fit our distribution, which will result in greater variance in our estimates. On the other hand, if the block size is too small or the threshold too low, the resulting points will not be well approximated by the Generalised Extreme Value or Generalised Pareto distribution respectively.
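The two ways of extracting "extreme" points can be sketched in a few lines. The synthetic data, block size, and threshold below are invented for illustration, and the fitting step itself is omitted (it would normally be done with a statistics package such as scipy, via `genextreme` or `genpareto`):

```python
import random

random.seed(42)

def block_maxima(series, block_size):
    """Split `series` into consecutive blocks and keep each block's maximum.

    These maxima are the points to which a Generalised Extreme Value (GEV)
    distribution would then be fitted.
    """
    return [max(series[i:i + block_size])
            for i in range(0, len(series) - block_size + 1, block_size)]

def threshold_excesses(series, u):
    """Keep the excesses over threshold `u`, the data for a GPD fit."""
    return [x - u for x in series if x > u]

daily = [random.gauss(0, 1) for _ in range(365 * 10)]  # ten fake 'years'
annual_max = block_maxima(daily, 365)                  # one point per year
excesses = threshold_excesses(daily, u=3.0)            # all large exceedances
print(len(annual_max), len(excesses))
```

The trade-off described above is visible here: a block length of a year gives only ten points to fit the GEV to, while lowering the threshold from 3.0 would give more points for the GPD fit at the cost of including less-extreme values.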

Sometimes the data we are looking at is multidimensional. For example, if we want to describe extreme storm events for applications in shipping we may have data for wind and rain. These different variables may depend on each other or could be completely independent. Having more than one dimension introduces another difficulty: what do we want to consider as an extreme event? Do we need extreme values for both wind and rain, or is just one of the variables being extreme enough for an event to be considered extreme? Both the Block Maxima and Threshold Excess approaches can be extended to higher dimensions.

In my next post I will talk about my Operational Research topic: Optimal Patrolling.

Annual STOR-i conference 2020 – Thu, 23 Jan 2020

As promised in my previous blog post, I will be talking today about my first experience of an academic conference.

This year STOR-i hosted its ninth annual conference, with talks from a wide variety of speakers from the UK — including some of its own PhD students and alumni — and from overseas. We listened to 12 presentations, so for the sake of brevity I will mention all of them but only go into more detail for a few.

We were welcomed to the conference by Prof. Kevin Glazebrook, who spoke a bit about STOR-i’s new round of funding and introduced the first speaker.

Jacquillat spoke about analytics in air transportation. In particular he discussed how we can use air traffic flow management to absorb delays upstream by holding planes on the runway, so they don’t get held up in the air, expending extra fuel as they wait for their turn to land. He also spoke about the benefits of adjusting existing integer programs for scheduling so that they are optimised to minimise passenger delays, giving greater priority to larger flights and ensuring connecting flights arrive on time.

The second talk was by third year STOR-i PhD student, Henry Moss, who introduced us to a Bayesian optimisation method called MUMBO, which he has been developing for his thesis.

The next speaker talked about her work with the Mallows ranking model, as well as some applications and recent advances in the area.

After lunch we were given two more talks, the first by another speaker and the other by Georgia Souli, a third year STOR-i PhD student, before another break for refreshments.

We came back to a presentation titled “The Use of Shape Constraints for Modelling Time Series of Counts” by a speaker from Columbia University.

Tom Flowerdew, a STOR-i alumnus, then talked to us about his work using machine learning to detect fraud. One of the major problems here is that machine learning requires data to learn from, but as the nature of fraud changes the algorithm must be able to adapt. The difficulty is that since all the obvious fraud attempts are blocked, future iterations will have no experience of them and so will have difficulty detecting them. Flowerdew suggested that allowing suspected fraudulent transactions to be completed with some small probability, and then proportionally increasing the weight of these outcomes in the learning stage, would allow the algorithm to learn more effectively and therefore prevent more fraud in the long run.

Tom Flowerdew at the STOR-i Conference

The final presentation of the day was “Making random things better: Optimisation of Stochastic Systems”.

We reconvened in the evening to look at posters made by the PhD students about each of their projects. This was a really good opportunity for them to develop their presentation skills by explaining their findings to knowledgeable academics in closely related fields. It was also an opportunity for us MRes students to learn a bit more about the research going on at the university and the sort of projects we might be interested in.

The following day we kicked off with a presentation on using the network of transactions between small and medium-sized businesses to improve credit risk models. Since transaction network data is difficult to get hold of, she also spoke about what approaches one can use without access to this data.

The next speaker talked about the balance between accuracy and interpretability for data science models and how this can be achieved.

Another STOR-i alumna, Ciara Pike-Burke, then talked about her recent work with multi-armed bandits. A multi-armed bandit can be thought of as a slot machine where pulling each arm gives a reward from some unknown distribution. The usual problem is balancing exploration, to learn more about the different reward distributions for each arm, against exploitation, maximising the total reward by pulling the arm that is performing best. The reward distributions are usually constant, but Pike-Burke considered the case where the rewards depend on the previous actions of the player. For example, a company can suggest different products to a customer on their website, and the reward depends upon whether the customer follows that link. If the customer has just bought a bed they are probably less likely to buy another bed. However, that same customer might be more likely to buy new pillowcases.
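The explore/exploit trade-off described above can be sketched with a simple epsilon-greedy strategy on stationary arms. This is the standard textbook setting rather than Pike-Burke's action-dependent variant, and the reward probabilities below are invented:

```python
import random

random.seed(0)

TRUE_MEANS = [0.2, 0.5, 0.8]   # hypothetical Bernoulli reward probabilities

def epsilon_greedy(n_steps, eps=0.1):
    """Pull arms for `n_steps` rounds: with probability `eps` explore a
    random arm, otherwise exploit the arm with the best observed average."""
    counts = [0] * len(TRUE_MEANS)
    totals = [0.0] * len(TRUE_MEANS)
    total_reward = 0.0
    for _ in range(n_steps):
        if 0 in counts or random.random() < eps:
            arm = random.randrange(len(TRUE_MEANS))      # explore
        else:
            averages = [t / c for t, c in zip(totals, counts)]
            arm = averages.index(max(averages))          # exploit
        reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        total_reward += reward
    return total_reward, counts

total, pulls = epsilon_greedy(2000)
print(total, pulls)
```

On a typical run the best arm ends up pulled far more often than the others, while the occasional random pulls keep the estimates of the weaker arms from going stale.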

Finally, the last speaker presented his talk on “Model Based Clustering with Sparse Covariance Matrices.”

Every STOR-i has a beginning – Mon, 20 Jan 2020

Hello world!

Welcome to my blog! I have just started my second term here in Lancaster so in my very first post I wanted to talk a bit about my STOR-i experience so far: both with regards to academic life but also the extra-curricular experiences STOR-i has provided.

I was officially inducted into the MRes programme in late September as part of a welcome day. The main purpose of this day was for us to meet the rest of our cohort as well as the rest of the STOR-i family. After a tour of the facilities we were each allocated a locker, a shiny new laptop and a first year PhD student as a mentor. We were also given a talk explaining some of the changes to the programme since last year which involved restructuring several of the modules which we would be starting in the following week.

Before starting these modules, however, we were taken on a two-day team-building trip to the Lake District along with the first year PhDs. During this trip we enjoyed a variety of different activities, ranging from creating golf courses from upcycled materials to yacht sailing. For each of these activities we were split into different groups, so by the end of our time in the Lake District we had been able to get to know everyone pretty well.

The programme then proceeded with five weeks of lectures. Our four taught modules were:

  • Probability and Stochastic Processes
  • Inference and Modelling
  • Stochastic Simulation
  • Deterministic Optimisation

These were quite fast paced and provided a solid foundation of knowledge that we could build on throughout the rest of the term. Each of us had our own strengths and weaknesses but were able to pool our collective experiences to support each other through the process.

The next four weeks of term were taken up by a series of contemporary topic sprints. At the beginning of each week we were given a lecture introducing us to a new area within statistics or operational research. We then divided into groups and spent the next few days delving deeper into that area, with the goal of reporting back at the end of the week in the form of group presentations. The four topic areas were Decision Theory, Changepoint Detection, Markov chain Monte Carlo and Stochastic Optimisation. In the final week of term we were tasked with producing an individual report on one of these topic areas. My report focused on Markov chain Monte Carlo and in particular a method which uses approximation to reduce the computational cost of an existing algorithm.

In addition to our assessed modules we also had the opportunity to learn about what the PhD students were doing. This happened both informally, by talking to them during breaks, lunchtimes and outside of working hours, and more formally in weekly Forums where the PhD students took turns presenting their research. These talks were usually followed either by tea, coffee and biscuits or, in the build-up to Christmas, the STOR-i Bake-off.

To celebrate the end of term we were all invited out for a meal and drinks with the PhD students and members of staff.

Since arriving back for my second term I have attended the annual STOR-i conference which I will talk about in my next post.
