<\/figure>\n<\/div>\n\n\nAlthough here we have specified a Beta distribution, a distribution of any kind can be used. However, in this case it is a convenient choice as it’s conjugate to the Bernoulli distribution and hence the posterior is also Beta distributed. This is why we only need to update the parameters at each time step. If the prior chosen was not conjugate, then it would not be so simple. Instead we may be in a situation where we need to specify a completely new prior each time. <\/p>\n\n\n\n
The prior chosen plays a large role in the effectivenss of the Thompson Sampling algorithm, and thus care must be chosen over its specification. Ideally it should be chosen to describe all plausible values of the reward success parameter. <\/p>\n\n\n\n
In the slot machine case, the player often has no intuition on the values of this success parameter. This is usually described using an uninformative Beta(1,1) prior as it takes uniform value across the entire [0,1] range. This promotes exploration to learn which arm is optimal. <\/p>\n\n\n\n
In the opposite case where we have some knowledge of the true values of the success parameter, we may center the priors about these values in order to reduce regret. Here, we require less exploration to identify the optimal arm and so can exploit this to maximize reward. <\/p>\n\n\n\n
Although TS tends to be efficient in minimizing regret, there are occasionally outlier cases where the regret is much higher than expected. This may happen if we begin by pulling the sub-optimal arm and receive a successful reward. We are likely to exploit this further and continue to pull this arm, falsely leading us to believe this arm is actually optimal. Or alternatively, we may begin by selecting the optimal arm and observe a failure reward. We then may exploit the sub-optimal arm under the false belief that it will give us a higher reward. <\/p>\n\n\n\n
To conclude, Thompson Sampling is a simple but efficient algorithm for solving stochastic multi-armed bandit problems, successfully balancing exploitation and exploration to maximize reward and minimize regret. <\/p>\n\n\n\n
References & Further Reading<\/h4>\n\n\n\n
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the mulitarmed bandit problem. Machine Learning, <\/em>47(2):235-236.<\/p>\n\n\n\nRusso, D., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z. (2017). A tutorial on Thompson Sampling. <\/p>\n","protected":false},"excerpt":{"rendered":"
For the STOR608 course here at STOR-i we covered several research areas as part of the topic sprints as discussed…<\/p>\n","protected":false},"author":7,"featured_media":579,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[3],"tags":[13,6,15,14],"class_list":["post-188","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-post","tag-multi-armed-bandits","tag-statistics","tag-stor-i","tag-thompson-sampling"],"_links":{"self":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/posts\/188","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/comments?post=188"}],"version-history":[{"count":30,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/posts\/188\/revisions"}],"predecessor-version":[{"id":561,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/posts\/188\/revisions\/561"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/media\/579"}],"wp:attachment":[{"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/media?parent=188"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/categories?post=188"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lancaster.ac.uk\/stor-i-student-sites\/libby-daniells\/wp-json\/wp\/v2\/tags?post=188"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}