(EE Talk) Thompson Sampling for Cascading Bandits

Post date: 2019-01-14
Speaker: Prof. Vincent Y. F. Tan

Time: 2019-01-23 14:00~15:00

Location: BL-112

Abstract: We introduce the canonical multi-armed bandit (MAB) problem, along with a classical algorithm that solves it efficiently: Thompson sampling. We then discuss a variant of the standard MAB problem, termed cascading bandits, which is a useful model for online recommendations and advertising. We adapt Thompson sampling to the cascading bandits problem and derive a regret bound of the form $\tilde{O}(\sqrt{KLT})$, where $L$ and $K$ are the number of ground items and the number of items in the chosen list, respectively, and $T \ge L$ is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. Empirical experiments demonstrate the superiority of TS-Cascade over existing UCB-based procedures in terms of both regret and time complexity.
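The setting above can be illustrated with a minimal simulation. Note that the paper's TS-Cascade algorithm uses Gaussian posterior samples; the sketch below instead uses the simpler Beta-Bernoulli form of Thompson sampling, and the item click probabilities, function name, and cascade feedback loop are illustrative assumptions, not the authors' implementation.

```python
import random

def ts_cascade_sketch(click_probs, K, T, seed=0):
    """Beta-Bernoulli Thompson sampling sketch for cascading bandits.

    click_probs: true attraction probability of each ground item
                 (known only to the simulator, not the learner).
    K: length of the recommended list; T: number of rounds.
    Returns the total number of rounds in which the user clicked.
    """
    rng = random.Random(seed)
    L = len(click_probs)
    alpha = [1] * L  # Beta posterior: observed clicks + 1
    beta = [1] * L   # Beta posterior: observed skips + 1
    clicks = 0
    for _ in range(T):
        # Sample an index for every ground item from its Beta posterior ...
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(L)]
        # ... and recommend the K items with the largest sampled values.
        ranked = sorted(range(L), key=lambda i: theta[i], reverse=True)[:K]
        # Cascade feedback: the user scans the list top-down and
        # clicks the first attractive item, ending the round.
        for i in ranked:
            if rng.random() < click_probs[i]:
                alpha[i] += 1  # item was examined and clicked
                clicks += 1
                break
            beta[i] += 1  # item was examined but skipped
    return clicks
```

Over many rounds the posteriors concentrate, so the recommended list increasingly contains the $K$ most attractive of the $L$ ground items, which is what drives the $\tilde{O}(\sqrt{KLT})$ regret behaviour discussed in the talk.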

This is joint work with Wang-Chi Cheung (IHPC, A*STAR) and Zixin Zhong (NUS) and the details can be found at https://arxiv.org/abs/1810.01187.

Biography: Vincent Y. F. Tan is currently a Dean’s Chair Associate Professor in the Department of Electrical and Computer Engineering and the Department of Mathematics at the National University of Singapore (NUS). He received the B.A. and M.Eng. degrees in Electrical and Information Sciences from Cambridge University in 2005 and the Ph.D. degree in Electrical Engineering and Computer Science (EECS) from the Massachusetts Institute of Technology (MIT) in 2011. His research interests include information theory, machine learning, and statistical signal processing. Dr. Tan is currently an IEEE Information Theory Society Distinguished Lecturer.
Last modification time: 2019-01-14 5:15 PM
