Paper ID | MLSP-25.4 |
Paper Title |
DOUBLE-LINEAR THOMPSON SAMPLING FOR CONTEXT-ATTENTIVE BANDITS |
Authors |
Djallel Bouneffouf, IBM Research, United States; Raphael Feraud, Orange, France; Sohini Upadhyay, IBM Research, United States; Yasaman Khazaeni, Irina Rish, Universite de montreal, United States |
Session | MLSP-25: Reinforcement Learning 1 |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 13:00 - 13:45 |
Presentation Time: | Thursday, 10 June, 13:00 - 13:45 |
Presentation |
Poster
|
Topic |
Machine Learning for Signal Processing: [MLR-REI] Reinforcement learning |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit , motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has a freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS) , which builds upon the Linear Thompson Sampling approach, adapting it to Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating advantages of the proposed approach over several baseline methods on a variety of real-life datasets. |