Improved Regret Bounds for Bandits with Expert Advice

Nicolò Cesa-Bianchi; Khaled Eldowa; Emmanuel Esposito; Julia Olkhovskaya

doi:10.1613/jair.1.16738


Published: Jul 5, 2025


Keywords:

multi-armed bandits


Khaled Eldowa (University of Milan)


Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order √(KT ln(N/K)) on the worst-case regret, where K is the number of actions, N > K is the number of experts, and T is the time horizon. This matches a previously known upper bound of the same order and improves upon the best previously available lower bound of √(KT ln(N)/ln(K)). For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement over prior results.
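To make the comparison between the two lower bounds concrete, the sketch below evaluates both expressions with every constant factor set to 1. This is an assumption: the abstract states the bounds only up to order, so the numbers indicate relative growth, not actual regret values. The function names `new_lower_bound`, `previous_lower_bound`, and `bound_ratio`, and the sample values of K, N, and T, are hypothetical choices for illustration; the code does not implement any algorithm from the paper.

```python
import math


def _check(K: int, N: int) -> None:
    # The abstract assumes N > K; requiring K >= 2 also keeps ln(K) nonzero.
    if not (isinstance(K, int) and isinstance(N, int) and N > K >= 2):
        raise ValueError("expected integers with N > K >= 2")


def new_lower_bound(K: int, N: int, T: int) -> float:
    """sqrt(K * T * ln(N / K)): order of the new lower bound, constants set to 1."""
    _check(K, N)
    return math.sqrt(K * T * math.log(N / K))


def previous_lower_bound(K: int, N: int, T: int) -> float:
    """sqrt(K * T * ln(N) / ln(K)): order of the previous best lower bound."""
    _check(K, N)
    return math.sqrt(K * T * math.log(N) / math.log(K))


def bound_ratio(K: int, N: int) -> float:
    """new / previous; T cancels, so the ratio depends only on K and N."""
    return new_lower_bound(K, N, 1) / previous_lower_bound(K, N, 1)


if __name__ == "__main__":
    T = 10**6  # hypothetical horizon, chosen only for illustration
    for K, N in [(10, 10**3), (100, 10**6), (1000, 10**9)]:
        print(
            f"K={K:>5} N={N:>11} "
            f"new={new_lower_bound(K, N, T):14.1f} "
            f"previous={previous_lower_bound(K, N, T):14.1f} "
            f"ratio={bound_ratio(K, N):.3f}"
        )
```

Because T cancels in the ratio, only K and N matter. With N = K^3, for example, the squared ratio is ln(K^2) / (ln(K^3)/ln(K)) = (2/3) ln K, which grows without bound as K increases.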

Issue

Vol. 83 (2025)

Section

Articles
