Kernel-based Methods for Bandit Convex Optimization

Abstract
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so, we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^9.5 √T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step (for polytopes with polynomially many constraints) at the cost of an additional poly(n) T^o(1) factor in the regret. These results improve upon the Õ(n^11 √T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^poly(n) √T-regret and log(T)^poly(n)-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve Õ(n^1.5 √T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), attained with linear functions). For the simpler situation of zeroth-order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
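
As a brief hedged sketch (the n^1.5 rate below is the conjectured regret from the abstract, not a proven bound), the stated correspondence with zeroth-order stochastic convex optimization follows from the standard online-to-batch conversion: a regret bound for the bandit algorithm bounds the suboptimality of the averaged iterate, so

  R_T = \tilde{O}\big(n^{1.5}\sqrt{T}\big)
  \quad\Longrightarrow\quad
  \frac{R_T}{T} = \tilde{O}\Big(\frac{n^{1.5}}{\sqrt{T}}\Big) \le \epsilon
  \quad\Longrightarrow\quad
  T = \tilde{O}\Big(\frac{n^{3}}{\epsilon^{2}}\Big).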
Funding Information
  • Israel Science Foundation (715/16)
  • European Research Council Starting Grant (803084/5)
