Preprint
Abstract
We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexability and obtain Whittle's index in closed-form for both discounted and average reward criteria. These results lead to the direct implementation of Whittle's index policy with remarkably low complexity. When channels are stochastically identical, we show that Whittle's index policy is optimal under certain conditions. Furthermore, it has a semi-universal structure that obviates the need to know the channel transition probabilities. The optimality and the semi-universal structure result from the equivalency between Whittle's index policy and the myopic policy established in this work. For non-identical channels, we develop efficient algorithms for computing a performance upper bound resulting from Lagrangian relaxation. The tightness of the upper bound and the near-optimal performance of Whittle's index policy are illustrated with simulation examples.