Q-Learning Based Energy Management Policies for a Single Sensor Node with Finite Buffer

Abstract
In this paper, we consider the problem of finding optimal energy management policies in the presence of energy harvesting sources to maximize network performance. We formulate this problem in the discounted cost Markov decision process framework and apply two reinforcement learning algorithms. Prior work obtains an optimal policy when the conversion function mapping energy to data transmitted is linear, and provides heuristic policies when it is nonlinear. Our algorithms, however, provide optimal policies regardless of the form of the conversion function. In simulations, our policies outperform the heuristic policies of prior work in the nonlinear case.
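
As an illustration of the approach named in the title, the following is a minimal sketch of tabular Q-learning on a discounted cost problem for a single energy harvesting node. It is not the paper's model or algorithm: the buffer sizes, the arrival and harvesting distributions, the backlog cost, the conversion function g, and the learning parameters are all hypothetical choices made only for this example.

```python
import math
import random

# All constants below are hypothetical and chosen only for illustration.
E_MAX, Q_MAX = 5, 5                  # energy buffer size, data buffer size
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1    # discount factor, step size, exploration rate


def g(t):
    """Hypothetical nonlinear (concave) conversion: energy spent -> data units sent."""
    return int(2 * math.log(1 + t))


def step(state, t, rng):
    """Simulate one time slot of the toy model; returns (next_state, cost)."""
    e, q = state
    sent = min(q, g(t))                                  # cannot send more than is queued
    q_next = min(Q_MAX, q - sent + rng.randint(0, 2))    # random data arrivals
    e_next = min(E_MAX, e - t + rng.randint(0, 2))       # random harvested energy
    return (e_next, q_next), q_next                      # cost = remaining backlog


def q_learning(n_steps=50_000, seed=0):
    """Tabular Q-learning for cost minimization with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # In state (e, q) the feasible actions are t = 0..e (energy to spend this slot).
    Q = {(e, q): [0.0] * (e + 1)
         for e in range(E_MAX + 1) for q in range(Q_MAX + 1)}
    state = (0, 0)
    for _ in range(n_steps):
        values = Q[state]
        if rng.random() < EPS:
            t = rng.randrange(len(values))                        # explore
        else:
            t = min(range(len(values)), key=values.__getitem__)   # exploit
        nxt, cost = step(state, t, rng)
        target = cost + GAMMA * min(Q[nxt])      # discounted cost-to-go estimate
        values[t] += ALPHA * (target - values[t])
        state = nxt
    return Q


def greedy_policy(Q):
    """Map each state to the action with the lowest learned Q-value."""
    return {s: min(range(len(v)), key=v.__getitem__) for s, v in Q.items()}
```

Calling greedy_policy(q_learning()) returns, for each (energy, backlog) state, the amount of energy the learned policy would spend. Because the update only uses sampled transitions and costs, nothing in the learning loop depends on g being linear, which is the property the abstract emphasizes.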
