Intelligent dynamic control policies for serial production lines

Abstract
Heuristic production control policies such as CONWIP, kanban, and other hybrid policies have been in use for years as better alternatives to MRP-based push control policies. However, although these policies are efficient, they are far from optimal. Our goal is to develop a methodology that, for a given system, finds a dynamic control policy via intelligent agents. Such a policy, while achieving the productivity (i.e., demand service rate) goal of the system, optimizes a cost/reward function based on the WIP inventory. To achieve this goal, we applied a simulation-based optimization technique called Reinforcement Learning (RL) to a four-station serial line. The control policy obtained by applying an RL algorithm was compared with existing policies on the basis of total average WIP and average WIP cost. We also developed a heuristic control policy based on a close examination of the policies obtained by the RL algorithm. This heuristic policy, named Behavior-Based Control (BBC), although second to the RL policy, proved to be a leaner and more efficient control policy than most existing policies in the literature. The performance of the BBC policy was found to be comparable to that of the Extended Kanban Control System (EKCS), which, in our experiments, turned out to be the best of the existing policies. The numerical results used for comparison were obtained from a four-station serial line with two different (constant and Poisson) demand arrival processes.
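To make the simulation-based RL approach concrete, the following is a minimal sketch of a tabular Q-learning agent learning a part-release decision on a toy four-station serial line. The abstract does not specify which RL algorithm, state and action definitions, or cost parameters the authors used, so every modeling choice and numeric value below (buffer caps, processing and demand probabilities, holding and backorder costs, learning rates) is an illustrative assumption, not a reproduction of the paper's method.

```python
"""Minimal Q-learning sketch for a release-control decision on a toy serial line.
All parameters below are illustrative assumptions, not values from the paper."""
import random
from collections import defaultdict

N_STATIONS = 4          # serial line length (matches the paper's test case)
BUFFER_CAP = 5          # assumed buffer cap so the state space stays finite
P_PROCESS = 0.7         # assumed per-step completion probability at each station
P_DEMAND = 0.5          # assumed per-step demand arrival probability
HOLD_COST = 1.0         # assumed WIP holding cost per part per step
BACKORDER_COST = 5.0    # assumed penalty for an unmet demand
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # assumed learning rate, discount, exploration

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-run value

def step(buffers, release):
    """Advance the toy line one time step; return (new_state, reward)."""
    buffers = list(buffers)
    if release and buffers[0] < BUFFER_CAP:
        buffers[0] += 1                      # action: release a raw part into station 1
    finished = 0
    # sweep downstream-to-upstream so parts can advance one station per step
    for i in range(N_STATIONS - 1, -1, -1):
        if buffers[i] > 0 and random.random() < P_PROCESS:
            if i == N_STATIONS - 1:
                buffers[i] -= 1
                finished += 1                # part completing the last station this step
            elif buffers[i + 1] < BUFFER_CAP:
                buffers[i] -= 1
                buffers[i + 1] += 1
    # reward = -(WIP holding cost), minus a penalty if demand arrives and goes unmet
    reward = -HOLD_COST * sum(buffers)
    if random.random() < P_DEMAND and finished == 0:
        reward -= BACKORDER_COST
    return tuple(buffers), reward

def choose(state):
    """Epsilon-greedy selection over the two release actions {0: hold, 1: release}."""
    if random.random() < EPS:
        return random.randint(0, 1)
    return max((0, 1), key=lambda a: Q[(state, a)])

def train(episodes=2000, horizon=200):
    for _ in range(episodes):
        state = (0,) * N_STATIONS            # start each episode with an empty line
        for _ in range(horizon):
            action = choose(state)
            nxt, reward = step(state, action)
            best_next = max(Q[(nxt, 0)], Q[(nxt, 1)])
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = nxt

if __name__ == "__main__":
    random.seed(0)
    train()
    empty = (0,) * N_STATIONS
    print("Q(empty line, hold)    =", round(Q[(empty, 0)], 3))
    print("Q(empty line, release) =", round(Q[(empty, 1)], 3))
```

The learned greedy policy, read off as argmax over Q for each WIP state, plays the role of the dynamic control policy discussed in the abstract: it decides when to authorize releases as a function of the observed buffer levels rather than following a fixed card count as CONWIP or kanban would.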