Model for Anticipating Failures by Omission in Calculation Grids

Abstract
Computing grids are infrastructures in which heterogeneous, distributed resources deliver very high computing or storage performance. Although they offer extreme computing performance, they are also prone to the many failures inherent in this type of architecture. If, while a task is being executed, the response time of a node inexplicably exceeds the bound required by the specifications, the node suffers an omission failure. The task running on the failed node then remains unavailable until the node resumes normal activity. Since simply waiting is not an acceptable solution, many fault-tolerance methods have been proposed. Despite this wide range of methods, computing grids remain prone to many omission failures. In this work, a numerical study of the omission failures that occur in computing grids during task execution was carried out, and a model for anticipating these failures was proposed using the PDEVS (Parallel Discrete EVent System Specification) formalism.
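As defined above, an omission failure amounts to a timeout check. A node that has not answered within the response-time bound set by the specification is treated as failed. The following minimal Python sketch illustrates only this detection criterion, not the anticipation model built with PDEVS. The names `TaskObservation`, `is_omission_failure` and `deadline`, as well as the sample values, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskObservation:
    """Hypothetical record of one task sent to one grid node (times in seconds)."""
    node_id: str
    start_time: float                # time at which the task was submitted to the node
    response_time: Optional[float]   # time at which the node answered, or None if no answer yet


def is_omission_failure(obs: TaskObservation, deadline: float, now: float) -> bool:
    """Return True if the node exceeded the specified response-time bound.

    deadline: maximum response delay allowed by the specification (hypothetical parameter).
    now: current observation time.
    """
    if obs.response_time is not None:
        # The node answered: check whether it answered within the bound.
        return obs.response_time - obs.start_time > deadline
    # No answer yet: the node is in omission once the bound has elapsed.
    return now - obs.start_time > deadline


# Illustrative usage with made-up observations and a 5 s bound, checked at t = 10 s.
observations = [
    TaskObservation("n1", 0.0, 2.5),   # answered after 2.5 s: within the bound
    TaskObservation("n2", 0.0, None),  # silent for 10 s: omission
    TaskObservation("n3", 1.0, 9.0),   # answered after 8 s: exceeds the bound
]
failed = [o.node_id for o in observations if is_omission_failure(o, deadline=5.0, now=10.0)]
print(failed)  # ['n2', 'n3']
```

A detector of this kind can only report a failure after the bound has already been exceeded. This is why the abstract frames the contribution as a model that anticipates such failures rather than one that merely detects them.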