Scalability and interoperability within glideinWMS

Abstract
Physicists have access to thousands of CPUs in grid federations such as OSG and EGEE. With the start-up of the LHC, it is essential for individual users or groups of users to aggregate available resources from multiple sites across multiple grids under a higher-level, user-controlled layer in order to provide a homogeneous pool of resources. One such system is glideinWMS, which is based on the Condor batch system. A general discussion of glideinWMS can be found elsewhere; here we focus on recent advances in extending its reach: scalability and the integration of heterogeneous compute elements. We demonstrate that the new developments exceed the design goal of 10,000 simultaneously running jobs under a single Condor schedd, using strong security protocols across global networks and sustaining a steady-state job completion rate of a few Hz. We also show how interoperability across heterogeneous compute elements is achieved using client-side methods. We discuss this technique and the challenges of direct access to NorduGrid and CREAM compute elements, in addition to Globus-based systems.
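
As an illustration of the client-side approach to heterogeneous compute elements, the sketch below builds Condor-G grid-universe submit descriptions for Globus (GT2), NorduGrid ARC, and CREAM endpoints. It is not taken from the paper; the hostnames, batch system, queue name, and pilot script name are hypothetical placeholders, and only the grid_resource syntax follows standard Condor-G usage.

    # Minimal sketch (Python): per-CE differences are confined to the
    # grid_resource line of a Condor-G grid-universe submit description,
    # while the same glidein pilot script is started everywhere.
    def grid_resource(ce_type, ce_host):
        # Return a Condor-G grid_resource value for the given CE flavor.
        if ce_type == "gt2":          # Globus GRAM2 gatekeeper
            return "gt2 %s/jobmanager-pbs" % ce_host
        if ce_type == "nordugrid":    # NorduGrid ARC front end
            return "nordugrid %s" % ce_host
        if ce_type == "cream":        # CREAM service URL, batch system, queue
            return ("cream https://%s:8443/ce-cream/services/CREAM2 pbs grid"
                    % ce_host)
        raise ValueError("unsupported CE type: %s" % ce_type)

    def submit_description(ce_type, ce_host):
        # Assemble a grid-universe submit file that launches a pilot job.
        return "\n".join([
            "universe      = grid",
            "grid_resource = " + grid_resource(ce_type, ce_host),
            "executable    = glidein_startup.sh",   # pilot script (placeholder)
            "output        = glidein.$(Cluster).out",
            "error         = glidein.$(Cluster).err",
            "log           = glidein.$(Cluster).log",
            "queue",
        ])

    if __name__ == "__main__":
        for ce in [("gt2", "ce1.example.org"),
                   ("nordugrid", "arc.example.org"),
                   ("cream", "cream.example.org")]:
            print(submit_description(*ce) + "\n")

In this late-binding model, the heterogeneity of the compute elements is hidden from the end user: the pilot started on each worker node joins the same Condor pool, regardless of which CE flavor was used to launch it.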
