Scalability and interoperability within glideinWMS
Open Access
- 1 April 2010
- journal article
- Published by IOP Publishing in Journal of Physics: Conference Series
- Vol. 219 (6), 062036
- https://doi.org/10.1088/1742-6596/219/6/062036
Abstract
Physicists have access to thousands of CPUs in grid federations such as OSG and EGEE. With the start-up of the LHC, it is essential for individuals or groups of users to aggregate available resources from multiple sites across multiple grids under a higher, user-controlled layer in order to provide a homogeneous pool of available resources. One such system is glideinWMS, which is based on the Condor batch system. A general discussion of glideinWMS can be found elsewhere. Here, we focus on recent advances in extending its reach: scalability and integration of heterogeneous compute elements. We demonstrate that the new developments exceed the design goal of over 10,000 simultaneous running jobs under a single Condor schedd, using strong security protocols across global networks, and sustaining a steady-state job completion rate of a few Hz. We also show interoperability across heterogeneous computing elements achieved using client-side methods. We discuss this technique and the challenges in direct access to NorduGrid and CREAM compute elements, in addition to Globus-based systems.
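The client-side access to heterogeneous compute elements described in the abstract corresponds to what HTCondor-G exposes through its grid universe, where each compute-element type is addressed by a different `grid_resource` grid type. A minimal sketch of submit-description fragments under that assumption; the hostnames, ports, and queue names below are illustrative placeholders, not values from the paper:

```
# Hypothetical Condor-G submit-description fragments (grid universe).
# All endpoints below are placeholders for illustration only.

universe = grid

# Globus GRAM compute element (gt2 grid type)
grid_resource = gt2 ce.example.org/jobmanager-pbs

# CREAM compute element: service URL, batch system, queue
grid_resource = cream https://cream.example.org:8443/ce-cream/services/CREAM2 pbs long

# NorduGrid ARC compute element
grid_resource = nordugrid arc.example.org
```

In a glideinWMS deployment it is the glidein factory, not the end user, that generates submissions of this form, so supporting a new compute-element flavor reduces to handling another `grid_resource` grid type on the client side.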
References
- The gLite Workload Management System, Lecture Notes in Computer Science, 2009
- glideinWMS—a generic pilot-based workload management system, Journal of Physics: Conference Series, 2008