PyCOMPSs: Parallel computational workflows in Python
- 19 August 2015
- journal article
- research article
- Published by SAGE Publications in The International Journal of High Performance Computing Applications
- Vol. 31 (1), 66-82
- https://doi.org/10.1177/1094342015594678
Abstract
The use of the Python programming language for scientific computing has been gaining momentum in recent years. Its compact, readable syntax and its complete set of scientific libraries are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since the current alternatives mostly require the use of APIs for message passing or are restricted to embarrassingly parallel computations. In that sense, this paper presents PyCOMPSs, a framework that facilitates the development of parallel computational workflows in Python. In this approach, the user programs her script in a sequential fashion and decorates the functions to be run as asynchronous parallel tasks. A runtime system is in charge of exploiting the inherent concurrency of the script, detecting the data dependencies between tasks and spawning them to the available resources. Furthermore, we show how this programming model can be built on top of a Big Data storage architecture, where the data stored in the backend is abstracted and accessed from the application in the form of persistent objects.
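The programming model described in the abstract (a sequential script whose decorated functions run as asynchronous tasks, with dependencies resolved by a runtime) can be illustrated with a minimal sketch. This is not the PyCOMPSs API: the names `task`, `wait_on` and `_pool` are hypothetical, and a local thread pool from the Python standard library stands in for the distributed runtime. Dependencies are detected only in the simplest sense, by noticing when an argument is the pending result of an earlier task.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Assumption: a local thread pool stands in for the distributed runtime.
_pool = ThreadPoolExecutor(max_workers=4)


def _resolve(value):
    """Return the concrete value, waiting if it is a pending task result."""
    return value.result() if isinstance(value, Future) else value


def task(func):
    """Hypothetical decorator: calling the function submits it as an async task."""
    @wraps(func)
    def submit(*args, **kwargs):
        def run():
            # A Future argument is a data dependency on an earlier task;
            # wait for it here, inside the worker, so the caller never blocks.
            real_args = [_resolve(a) for a in args]
            real_kwargs = {k: _resolve(v) for k, v in kwargs.items()}
            return func(*real_args, **real_kwargs)
        return _pool.submit(run)  # returns immediately with a Future
    return submit


def wait_on(value):
    """Hypothetical synchronisation point: block until the result is available."""
    return _resolve(value)


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


# The script reads sequentially, but the four square() calls are independent
# and may run concurrently; each add() depends on the results before it.
partials = [square(i) for i in range(4)]
total = partials[0]
for partial in partials[1:]:
    total = add(total, partial)

result = wait_on(total)
print(result)  # 0 + 1 + 4 + 9 = 14
```

In this sketch the main script never blocks until `wait_on` is called, which mirrors the idea of writing sequential code and synchronising only when a value is actually needed. A real runtime would additionally schedule tasks across remote resources and track dependencies through files and objects, which this local example does not attempt.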