The Modern Research Data Portal: a design pattern for networked, data-intensive science
Open Access
- 15 January 2018
- journal article
- research article
- Published by PeerJ in PeerJ Computer Science
- Vol. 4, e144
- https://doi.org/10.7717/peerj-cs.144
Abstract
We describe best practices for providing convenient, high-speed, secure access to large data via research data portals. We capture these best practices in a new design pattern, the Modern Research Data Portal, that disaggregates the traditional monolithic web-based data portal to achieve orders-of-magnitude increases in data transfer performance, support new deployment architectures that decouple control logic from data storage, and reduce development and operations costs. We introduce the design pattern; explain how it leverages high-performance data enclaves and cloud-based data management services; review representative examples at research laboratories and universities, including both experimental facilities and supercomputer sites; describe how to leverage Python APIs for authentication, authorization, data transfer, and data sharing; and use coding examples to demonstrate how these APIs can be used to implement a range of research data portal capabilities. Sample code at a companion web site, https://docs.globus.org/mrdp, provides application skeletons that readers can adapt to realize their own research data portals.
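The decoupling of control logic from data storage that the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation and does not use any real service API: the `TransferService` and `Portal` classes, the endpoint names, and the file path are all hypothetical, chosen only to show a portal that decides who may access data while delegating the data movement to a separate service.

```python
from dataclasses import dataclass, field


@dataclass
class TransferService:
    """Hypothetical stand-in for an external, hosted transfer service."""
    submitted: list = field(default_factory=list)

    def submit(self, source: str, destination: str, path: str) -> str:
        # Record the request. A real service would move the bytes directly
        # between storage endpoints, bypassing the portal web server.
        task_id = f"task-{len(self.submitted) + 1}"
        self.submitted.append((task_id, source, destination, path))
        return task_id


@dataclass
class Portal:
    """Hypothetical portal: control logic only, no data storage."""
    transfers: TransferService
    data_endpoint: str
    allowed_users: set

    def request_download(self, user: str, path: str, user_endpoint: str) -> str:
        # Authorization is decided in the portal ...
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not access {path}")
        # ... but the data movement is delegated to the transfer service.
        return self.transfers.submit(self.data_endpoint, user_endpoint, path)


# Assumed names for illustration only.
service = TransferService()
portal = Portal(service, data_endpoint="facility-dtn", allowed_users={"alice"})
task = portal.request_download("alice", "/datasets/run42.h5", "alice-laptop")
```

In this sketch the portal never reads or writes file contents; it only checks permissions and hands a source, destination, and path to the transfer service, which is the separation the design pattern relies on.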
Funding Information
- United States National Science Foundation (ACI-1148484)
- Department of Energy’s Office of Advanced Scientific Computing Research (DE-AC02-06CH11357)