Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment
Open Access
- 18 June 2021
- journal article
- research article
- Published by MDPI AG in Electronics
- Vol. 10 (12), 1471
- https://doi.org/10.3390/electronics10121471
Abstract
Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology that combines multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate these problems, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities, where tremendous amounts of data have to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
Funding Information
- National Research Foundation of Korea (NRF-2008-00458)