An Efficient Row Key Encoding Method with ASCII Code for Storing Geospatial Big Data in HBase
Open Access
- 25 October 2020
- journal article
- research article
- Published by MDPI AG in ISPRS International Journal of Geo-Information
- Vol. 9 (11), 625
- https://doi.org/10.3390/ijgi9110625
Abstract
Recently, increasing amounts of multi-source geospatial data (raster data from satellites and textual data from meteorological stations) have been generated, and these data can play complementary and important roles in many research works. Efficiently storing, organizing and managing these data is essential for their subsequent application. HBase, a distributed storage database, is increasingly popular for storing unstructured data. The design of the HBase row key is crucial to its efficiency, yet the topic has received little attention from researchers in the geospatial field. According to the HBase Official Reference Guide, row keys should be kept as short as is reasonable while remaining useful for the required data access. In this paper, we propose a new row key encoding method that departs from conventional designs. We adopted an existing hierarchical spatio-temporal grid framework as the HBase row key to manage these geospatial data, with the difference that we expressed the structure of the grid in American Standard Code for Information Interchange (ASCII) characters, which are harder to read but short, rather than in the original grid code, which is easily understood by humans but very long. To demonstrate the advantage of the proposed method, we stored the daily meteorological data of 831 meteorological stations in China from 1985 to 2019 in HBase; the experimental results showed that the proposed method not only maintains an equivalent query speed but also shortens the row key and saves storage resources by 20.69% compared with the original grid codes. We also used GF-1 imagery to test whether these improved row keys could support the storage and querying of raster data. We downloaded and stored part of the GF-1 imagery of Henan province, China from 2017 to 2018; the total data volume reached about 500 GB. We then calculated the daily normalized difference vegetation index (NDVI) value in Henan province from 2017 to 2018 within 54 min. The experiment therefore demonstrated that the improved row keys can also be applied to store raster data when using HBase.
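The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes a hypothetical fixed-length, all-digit grid code and re-expresses it in base 94 using the printable ASCII characters `!` to `~`. The names `encode`, `decode` and `encoded_width` are illustrative. Because the output has a fixed width and the alphabet is in ascending byte order, the shorter keys sort in the same order as the original codes, which matters for HBase range scans over lexicographically ordered row keys.

```python
# Printable ASCII from '!' (33) to '~' (126), in ascending byte order.
ALPHABET = "".join(chr(c) for c in range(33, 127))
BASE = len(ALPHABET)  # 94 symbols


def encoded_width(num_digits: int) -> int:
    """Smallest number of base-94 characters that can hold any
    decimal code with num_digits digits."""
    width = 1
    while BASE ** width < 10 ** num_digits:
        width += 1
    return width


def encode(grid_code: str) -> str:
    """Encode a fixed-length, all-digit grid code (hypothetical format)
    as a shorter fixed-width printable-ASCII key."""
    if not (grid_code.isascii() and grid_code.isdigit()):
        raise ValueError("grid_code must be a non-empty string of ASCII digits")
    width = encoded_width(len(grid_code))
    value = int(grid_code)
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, BASE)
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode(key: str, num_digits: int) -> str:
    """Recover the original digit string, restoring leading zeros."""
    value = 0
    for ch in key:
        value = value * BASE + ALPHABET.index(ch)
    return str(value).zfill(num_digits)


if __name__ == "__main__":
    code = "2019102512345678"  # made-up 16-digit code for illustration
    key = encode(code)
    print(len(code), len(key), decode(key, len(code)) == code)
```

Under these assumptions a 16-digit code fits in 9 characters. The saving depends entirely on the assumed code format and does not reproduce the 20.69% figure reported in the abstract.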