Applying the ETL Process to Blockchain Data. Prospect and Findings

Open Access

10 April 2020

journal article
research article
Published by MDPI AG in Information

Vol. 11 (4), 204
https://doi.org/10.3390/info11040204

Abstract

We present a novel strategy, based on the Extract, Transform and Load (ETL) process, to collect data from a blockchain, process it, and make it available for further analysis. The study addresses the need for more efficient data extraction strategies and more effective representations of blockchain data. To this end, we designed a system that makes blockchain data extraction and address clustering scalable, and that provides an SQL database preserving the distinction between transactions and addresses. The system clusters addresses into entities and stores the extracted data in a conventional database, so that the data can be analyzed by querying it. In general, ETL processes automate the selection, collection and conditioning of data from a data warehouse, and produce output in the format best suited to subsequent processing or business use. We focus on Bitcoin blockchain transactions, which we organized in a relational database that distinguishes the input section from the output section of each transaction. We describe the implementation of address clustering algorithms specific to the Bitcoin blockchain, and the process used to collect and transform the data and load it into the database. To balance the input data rate against the processing time, we manage blockchain data according to the lambda architecture. To evaluate the process, we first analyzed its performance in terms of scalability, and then checked its usability by analyzing the loaded data. Finally, we present the results of a toy analysis that compares statistics for the last year of transactions with previous results on historical blockchain data reported in the literature. Our evaluation shows that the ETL process we implemented performs reliable and scalable data acquisition, and that its output makes the stored data available for further analysis and business use.
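
As an illustration only, the pipeline described in the abstract can be sketched in a few lines of Python. The abstract does not say which clustering heuristics or which schema the authors use, so everything here is an assumption: the table names (tx, tx_input, tx_output, address_entity), the toy transactions, and the choice of the common multi-input heuristic (addresses spent together in one transaction are attributed to the same entity), implemented with a union-find structure. The sketch covers only a batch load into an in-memory SQLite database and does not model the lambda architecture.

```python
import sqlite3

# Hypothetical toy data: each transaction lists the addresses it spends from
# (input section) and the (address, value) pairs it pays to (output section).
TOY_TXS = [
    {"txid": "t1", "inputs": ["A", "B"], "outputs": [("C", 5)]},
    {"txid": "t2", "inputs": ["B", "D"], "outputs": [("E", 2)]},
    {"txid": "t3", "inputs": ["F"], "outputs": [("A", 1)]},
]

# Assumed schema: inputs and outputs live in separate tables, and the
# address-to-entity mapping is kept apart from the transactions.
SCHEMA = """
CREATE TABLE tx (txid TEXT PRIMARY KEY);
CREATE TABLE tx_input (txid TEXT REFERENCES tx(txid), address TEXT);
CREATE TABLE tx_output (txid TEXT REFERENCES tx(txid), address TEXT, value INTEGER);
CREATE TABLE address_entity (address TEXT PRIMARY KEY, entity TEXT);
"""


class UnionFind:
    """Disjoint sets of addresses; the root of a set is its smallest address."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra  # smaller root wins, so labels are stable


def cluster_addresses(txs):
    """Transform step: apply the multi-input heuristic to every transaction."""
    uf = UnionFind()
    for tx in txs:
        for addr in tx["inputs"]:
            uf.find(addr)  # register the address
        for addr, _value in tx["outputs"]:
            uf.find(addr)
        for addr in tx["inputs"][1:]:
            uf.union(tx["inputs"][0], addr)
    return {addr: uf.find(addr) for addr in list(uf.parent)}


def load(conn, txs):
    """Load step: write transactions and the address clustering to the database."""
    conn.executescript(SCHEMA)
    for tx in txs:
        conn.execute("INSERT INTO tx VALUES (?)", (tx["txid"],))
        conn.executemany(
            "INSERT INTO tx_input VALUES (?, ?)",
            [(tx["txid"], addr) for addr in tx["inputs"]],
        )
        conn.executemany(
            "INSERT INTO tx_output VALUES (?, ?, ?)",
            [(tx["txid"], addr, value) for addr, value in tx["outputs"]],
        )
    conn.executemany(
        "INSERT INTO address_entity VALUES (?, ?)",
        sorted(cluster_addresses(txs).items()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, TOY_TXS)
    # Analysis by query: how many addresses does each entity control?
    query = "SELECT entity, COUNT(*) FROM address_entity GROUP BY entity ORDER BY entity"
    for entity, n_addresses in conn.execute(query):
        print(entity, n_addresses)
```

In this toy data, addresses A, B and D end up in one entity because B appears as an input alongside both A and D. Keeping the entity mapping in its own table means the clustering can be recomputed without touching the transaction tables.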
