Acceleration of a Feature Selection Algorithm Using High Performance Computing

Abstract
Feature selection is a subfield of data analysis that is on reducing the dimensionality of datasets, so that subsequent analyses over them can be performed in affordable execution times while keeping the same results. Joint Mutual Information (JMI) is a highly used feature selection method that removes irrelevant and redundant characteristics. Nevertheless, it has high computational complexity. In this work, we present a multithreaded MPI parallel implementation of JMI to accelerate its execution on distributed memory systems, reaching speedups of up to 198.60 when running on 256 cores, and allowing for the analysis of very large datasets that do not fit in the main memory of a single node.