Effects of data cleansing on load prediction algorithms

Abstract
The rollout of advanced metering infrastructure planned in many countries worldwide will lead to a massive inflow of data from moderately reliable sensor equipment. In principle, this will make intelligent, automated planning and operation of the electric grid possible at an increasingly fine scale. However, errors can creep into the meter data, either from faulty sensors or during transmission from the meters to the database. This work studies the role of data cleansing as a preprocessing step for short-term (24-hour) power load prediction. We consider cleansing and prediction at several levels of granularity, from the transmission level via distribution substations down to single households. We believe that preprocessing filters such as cleansing should make the subsequent processing step more robust, more precise, or both. However, load cleansing frameworks commonly assume that the noise in the time series is normally and independently distributed. We show that this assumption does not hold at the diurnal level, because of the characteristic pattern of power consumption, with two peak loads during daytime and a nighttime trough. Moreover, we present empirical evidence that a preprocessing step based on this assumption fails to improve the performance of the subsequent prediction step. To rectify this problem, we suggest subtracting the average power load in a given period before cleansing. We present empirical evidence that this improves the robustness and efficiency of load cleansing as a preprocessing step. Data cleansing and load prediction are performed by a system that searches for parameters using an evolutionary approach.
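
The idea of subtracting the average load before cleansing can be illustrated with a minimal sketch. It is not the system used in this work: the function name `cleanse_with_profile`, the fixed 24-sample period, the z-score threshold `z_max`, and the synthetic single-peak load curve are all assumptions made for illustration, and a simple threshold rule stands in for the evolutionary parameter search.

```python
import numpy as np

def cleanse_with_profile(load, period=24, z_max=3.0):
    """Flag and repair outliers after removing the mean periodic profile.

    Hypothetical helper: the z-score rule is an illustrative stand-in,
    not the cleansing method evaluated in the paper.
    """
    load = np.asarray(load, dtype=float)
    if load.size % period != 0:
        raise ValueError("series length must be a multiple of the period")

    # One row per day, one column per position within the day.
    days = load.reshape(-1, period)

    # Average load at each position in the period (the diurnal profile).
    profile = days.mean(axis=0)

    # Residuals are much closer to the i.i.d. noise assumption than raw load.
    residual = days - profile
    scale = residual.std()
    if scale == 0.0:
        # Perfectly periodic series: nothing to flag.
        return load.copy(), np.zeros(load.size, dtype=bool)

    # Flag samples whose residual is far from zero, then replace them
    # with the profile value for that position in the period.
    outliers = np.abs(residual) > z_max * scale
    cleaned = np.where(outliers, profile, days)
    return cleaned.ravel(), outliers.ravel()

# Synthetic example (assumed data): 14 identical days plus one injected spike.
hours = np.arange(24)
base = 10.0 + 3.0 * np.sin(2 * np.pi * hours / 24)
load = np.tile(base, 14)
load[100] += 50.0

cleaned, flagged = cleanse_with_profile(load)
```

Because the threshold is applied to the residuals rather than to the raw series, the regular daily swing does not inflate the noise estimate, so only the injected spike is flagged. Note that the mean profile is itself pulled slightly toward the spike; a median profile would be one way to reduce that effect.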