An Incorrect Data Detection Method for Big Data Cleaning of Machinery Condition Monitoring

Abstract
The presence of incorrect data leads to the decrease of condition-monitoring big data quality. As a result, unreliable or misleading results are probably obtained by analyzing these poor-quality data. In this paper, to improve the data quality, an incorrect data detection method based on an improved local outlier factor (LOF) is proposed for data cleaning. First, a sliding window technique is used to divide data into different segments. These segments are considered as different objects and their attributes consist of time-domain statistical features extracted from each segment, such as mean, maximum and peak-to-peak value. Second, a kernel-based LOF (KLOF) is calculated using these attributes to evaluate the degree of each segment being incorrect data. Third, according to these KLOF values and a threshold value, incorrect data are detected. Finally, a simulation of vibration data generated by a defective rolling element bearing and three real cases concerning a fixed-axle gearbox, a wind turbine, and a planetary gearbox are used to verify the effectiveness of the proposed method, respectively. The results demonstrate that the proposed method is able to detect both missing segments and abnormal segments, which are two typical incorrect data, effectively, and thus is helpful for big data cleaning of machinery condition monitoring.
Funding Information
  • National Natural Science Foundation of China (61673311)
  • NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1709208)
  • National Program for Support of Top-notch Young Professionals

This publication has 33 references indexed in Scilit: