The Effect of Speech Fragmentation and Audio Encodings on Automatic Parkinson’s Disease Recognition

Abstract
Parkinson’s disease is a neurological disease that is incurable according to current clinical knowledge. Early detection and the provision of appropriate treatment are therefore of primary importance. Speech is one of the biomarkers that enable the detection of Parkinson’s disease. Numerous studies are based on recordings from controlled environments, whereas fewer use real-world conditions. The present study examined three objectives: recording fragmentation (paragraph, sentence, and time-based), variable encodings (Pulse-Code Modulation [PCM], GSM-Full Rate [FR], G.723.1), and majority voting on 8 kHz recordings using multiple classifiers. Support Vector Machine (SVM), Long Short-Term Memory (LSTM), i-vector and x-vector classifiers were evaluated, with SVM serving as the baseline. The highest accuracy and F1-score were achieved using i-vector models. Although the variable encodings generally decreased Parkinson’s disease recognition performance, the decline was within 2%-3% at best. Fragmentation did not yield a clear outcome, although some classifiers performed with very similar efficiency across the differently fragmented sets. Majority voting produced a slight increase in classification performance compared with using no aggregation.
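The abstract does not describe how the majority voting was implemented. As a rough illustration only, fragment-level predictions could be aggregated into one decision per recording as sketched below. The function names, the label convention (1 for Parkinson’s disease, 0 for control) and the tie-breaking rule are all assumptions made for this example and do not come from the study.

```python
from collections import Counter


def majority_vote(fragment_labels, tie_label=1):
    """Aggregate fragment-level labels into one recording-level label.

    tie_label is a hypothetical choice: when the two most frequent labels
    are equally common, the recording is assigned tie_label.
    """
    if not fragment_labels:
        raise ValueError("need at least one fragment prediction")
    counts = Counter(fragment_labels).most_common()
    # A tie between the two most frequent labels falls back to tie_label.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]


def aggregate_by_recording(predictions):
    """Group (recording_id, label) pairs by recording and vote within each group."""
    grouped = {}
    for recording_id, label in predictions:
        grouped.setdefault(recording_id, []).append(label)
    return {rid: majority_vote(labels) for rid, labels in grouped.items()}
```

In this sketch each sentence or time-based fragment contributes one vote, so a recording split into more fragments is less sensitive to a single misclassified fragment.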
