Speech-Based Depression Detection Using Convolutional Neural Networks

Abstract
Depression has become a serious mental disorder affecting people of almost all age groups. Loss of interest in daily activities, a constant feeling of isolation, and hopelessness cause significant impairment in everyday life. The illness affects both the physical and mental health of the individual and undermines emotional stability. Emotions express one's state of mind through thoughts, feelings, or behavioural responses; for a depressed individual, these emotions are often negative in nature. Diagnosing depression is a complex task, as the disease may go unrecognised by the patients themselves, and patients may be reluctant to consult a doctor. Long-term neglect of the illness may worsen the sufferer's mental health, so early diagnosis of depression is of great significance. With the emergence of neural networks and pattern recognition, many researchers have worked on detecting depression by analysing non-verbal cues such as facial expressions, gestures, body language, and tone of voice. Recent studies have shown that speech emotion analysis can effectively distinguish emotional features, and depressed speech differs considerably from normal speech: depressed patients typically speak slowly and in a low voice, sometimes stuttering or whispering, making several attempts before speaking up, or falling silent in the middle of a sentence. This paper proposes a CNN architecture that learns audio features from speech to detect depression, identify emotions, and infer the emotional severity of the individual. It also reviews some existing research methods in the field of depression analysis.
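To make the general idea concrete, the following is a minimal, hypothetical NumPy sketch of the kind of pipeline the abstract describes: a spectrogram-like representation of speech (here, a random stand-in for an MFCC matrix) passed through one convolutional layer, a ReLU, global average pooling, and a logistic classifier head. All shapes, parameter names, and the random input are illustrative assumptions, not the architecture proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels, bias):
    """Valid 2-D convolution of a single-channel input with a bank of kernels.

    x: (H, W) input, e.g. an MFCC matrix (time frames x coefficients).
    kernels: (K, kh, kw); bias: (K,). Returns (K, H-kh+1, W-kw+1).
    """
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k]) + bias[k]
    return out

def forward(spectrogram, params):
    """One conv layer -> ReLU -> global average pool -> logistic score."""
    feat = conv2d(spectrogram, params["kernels"], params["bias"])
    feat = np.maximum(feat, 0.0)                 # ReLU non-linearity
    pooled = feat.mean(axis=(1, 2))              # global average pooling per kernel
    logit = pooled @ params["w"] + params["b"]   # linear classifier head
    return 1.0 / (1.0 + np.exp(-logit))          # probability of the "depressed" class

# Hypothetical input: 40 MFCC coefficients over 100 frames (random stand-in).
spec = rng.standard_normal((100, 40))
params = {
    "kernels": rng.standard_normal((4, 5, 5)) * 0.1,  # 4 untrained 5x5 filters
    "bias": np.zeros(4),
    "w": rng.standard_normal(4),
    "b": 0.0,
}
prob = forward(spec, params)  # a value in (0, 1); untrained, so not meaningful
```

In a real system the filters and classifier weights would be trained on labelled speech recordings, and the input would be genuine MFCC or spectrogram features rather than random noise; the sketch only shows how convolution over a time-frequency representation yields a single depression score.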