A Machine Learning Approach for Detecting Digital Behavioral Patterns of Depression Using Nonintrusive Smartphone Data (Complementary Path to Patient Health Questionnaire-9 Assessment): Prospective Observational Study

Abstract
Journal of Medical Internet Research. Warning: this is an unreviewed preprint. It has not been peer-reviewed by expert or patient reviewers or by an academic editor, may contain misleading claims, and is likely to change before final publication, if accepted, or may have been rejected or withdrawn.
Background: Depression is a major global cause of morbidity, a significant economic burden, and the greatest health challenge leading to chronic disability. Mobile monitoring of mental health conditions has long been sought as a way to overcome the problems of screening, diagnosing, and monitoring depression, given its heterogeneous presentation. The widespread availability of smartphones makes it possible to use their data to build digital behavioral models for both clinical and remote screening and monitoring, offering a potential, scalable answer to the pressing global need for early and effective solutions. This study adds to the field by conducting a trial that uses only private, nonintrusive sensors to detect and monitor depression continuously and passively.
Objective: This study demonstrates a novel mental behavioral profiling metric, the Mental Health Similarity Score, derived from passively monitored, private, nonintrusive smartphone usage data, to identify and track depressive behavior and its progression. The analysis uses machine learning models trained on different levels of depression severity as measured by the 9-item Patient Health Questionnaire (PHQ-9).
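For readers unfamiliar with the ground-truth instrument, the following minimal sketch shows how a PHQ-9 total is conventionally computed: nine items, each scored 0-3, summed to a total of 0-27 and mapped to the standard published severity bands. The abstract does not state the cutoffs behind the study's own none, mild, and severe classes, so these bands are the conventional ones, not necessarily the authors'. The `item_subscore` helper for questions 2, 6, and 9 is hypothetical and included only because those items feature in the Results.

```python
# Conventional PHQ-9 severity bands. ASSUMPTION: the study's own
# none/mild/severe groupings are not given in the abstract.
PHQ9_BANDS = [
    (0, 4, "none-minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_total(items: list[int]) -> int:
    """Sum the 9 PHQ-9 item responses, each scored 0-3 (total 0-27)."""
    if len(items) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(not 0 <= x <= 3 for x in items):
        raise ValueError("each PHQ-9 item must be scored 0-3")
    return sum(items)


def phq9_band(total: int) -> str:
    """Map a PHQ-9 total score to its conventional severity band."""
    for low, high, label in PHQ9_BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"PHQ-9 total out of range: {total}")


def item_subscore(items: list[int], questions=(2, 6, 9)) -> int:
    """Hypothetical helper: sum selected items by 1-based question number."""
    return sum(items[q - 1] for q in questions)


answers = [2, 3, 1, 2, 1, 2, 1, 1, 2]  # toy responses, not study data
total = phq9_total(answers)
print(total, phq9_band(total), item_subscore(answers))
```

With these toy responses the total is 15, which falls in the moderately severe band, and the question 2, 6, and 9 subscore is 7.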
Methods: Passive smartphone data and self-reported PHQ-9 depression assessments were collected from 558 Android smartphone users in an observational study over an average of 10.7 (SD 23.7) days. We quantified 37 digital behavioral markers from the passive smartphone data and explored the relationship between these markers and depression using correlation coefficients. We trained four separate supervised random forest machine learning (ML) classification models, with hyperparameter optimization, 15-fold cross-validation, bootstrapping, and imbalanced-data handling, to predict depression and its severity, using PHQ-9 scores as the ground truth. We also quantified three additional digital markers from the gyroscope sensor and explored whether they improve the model's accuracy in detecting depression.
Results: Of the 558 participants, 254 (46%) were male, 286 (51%) were female, and 18 (3%) preferred not to say. By age, 474 (85%) participants were 18-25 years old, 29 (5%) were 26-35, 42 (8%) were 36-55, 10 (2%) were 56-64, and 3 (<1%) were older than 64 years. Of the 558 reported PHQ-9 assessments, 63 responses were none (not depressed; scored =10). The binary nonsensor PHQ-9 model (none vs severe) achieved precision of 85%-89%, recall of 85%-89%, an F1 score of 87%, and an overall accuracy of 87%. The three-class PHQ-9 model (none vs mild vs severe) achieved precision of 74%-86%, recall of 76%-83%, F1 scores of 75%-84%, and an overall accuracy of 78%. When all 9 PHQ-9 items were correlated with the mental behavioral profiling metric, a significant positive Pearson correlation (r=0.73) was found specifically for questions 2, 6, and 9 among users in the severe category. The question-specific model (questions 2, 6, and 9) achieved precision of 76%-80%, recall of 75%-81%, F1 scores of 78%-89%, and an overall accuracy of 78%.
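The evaluation protocol described in the Methods can be sketched with the Python standard library alone. The abstract does not name the imbalance-handling technique, so this sketch assumes random oversampling of minority classes (sampling with replacement) applied only to the training folds, which keeps duplicated samples out of every test fold. The function names and toy labels are hypothetical, and the random forest itself (for example, scikit-learn's `RandomForestClassifier`) is deliberately left out.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs in which every class is spread
    as evenly as possible across the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter so folds fill round-robin across classes
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def oversample(train_idx, labels, seed=0):
    """Duplicate minority-class training indices by sampling with
    replacement until every class matches the largest class.
    ASSUMPTION: the study's actual imbalance handling is not specified."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for y in sorted(by_class):
        idx = by_class[y]
        balanced.extend(idx)
        balanced.extend(rng.choices(idx, k=target - len(idx)))
    return balanced


# Toy labels only; these are not the study's class counts.
labels = ["none"] * 30 + ["severe"] * 90
for train, test in stratified_kfold(labels, k=15):
    balanced = oversample(train, labels)
    # A random forest would be fit on `balanced` and scored on `test` here.
    assert not set(balanced) & set(test)  # no test-fold leakage
```

Oversampling after the split, rather than before it, is the design choice that matters here: balancing the whole data set first would let copies of a test sample appear in the training data and inflate the cross-validated scores.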
When the gyroscope sensor was added as a feature, the Pearson correlation for questions 2, 6, and 9 dropped from r=0.73 to r=0.46. The gyroscope features mean activity (P<.001) and average gap activity (P<.001) differed significantly between participants in the none and severe categories. The PHQ-9 gyroscope sensor model achieved precision of 74%-78%, recall of 67%-83%, F1 scores of 72%-78%, and an overall accuracy of 76%.
Conclusions: Our results demonstrate that the Mental Health Similarity Score can be used to identify and track depressive behavior and its progression with high accuracy. Current, traditional methods of assessing depression can therefore be coupled with digital behavioral markers to have a significant impact on mitigating depression and its far-reaching consequences.
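The precision, recall, and F1 figures in the Results are reported as ranges alongside a single overall accuracy, which is consistent with one value per class (for example, one for none and one for severe); the abstract does not say so explicitly, so this reading is an assumption. The minimal sketch below computes per-class precision, recall, F1 (the harmonic mean of precision and recall), and overall accuracy from true and predicted labels. The function name and the labels are hypothetical toy values, not study data.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return ({class: (precision, recall, f1)}, overall_accuracy)."""
    classes = sorted(set(y_true) | set(y_pred))
    # Correct predictions per class (true positives).
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)  # how often each class was predicted
    n_true = Counter(y_true)  # how often each class actually occurs
    metrics = {}
    for c in classes:
        precision = tp[c] / n_pred[c] if n_pred[c] else 0.0
        recall = tp[c] / n_true[c] if n_true[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    accuracy = sum(tp.values()) / len(y_true)
    return metrics, accuracy


# Toy predictions for a binary none-vs-severe classifier (not study data).
y_true = ["none"] * 10 + ["severe"] * 10
y_pred = ["none"] * 9 + ["severe"] * 9 + ["none"] * 2
metrics, accuracy = per_class_metrics(y_true, y_pred)
for c, (p, r, f1) in metrics.items():
    print(f"{c}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

With these toy labels, precision is 0.82 for none and 0.89 for severe while overall accuracy is a single 0.85, mirroring the range-plus-single-accuracy format used in the Results.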