Dynamic Region of Interest Selection in Remote Photoplethysmography: Proof-of-Concept Study

Abstract
Background: Remote photoplethysmography (rPPG) can record vital signs (VS) by detecting subtle changes in the light reflected from the skin. Lifelight® (Xim Ltd) is novel software being developed as a medical device for the contactless measurement of VS using rPPG via the integral cameras on smart devices. Research to date has focused on extracting the pulsatile VS signal from the raw signal, which can be influenced by factors such as ambient light, skin thickness, facial movement, and skin tone.

Objective: This preliminary proof-of-concept study outlines a dynamic approach to rPPG signal processing in which green-channel signals from the most relevant areas of the face (the mid-face, comprising the cheeks, nose, and top of the lip) are optimized for each subject using tiling and aggregation (T&A) algorithms.

Methods: High-resolution 60-second videos were recorded during the VISION-MD study (ClinicalTrials.gov identifier NCT04763746). The mid-face was divided into 62 tiles of 20 × 20 pixels, and the best 30 tiles, selected by signal-to-noise ratio in the frequency domain (SNR-F), were aggregated using five different algorithms (a minimal sketch of this step follows the abstract). Signals from the mid-face before and after T&A were categorized by a trained observer, blinded to the data processing, as 0 (high quality, suitable for algorithm training), 1 (suitable for algorithm testing), or 2 (inadequate quality). In a secondary analysis, observer categories were compared for signals predicted, on the basis of SNR-F score, to improve category following T&A. Observer ratings and SNR-F scores were also compared before and after T&A for Fitzpatrick skin tones 5 and 6, in which rPPG is hampered by light absorption by melanin.

Results: The analysis used 4310 videos recorded from 1315 participants. Signals in categories 2 and 1 had lower mean SNR-F scores than those in category 0. T&A improved the mean SNR-F score with all algorithms tested. Following T&A, 9–21% of signals improved by at least one category, with up to 10% improving into category 0, while 15–39% remained in the same category. Importantly, 9–21% improved from category 2 (not usable) into category 1. No more than 2% of signals were assigned to a lower-quality category following T&A. In the secondary analysis, 62% of 52 signals were re-categorized by the observer as predicted from the SNR-F score. T&A improved SNR-F scores in darker skin tones; 41% of 369 signals improved from category 2 to category 1, and 12% from category 1 to category 0.

Conclusions: The T&A approach to dynamic region-of-interest (ROI) selection improved signal quality, including in dark skin tones. The method was verified by comparison with ratings from a trained observer. T&A can reasonably be expected to overcome factors that compromise whole-face rPPG. The performance of this method in estimating VS is currently being assessed.
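The abstract does not specify the exact SNR-F definition or the five aggregation algorithms, so the Python sketch below only illustrates one plausible form of the T&A step. The frame rate FS, the heart-rate band HR_BAND, the ±0.1 Hz peak window, and the SNR-F-weighted-mean aggregation rule are illustrative assumptions, not the authors' implementation; the function names snr_f and tile_and_aggregate are hypothetical.

import numpy as np

FS = 30.0             # assumed video frame rate (Hz); not stated in the abstract
HR_BAND = (0.7, 4.0)  # assumed heart-rate band (42-240 bpm)

def snr_f(signal, fs=FS):
    """Crude frequency-domain SNR: power near the dominant in-band peak
    (and its first harmonic) over the remaining in-band power, in dB."""
    sig = signal - np.mean(signal)
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    power = np.abs(np.fft.rfft(sig)) ** 2
    band = (freqs >= HR_BAND[0]) & (freqs <= HR_BAND[1])
    if not band.any():
        return 0.0
    peak = freqs[band][np.argmax(power[band])]
    near_peak = np.zeros_like(band)
    for f0 in (peak, 2 * peak):                 # fundamental + first harmonic
        near_peak |= np.abs(freqs - f0) <= 0.1  # +/-0.1 Hz window (assumed)
    signal_power = power[band & near_peak].sum()
    noise_power = power[band & ~near_peak].sum()
    return 10 * np.log10(signal_power / max(noise_power, 1e-12))

def tile_and_aggregate(green_frames, tile_size=20, n_best=30):
    """green_frames: array (n_frames, H, W) of mid-face green-channel values.
    Scores each 20x20 tile by SNR-F and returns one aggregated trace from the
    best-scoring tiles (here, an SNR-F-weighted mean; one of many possible rules)."""
    n_frames, h, w = green_frames.shape
    traces, scores = [], []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            # spatially average each tile into a single time series
            trace = green_frames[:, y:y + tile_size, x:x + tile_size].mean(axis=(1, 2))
            traces.append(trace)
            scores.append(snr_f(trace))
    best = np.argsort(scores)[-n_best:]
    weights = np.clip(np.array(scores)[best], a_min=0.0, a_max=None) + 1e-6
    return np.average(np.array(traces)[best], axis=0, weights=weights)

In this sketch, tiles whose time series carry little periodic content in the heart-rate band receive low SNR-F scores and are either excluded or down-weighted, which is the intended effect of selecting the best 30 of the 62 mid-face tiles before aggregation.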
