Internet-Based Mental Health Survey Research: Navigating Internet Bots on Reddit

Abstract
This study used a multistage process to recruit participants through Reddit, with the aim of increasing data integrity in the face of an infiltration of Internet bots. Our approaches centered on preventing Internet bot responses from the outset and on improving our ability to identify them. We attempted to detect bots in a study focused on understanding social factors related to autism and suicide risk. Four recruitment rounds took place on mental health–related subreddits, with one post made on each subreddit per round. We found a high presence of bots in the initial rounds: location data showed that one third of all responses (33.4 percent; 118/353) came from just eight locations (i.e., 4.7 percent of all locations). The proportion of detected bots differed significantly across the recruitment rounds (χ² = 150.22, df = 3, p < 0.001). In round 4, language advertising compensation was removed from recruitment posts. This round had a significantly lower proportion of detected bots than round 1 (χ² = 33.01, df = 1, p < 0.001), round 2 (χ² = 129.14, df = 1, p < 0.001), and round 3 (χ² = 46.6, df = 1, p < 0.001). Through this multistage recruitment process, we increased the integrity of our collected data, as indicated by a low percentage of fraudulent responses. Only after we removed the advertisement of compensation from recruitment posts did we see a significant decrease in the number and percentage of Internet bot responses. This study provides valuable information on how to adapt when an online survey study is infiltrated by Internet bots.
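
The two quantitative checks mentioned in the abstract, the concentration of responses in a few locations and the comparison of bot proportions across rounds, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names `top_location_share` and `pearson_chi_square` are hypothetical, and the toy location labels and counts are invented for illustration only. The only figure taken from the abstract is the ratio 118/353.

```python
from collections import Counter


def top_location_share(locations, k):
    """Return (share of responses from the k most common locations,
    share of distinct locations that those k locations represent)."""
    counts = Counter(locations)
    total = sum(counts.values())
    top = counts.most_common(k)
    responses_in_top = sum(n for _, n in top)
    return responses_in_top / total, len(top) / len(counts)


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of equal-length rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Arithmetic check of the share reported in the abstract (118 of 353).
reported_share = 118 / 353
print(f"{reported_share:.1%}")  # 33.4%

# Hypothetical toy data, NOT from the study: 20 responses, 12 locations.
toy_locations = ["A"] * 6 + ["B"] * 4 + list("CDEFGHIJKL")
response_share, location_share = top_location_share(toy_locations, k=2)

# Hypothetical 2 x 4 table, NOT from the study:
# rows = (detected bot, not detected), columns = rounds 1 to 4.
toy_table = [[30, 40, 20, 5], [70, 60, 80, 95]]
toy_stat, toy_df = pearson_chi_square(toy_table)  # toy_df is 3, as for four rounds
```

A 2 x 4 table of this shape yields three degrees of freedom, matching the df reported for the comparison across all four rounds, while each pairwise comparison with round 4 corresponds to a 2 x 2 table with one degree of freedom. Converting the statistic to a p-value would require a chi-square distribution function, which this sketch leaves out.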