A Distribution-Free Model for Longitudinal Metagenomic Count Data

Abstract
Longitudinal metagenomics has been widely studied over the past decade and provides valuable insight into microbial dynamics. Repeated measurements taken from the same subject are typically correlated. However, existing methods that assume independence among observations may yield incorrect inferences, and methods that do account for within-subject correlation may not be applicable to count data. We propose a distribution-free approach, CorrZIDF, which extends the current method to model correlated, zero-inflated metagenomic count data, offering a powerful and accurate solution for detecting significant features. The method accommodates different working correlation structures without requiring the marginal distribution of the count data to be specified. Simulation studies show that CorrZIDF is robust when a working correlation structure is selected to improve estimation efficiency in repeated-measures studies. We also compared four methods on two real datasets; the proposed method identified more unique features that had been previously reported in the relevant literature.
Funding Information
  • National Institute of General Medical Sciences (1R01GM139829-01)
  • National Institutes of Health (1P01AI148104-01A1)
  • National Institute on Aging (U19AG065169)
  • United States Department of Agriculture (ARZT-1361620-H22-149)