Identifying biotic stress-associated molecular markers in wheat using differential gene expression and machine learning techniques

Abstract
Wheat is central to global food security and a key crop for many developing countries. Next-generation sequencing (NGS) technologies allow researchers to analyze the wheat transcriptome and reveal differentially expressed genes (DEGs) responsible for essential agronomic traits and biotic stress tolerance. In addition, machine learning (ML) methods have opened new avenues for detecting patterns in expression data and making predictions or decisions based on these patterns. We combined both approaches to identify potential molecular markers associated with biotic stress in wheat, using six gene expression studies that investigated infection or infestation by powdery mildew, blast fungus, rust, fly larvae, greenbug aphid, and Stagonospora nodorum. A total of 24,152 genes passing the differential expression threshold were collected from the different studies, with the largest set containing 7,580 genes and the smallest 1,073 genes. We identified several genes that were differentially expressed in all comparisons, as well as genes that were present in only one data set, and we discuss the possible role of certain genes in plant resistance. An interpretable model-agnostic explanation algorithm selected the Ta-TLP, HBP-1, WRKY, PPO, and glucan endo-1,3-beta-glucosidase genes as the most important; these genes are known to play a significant role in resistance to biotic stress. Our results support the application of ML analysis in plant genomics and can help increase agricultural efficiency and production, leading to higher yields and more sustainable farming practices.
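
The separation of genes differentially expressed in all comparisons from genes present in only one data set can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the cutoffs (|log2 fold change| >= 1, adjusted p <= 0.05), the function names, and the toy gene and study labels are all assumptions chosen for illustration.

```python
from collections import Counter

def passes_threshold(log2_fc, adj_p, min_abs_lfc=1.0, max_adj_p=0.05):
    """Return True if a gene meets the (assumed) DEG cutoffs."""
    return abs(log2_fc) >= min_abs_lfc and adj_p <= max_adj_p

def split_shared_and_unique(deg_sets):
    """Given {study: set of DEG ids}, return the genes found in every
    study and, per study, the genes found in that study only."""
    counts = Counter(gene for genes in deg_sets.values() for gene in genes)
    shared = {gene for gene, n in counts.items() if n == len(deg_sets)}
    unique = {
        study: {gene for gene in genes if counts[gene] == 1}
        for study, genes in deg_sets.items()
    }
    return shared, unique

# Hypothetical toy results: gene id -> (log2 fold change, adjusted p-value).
results = {
    "powdery_mildew": {"geneA": (2.1, 0.001), "geneB": (0.3, 0.5), "geneC": (-1.5, 0.01)},
    "rust": {"geneA": (1.8, 0.02), "geneD": (3.0, 0.001)},
    "greenbug": {"geneA": (-2.2, 0.004), "geneC": (0.2, 0.9), "geneE": (1.2, 0.03)},
}

# Keep only the genes that pass the assumed threshold in each study.
deg_sets = {
    study: {gene for gene, (lfc, p) in genes.items() if passes_threshold(lfc, p)}
    for study, genes in results.items()
}

shared, unique = split_shared_and_unique(deg_sets)
print("Shared across all studies:", sorted(shared))
for study, genes in unique.items():
    print(f"Only in {study}:", sorted(genes))
```

Counting how many studies each gene appears in, rather than intersecting sets pairwise, keeps the sketch easy to extend to categories such as "present in at least four of six studies".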