Gene Expression Analysis of Solanum lycopersicum - Bacillus megaterium Interaction to Identify Informative Genes Using Machine Learning Classifiers
Chapter (Conference Paper)
There has been growing interest in identifying specific plant growth-promoting rhizobacteria that confer health, growth, and protective benefits on their plant hosts. Understanding the mechanisms of this association, as well as the differences that determine its outcomes, can be exploited to optimize beneficial interactions. To this end, we developed a classifier capable of predicting the presence of Bacillus megaterium inoculated in tomato root tissue and of identifying potentially informative genes related to the interaction. Two machine learning models, Kernel Logistic Regression and Multilayer Perceptron, were studied. Of the four Multilayer Perceptron classifiers tested with different parameters (MLP-a, MLP-b, MLP-c and MLP-d), MLP-a and MLP-c achieved near-optimal performance across all the relevant metrics. These two classifiers were then used as attribute evaluators to identify two sets of informative genes (IGs). MLP-a yielded 216 highest-rated attributes. Among these IGs, 173 were identified as Solanum lycopersicum genes, 37 were assigned to 5 Bacillus subtilis proteins, 4 were assigned to 1 Escherichia coli protein and 2 were unidentified. MLP-c yielded the same highest-rated attributes plus 27 additional ones. For each of the two sets, a functional enrichment analysis of the identified tomato IGs was performed, showing nine biological pathways for MLP-a and eight for MLP-c. Furthermore, the same IGs were used to compose biological networks from Arabidopsis thaliana orthologous genes. The biological networks identified for the first set were co-expression, shared protein domains, predicted interaction and co-localization. The second set presented the same networks, with the addition of physical interaction.
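The idea of using a classifier as an attribute evaluator can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the toolkit, the MLP parameters, or the scoring rule, so everything below is an assumption. A one-feature nearest-centroid classifier stands in for the Multilayer Perceptron, the expression data are synthetic, and the names `make_toy_expression`, `centroid_accuracy` and `rank_attributes` are hypothetical. Each gene is scored by the cross-validated accuracy obtained when it alone is used to predict inoculation status, and genes are then ranked by that score.

```python
import random
from statistics import mean

def make_toy_expression(n_samples=60, n_genes=6, informative=(0, 3), shift=3.0, seed=0):
    """Synthetic expression matrix (assumption, not real data).
    Rows are samples, columns are genes; label 1 = inoculated, 0 = control.
    Only the genes listed in `informative` differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) + (shift if (label == 1 and g in informative) else 0.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(values, labels, train_idx, test_idx):
    """Fit a one-feature nearest-centroid classifier on the training samples
    and return its accuracy on the test samples."""
    c0 = mean(values[i] for i in train_idx if labels[i] == 0)
    c1 = mean(values[i] for i in train_idx if labels[i] == 1)
    correct = 0
    for i in test_idx:
        pred = 1 if abs(values[i] - c1) < abs(values[i] - c0) else 0
        correct += int(pred == labels[i])
    return correct / len(test_idx)

def rank_attributes(X, y, k=5):
    """Score every gene by k-fold cross-validated accuracy when used alone,
    then return (score, gene_index) pairs sorted from best to worst."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    scores = []
    for g in range(len(X[0])):
        values = [row[g] for row in X]
        fold_acc = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            fold_acc.append(centroid_accuracy(values, y, train_idx, test_idx))
        scores.append((mean(fold_acc), g))
    return sorted(scores, reverse=True)

X, y = make_toy_expression()
ranking = rank_attributes(X, y)
top_two = sorted(g for _, g in ranking[:2])
print("ranking (score, gene index):", ranking)
print("highest-rated genes:", top_two)
```

In a real analysis the stand-in classifier would be replaced by the trained model, and the cutoff for "highest-rated" attributes would need to be chosen explicitly; the abstract does not say how that cutoff was set.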