Abstract
The goal of this research was to identify locations in the genome of the treehopper Entylia carinata that are associated with anomalous behavior exhibited by the species. Treehoppers are phytophagous insects that feed, reproduce, and rear their young on specific aster species. Observation has shown that the insects will disregard potential mates in close proximity in favor of those that originate from the same plant species as themselves. This behavior suggests genetic separation within the species based on host plant of origin and warrants genetic analysis. Machine learning is well suited to genetic association because it can identify subtle patterns in data, and it was chosen as an alternative approach to the treehopper problem. A random forest classification model was trained on sampled treehopper genetic data to predict the plant species from which each sample originated. Once tuned, the model scored 0.757 in leave-one-out cross-validation and averaged an AUROC of 0.83 across the three plant species. Feature importances were then extracted and evaluated, yielding a set of genomic locations that the model relied on during prediction. Because these locations informed the model's predictions of the native plant species of unseen samples, they are considered indicative of the observed behavior and can be used in future analysis alongside the results of more traditional parametric genetic analysis tools.
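The workflow described in the abstract (random forest classification, leave-one-out cross-validation, a per-class averaged AUROC, and feature importance extraction) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the genotype matrix is synthetic, the 0/1/2 genotype coding, sample count, locus indices, and hyperparameters are all assumptions, and the cross-validation score is assumed here to be accuracy, which the abstract does not state.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic stand-in data (NOT the study's data): 60 samples x 200 loci,
# genotypes coded 0/1/2, and three hypothetical host-plant labels (0, 1, 2).
rng = np.random.default_rng(0)
n_samples, n_loci = 60, 200
X = rng.integers(0, 3, size=(n_samples, n_loci)).astype(float)
y = np.repeat([0, 1, 2], n_samples // 3)

# Plant a signal at a few hypothetical loci so there is something to recover:
# the genotype follows the host label except for roughly 20% random noise.
informative = [5, 42, 117]
for locus in informative:
    noisy = rng.random(n_samples) < 0.2
    X[:, locus] = np.where(noisy, rng.integers(0, 3, size=n_samples), y)

# Hyperparameters are illustrative, not the tuned values from the study.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Leave-one-out: each sample is scored by a forest trained on all the others.
proba = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
pred = proba.argmax(axis=1)

# Assumed metrics: plain accuracy, and one-vs-rest AUROC macro-averaged
# over the three classes.
loo_accuracy = accuracy_score(y, pred)
macro_auroc = roc_auc_score(y, proba, multi_class="ovr", average="macro")

# Refit on all samples and rank loci by impurity-based feature importance.
model.fit(X, y)
importances = model.feature_importances_
top_loci = np.argsort(importances)[::-1][:10]

print(f"LOO accuracy: {loo_accuracy:.3f}")
print(f"Macro one-vs-rest AUROC: {macro_auroc:.3f}")
print("Top-ranked loci (column indices):", top_loci.tolist())
```

Impurity-based importances are only one way to rank features. Permutation importance is a common alternative when features are correlated, as neighboring genomic loci often are.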
College
College of Science & Engineering
Department
Biology
Campus
Winona
First Advisor/Mentor
Colin Engstrom
Second Advisor/Mentor
Amy Runck
Location
Ballroom - Kryzsko Commons
Start Date
4-18-2024 9:00 AM
End Date
4-18-2024 10:00 AM
Presentation Type
Poster Session
Format of Presentation or Performance
In-Person
Session
1a=9am-10am
Poster Number
33
Genetic Association in Entylia carinata using Random Forest Classification