Abstract

The goal of this research was to identify locations in the genome of the treehopper Entylia carinata that are associated with an anomalous mating behavior exhibited by the species. Treehoppers are phytophagous insects that feed, reproduce, and rear their young on specific aster species. Observation has shown that the insects disregard potential mates in close proximity in favor of those that originate from the same plant species as themselves. This behavior suggests genetic separation within the species based on plant nativity and warrants genetic analysis. Machine learning can serve as a genetic association technique because it can detect obscure patterns in data, and it was chosen as an alternative approach to the treehopper problem. A random forest classification model was trained on sampled treehopper genetic data to predict the plant species from which each sample originated. Once tuned, the model scored 0.757 in leave-one-out cross-validation and averaged an AUROC of 0.83 across the three plant species. The feature importances were then extracted and evaluated, yielding a set of genomic locations that the model relied on during prediction. Because these locations informed the model's predictions of the native plant species of unseen samples, they are deemed indicative of the observed behavior and can be used in future analysis alongside the results of more traditional parametric genetic analysis tools.
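The workflow described above (random forest classification, leave-one-out cross-validation, AUROC averaged over three classes, and ranking by feature importance) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the software used, so scikit-learn is assumed here; the genotype matrix is synthetic stand-in data, not the treehopper data; the cross-validation score is computed as accuracy, which the abstract does not specify; and the hyperparameters are arbitrary rather than the tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 60 insects x 40 loci, genotypes coded 0/1/2.
n_samples, n_loci = 60, 40
host = np.repeat([0, 1, 2], n_samples // 3)  # three host plant classes
X = rng.integers(0, 3, size=(n_samples, n_loci))

# Assumption for the demo: make the first three loci track the host plant,
# so the model has real signal to find. All other loci are pure noise.
informative = [0, 1, 2]
for j in informative:
    X[:, j] = np.where(rng.random(n_samples) < 0.85, host, X[:, j])

# Arbitrary hyperparameters, not the tuned values from the study.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Leave-one-out: each sample is predicted by a model trained on all others.
proba = cross_val_predict(model, X, host, cv=LeaveOneOut(), method="predict_proba")
pred = proba.argmax(axis=1)

loo_accuracy = accuracy_score(host, pred)
# One-vs-rest AUROC per plant class, averaged with equal weight (macro).
macro_auroc = roc_auc_score(host, proba, multi_class="ovr", average="macro")

# Refit on all samples and rank loci by impurity-based feature importance.
model.fit(X, host)
ranked_loci = np.argsort(model.feature_importances_)[::-1]
top_loci = ranked_loci[:3].tolist()

print(f"LOO accuracy: {loo_accuracy:.3f}")
print(f"Macro AUROC:  {macro_auroc:.3f}")
print(f"Top loci:     {top_loci}")
```

In this sketch the highest-ranked loci should largely coincide with the columns that were constructed to track the host plant. Impurity-based importances can be biased toward features with many distinct values, so a permutation-based importance measure is a common cross-check.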

College

College of Science & Engineering

Department

Biology

Campus

Winona

First Advisor/Mentor

Colin Engstrom

Second Advisor/Mentor

Amy Runck

Location

Ballroom - Kryzsko Commons

Start Date

4-18-2024 9:00 AM

End Date

4-18-2024 10:00 AM

Presentation Type

Poster Session

Format of Presentation or Performance

In-Person

Session

1a=9am-10am

Poster Number

33


Genetic Association in Entylia carinata using Random Forest Classification


 
