Presentation Title
A Comparison of Bioinformatics Packages BioPython and BioConductor
Abstract
This research compares BioPython and BioConductor, two open-source software packages for bioinformatics analysis in the Python and R programming languages, respectively. A review of both packages covers execution style, dependencies, and a subset of the tasks to which each package is best suited. The primary contribution is to determine whether significant differences in memory usage and run-time performance exist between the packages when each executes an analysis pipeline that is as equivalent as possible to the other's. A Variant Call Format (VCF) data set serves as input to each pipeline; it consists of Single Nucleotide Polymorphism (SNP) data for six populations of Keeled Treehopper (Entylia carinata), totaling 100 organisms across 18,318 loci. The resulting memory and run-time measurements are visualized and examined. The conclusion offers evidence-based recommendations on using either or both packages for specific use cases.
College
College of Science & Engineering
Department
Computer Science
Location
Winona, Minnesota
Presentation Type
Video (Prerecorded-MP4)
Comments
Pre-recorded video will be submitted by end-of-day Monday 4/12/21.