Presentation Title

A Comparison of Bioinformatics Packages BioPython and BioConductor

Presenter Information

Michael Desch

Abstract

This research compares BioPython and BioConductor, two open-source software packages for bioinformatics analysis in the Python and R programming languages, respectively. A review of both packages covers their execution style, their dependencies, and a subset of the tasks each package is best suited for. The primary contribution of this research is to determine whether significant differences exist in memory usage and in run-time performance between the two packages when they execute analysis pipelines that are as equivalent as possible. The input to each pipeline is a Variant Call Format (VCF) data set containing Single Nucleotide Polymorphism (SNP) data for 100 organisms from six populations of the Keeled Treehopper (Entylia carinata), across 18,318 loci. The resulting memory and run-time measurements are visualized and examined. The conclusion offers evidence-based recommendations on using either or both packages for specific use cases.
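As an illustration of the kind of measurement the abstract describes, the following is a minimal sketch, assuming a Python pipeline step can be wrapped in a zero-argument callable. It records wall-clock time with time.perf_counter and peak traced memory with tracemalloc. The helper names measure and count_variant_records are hypothetical and are not the author's method. Note that tracemalloc only sees allocations made through Python's allocators, so it would not measure an R/BioConductor process, which would need R-side tooling.

```python
import time
import tracemalloc
from typing import Any, Callable, Iterable, Tuple


def measure(step: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run one pipeline step and return (result, elapsed seconds, peak traced bytes).

    Hypothetical helper: peak memory covers only Python-allocator allocations
    made while the step runs; native memory outside those allocators is not seen.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step()
    finally:
        # Stop timing first, then read peak memory and stop tracing.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def count_variant_records(vcf_lines: Iterable[str]) -> int:
    """Hypothetical stand-in step: count data lines (those not starting with '#') in VCF text."""
    return sum(1 for line in vcf_lines if line and not line.startswith("#"))


# Simplified toy VCF (no sample columns), for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.",
]
count, seconds, peak_bytes = measure(lambda: count_variant_records(toy_vcf))
print(count)  # 2
```

Repeating each measurement several times and comparing the distributions, rather than relying on single runs, would make any claimed difference between packages more robust.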

College

College of Science & Engineering

Department

Computer Science

Location

Winona, Minnesota

Presentation Type

Video (Prerecorded-MP4)

Comments

The pre-recorded video will be submitted by the end of the day on Monday, 4/12/21.
