Galaxy, a web-based framework for the integration of genome analysis
Open Access
- Author:
- Blankenberg, Daniel James
- Graduate Program:
- Biochemistry, Microbiology, and Molecular Biology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 02, 2009
- Committee Members:
- Anton Nekrutenko, Dissertation Advisor/Co-Advisor
Anton Nekrutenko, Committee Chair/Co-Chair
Webb Colby Miller, Committee Member
Ross Cameron Hardison, Committee Member
Stephan Schuster, Committee Member
Andrey S Krasilnikov, Committee Member
- Keywords:
- Sequence Analysis
Genome
Databases
Algorithms
Molecular Sequence Data
Software
- Abstract:
- The standardization and sharing of data and tools are among the biggest challenges facing large collaborative projects and small individual labs alike. This work describes Galaxy, a compact web application that effectively addresses these issues. It provides an intuitive interface for depositing and accessing data and offers a large collection of analysis tools, including operations on genomic intervals, utilities for manipulating multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy allows users to address biological questions that are beyond the reach of existing software. Available both as (1) a public web service providing tools for the analysis of genomic, comparative genomic and functional genomic data and (2) a downloadable package that can be deployed in individual labs, Galaxy attempts to serve both ends of its user base: experimental biologists and bioinformaticians. For experimental biologists, it provides an intuitive interface for data deposition and access, offers a large number of tools, and makes analyses transparent by recording every step in the Galaxy history system. Most importantly, it streamlines the path from data to analysis, as even complex tools can be applied to genomic data directly without manual parsing or preprocessing. For bioinformaticians, Galaxy provides informatics support through a platform that gives biologists simple interfaces to powerful tools while automatically managing the computational details. Galaxy provides a framework that can integrate command-line tools with almost no effort: for each tool, Galaxy generates the interface and handles all computational housekeeping. A prime example of the disconnect between genomic data and analysis tools is found in multiple-species whole-genome alignments.
Continually expanding collections of freely downloadable multiple-species whole-genome alignments have been made available to the scientific community; however, several issues prevent experimental biologists from using these important datasets. Simply put, these alignments are large enough that merely downloading and storing them poses significant logistical problems, and no tools are available that allow command-line-averse biologists to manipulate them. Furthermore, current genome analysis packages, such as the phylogenetic software HyPhy, do not accept the Multiple Alignment Format (MAF) as input. A set of tools designed to address these challenges has been integrated into the Galaxy framework and is included in the standard software distribution. Short examples of tool usage, as well as an in-depth sample analysis, are presented along with descriptions of the individual tools. The step-by-step sample analysis and toolset integration provide real-life examples of the utility of Galaxy both as (1) an effective and intuitive analysis platform for experimental biologists and (2) a tool and data source integration framework for bioinformaticians.
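The format gap between MAF and downstream packages can be illustrated with a minimal Python sketch. It is not the code of Galaxy's MAF tools; it only shows one way MAF alignment blocks might be stitched into per-species FASTA rows, a format that phylogenetic packages commonly read. It assumes the UCSC convention that the source field of each `s` line has the form `assembly.chromosome`, ignores `i`, `e` and `q` lines as well as strand and coordinates, keeps only the last row when a species occurs more than once in a block, and fills species absent from a block with gaps. The function names (`maf_blocks`, `maf_to_fasta`) and the toy input `EXAMPLE_MAF` are hypothetical; its coordinates and sizes are made-up values.

```python
def maf_blocks(lines):
    """Yield each MAF alignment block as a list of (src, aligned_text) pairs.

    Simplifying assumption: only 'a' (block start) and 's' (sequence) lines
    are interpreted; header, comment, 'i', 'e' and 'q' lines are skipped.
    """
    block = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("a"):
            # An 'a' line opens a new block, so emit the previous one first.
            if block:
                yield block
            block = []
        elif line.startswith("s") and block is not None:
            # Field layout: s <src> <start> <size> <strand> <srcSize> <text>
            fields = line.split()
            block.append((fields[1], fields[6]))
    if block:
        yield block


def maf_to_fasta(lines, species):
    """Concatenate each species' aligned text across blocks into FASTA.

    Species missing from a block are padded with '-' so every row keeps the
    same length. If a species appears twice in one block, the last row wins.
    """
    rows = {sp: [] for sp in species}
    for block in maf_blocks(lines):
        width = len(block[0][1])  # all rows in a MAF block share one width
        seen = {}
        for src, text in block:
            seen[src.split(".")[0]] = text  # 'hg18.chr7' -> 'hg18' (assumed)
        for sp in species:
            rows[sp].append(seen.get(sp, "-" * width))
    out = []
    for sp in species:
        out.append(">" + sp)
        out.append("".join(rows[sp]))
    return "\n".join(out) + "\n"


# Hypothetical toy input with two blocks; values are illustrative only.
EXAMPLE_MAF = "\n".join([
    "##maf version=1",
    "",
    "a score=10.0",
    "s hg18.chr7 100 6 + 1000000 ACG-TAC",
    "s mm8.chr6 200 7 + 1000000 ACGGTAC",
    "",
    "a score=5.0",
    "s hg18.chr7 106 4 + 1000000 GG-TT",
    "s canFam2.chr14 50 5 - 1000000 GGATT",
])

print(maf_to_fasta(EXAMPLE_MAF.splitlines(), ["hg18", "mm8", "canFam2"]), end="")
```

Gap padding is the design choice that matters here: it keeps every output row the same length, which any multiple-alignment reader expects. A production tool would also need to handle strand, paralogous rows and coordinate bookkeeping explicitly rather than discarding them.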