Welcome to Viroverse
Viroverse is a platform for the collection, storage, retrieval, and analysis of experimental data for microbiology workflows. Developed in-house for ten years, it serves as the principal data store for HIV sequencing experiments conducted in the Mullins Lab. Viroverse currently houses tens of thousands of viral nucleotide sequences, together with comprehensive metadata about their creation including PCR protocols, gel images, subject clinical data, and more.
Dr. Mullins (project PI) has long been interested in a structured system to store the data generated and used in his lab. After evaluating commercial systems, he found that none met the needs of investigators looking to include experimental data for molecular biology together with the results and analysis. Thus began Viroverse.
The first step in building Viroverse was to develop a comprehensive database schema, storing viral sequences, subject data and additional information such as sequence annotations and alignments. Using the SeaPIP and MACS cohorts as prototypes of robust data, the Viroverse design team chose a level of abstraction in database representation that allows future values to be added easily and accommodates quality controls by flagging unexpected values. Experience adapting existing data from these two large pilot cohorts revealed the complexity of integrating even basic data from multiple sources. All values are stored in the most exact format possible so that precision is not sacrificed to the lowest common format.
We chose a highly normalized relational database structure specific to the molecular biology and attendant data of viral pathogens. This structure lets us enforce a degree of data standardization at the database level, using lookup tables of controlled vocabulary for reusable values and foreign key referential integrity. By contrast, the dimensional approach typically favored in data warehousing architectures uses a relatively small number of fact tables to accommodate a wide variety of data. Using individual tables to represent the product of each process also allows association with unique pieces of data for each object in addition to generalized annotations that may apply across object classes.
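As a rough illustration of the lookup-table and foreign-key idea described above, the sketch below builds a tiny schema in an in-memory SQLite database. It is not the actual Viroverse schema: the table names (sample_type, pcr_product, sequence), the column names, the placeholder values, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3

# Hypothetical miniature schema; names are illustrative, not Viroverse's own.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE sample_type (           -- lookup table of controlled vocabulary
    sample_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE pcr_product (           -- one table per process product
    pcr_product_id INTEGER PRIMARY KEY,
    sample_type_id INTEGER NOT NULL REFERENCES sample_type(sample_type_id),
    primer_pair    TEXT NOT NULL
);
CREATE TABLE sequence (              -- each sequence points at its PCR product
    sequence_id    INTEGER PRIMARY KEY,
    pcr_product_id INTEGER NOT NULL REFERENCES pcr_product(pcr_product_id),
    nucleotides    TEXT NOT NULL
);
""")

# Register one allowed vocabulary term.
conn.execute("INSERT INTO sample_type (sample_type_id, name) VALUES (1, 'plasma')")


def try_insert_pcr_product(sample_type_id):
    """Return True if the row is accepted, False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO pcr_product (sample_type_id, primer_pair) VALUES (?, ?)",
            (sample_type_id, "example-primer-pair"),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when sample_type_id is not present in the lookup table.
        return False


accepted = try_insert_pcr_product(1)   # refers to a known vocabulary term
rejected = try_insert_pcr_product(99)  # refers to a term that does not exist
pcr_rows = conn.execute("SELECT COUNT(*) FROM pcr_product").fetchone()[0]
```

In this sketch the database itself refuses the second insert, so standardization does not depend on every client application validating its input. Each process product also has its own table, which leaves room for columns specific to that product.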