DATA SHARING AND ANALYSIS
Our computing node for data analysis is maintained by Research Cyberinfrastructure (RCI) within one of its several clusters at the University of South Carolina. This lets us use RCI's scientific analysis applications and also build and develop our own. High-performance computing and data-intensive computing are quickly converging, driven by the use of machine learning to extract meaning from big data. RCI therefore recognizes a need for local testbeds for experimentation and is working to develop hybrid platforms that combine these capabilities, enabling large data exchanges over the network while offering high-performance distributed computation.
OPEN SOURCE TOOLS: Open source tools currently used by our group for data mining
Galaxy
Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.
GeneWeaver
GeneWeaver combines cross-species data and gene entity integration, scalable hierarchical analysis of user data with a community-built and curated data archive of gene sets and gene networks, and tools for data-driven comparison of user-defined biological, behavioral and disease concepts. GeneWeaver allows users to integrate gene sets across species, tissue and experimental platform. It differs from conventional gene set over-representation analysis tools in that it allows users to evaluate intersections among all combinations of a collection of gene sets, including, but not limited to, annotations to controlled vocabularies. There are numerous applications of this approach. Sets can be stored, shared and compared privately, among user-defined groups of investigators, and across all users.
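The idea of evaluating intersections among all combinations of a collection of gene sets can be illustrated with a minimal Python sketch. This is not GeneWeaver's implementation or API. The function name `all_intersections`, the set names, and the gene identifiers are hypothetical, and the sketch assumes identifiers have already been mapped to a common namespace (for example, across species by homology).

```python
from itertools import combinations


def all_intersections(gene_sets, min_size=2):
    """Return {tuple of set names: shared genes} for every combination of
    at least `min_size` gene sets whose intersection is non-empty."""
    names = sorted(gene_sets)
    result = {}
    for k in range(min_size, len(names) + 1):
        for combo in combinations(names, k):
            shared = set.intersection(*(set(gene_sets[n]) for n in combo))
            if shared:
                result[combo] = shared
    return result


# Hypothetical gene sets with placeholder identifiers (illustration only).
example_sets = {
    "alcohol_preference": {"GENE_A", "GENE_B", "GENE_C"},
    "anxiety": {"GENE_B", "GENE_C", "GENE_D"},
    "nicotine_response": {"GENE_C", "GENE_E"},
}

overlaps = all_intersections(example_sets)
for combo, genes in overlaps.items():
    print(" & ".join(combo), "->", sorted(genes))
```

The number of combinations grows exponentially with the number of sets, so an exhaustive enumeration like this is only practical for small collections.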
GeneMANIA
GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. You can use GeneMANIA to find new members of a pathway or complex, find additional genes you may have missed in your screen or find new genes with a specific function, such as protein kinases. Your question is defined by the set of genes you input.
Ensembl
The Ensembl project was started in 1999, with the aim of providing automated annotation of the human genome and making this publicly available via the web. Many more genomes have since been added to Ensembl and the range of available data has also expanded to include comparative genomics, variation and regulatory data.
HaploReg v2.0
HaploReg is a tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state, their sequence conservation across mammals, and their effect on regulatory motifs. HaploReg is designed for researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation.