In this era of Big Data Genomics, our goal at the GBSC is to provide best-in-class high-performance computational infrastructure and cutting-edge bioinformatics services to the Stanford community. We expect our services to provide ...
- Greater momentum for new labs
- Higher efficiency for established labs
- New levels of capability/capacity
- A level of infrastructure that cannot easily be built by a single lab or department
... resulting in increased velocity of scientific contribution.
Biomedical research increasingly deals with terabyte- to petabyte-scale data. Doing effective research at this scale requires compute and storage that are affordable while also providing security, continuity, and scalability. We have partnered with Stanford IT (Stanford Research Computing Center) and SoM IRT to bring you the best and most affordable computational services in the form of reliable data centers, professional IT services, and Cloud integration. Because of our scale, we can negotiate better prices on your behalf with hardware vendors and Cloud partners.
But our efforts don't stop there. As researchers ourselves, we were increasingly battling the complexity of security requirements from various IRBs and government institutions. So we built a "do-once, use-many-times" secure infrastructure that fits common research requirements, and we shared our experience with the community in the form of a peer-reviewed publication.
Anyone who has compiled analytical software along with its myriad dependencies can appreciate that it is not a fun exercise. A typical bioinformatics pipeline requires 20-30 different software packages. If every researcher had to compile all the tools needed for routine analysis, the hours lost would add up to a significant chunk of research time. So we built a central repository of tools for you. This repository is mature on the cluster and is in development on the Cloud. It contains over 350 tools, many available in multiple versions. While researchers are free to install tools in their personal workspace, they can also ask the administrative team to install a tool. Once a tool is installed in the central repository, it can be used by thousands of researchers.
Much like tools, reference datasets are used in many different analyses, and they pose similar challenges. They require downloading, checking file hashes, cleaning, normalizing, and setting up appropriate access controls. Our vision is to have all commonly used datasets available for compute. We are only partly there now.
And finally, it is our community that makes us strong. We can't all be experts in everything, but having other experts to reach out to makes troubleshooting less painful and the learning curve much less steep. We provide a forum where our researchers can ask each other questions and support one another.
We have been calling this our vision for a Data Commons. Indeed, it is not unlike the NIH Commons vision.