Developing Tools for Life Science Data

BioCatalyst will be a first-of-its-kind search engine that aims to accelerate precision health by connecting clinically annotated specimens and their associated molecular data to existing inventory in a central ecosystem.

The concept behind this platform is now recognized globally as one of the highest priorities in the life sciences, because no other system yet allows individual labs to compliantly connect both clinical and molecular information to specimens. So far, the Stanford Biobank has successfully and securely integrated the two largest clinical systems used on campus, REDCap and EPIC, into BioCatalyst, and provides faculty with visualization and basic analytical tools. These pipelines are a first in the academic community, and development of the molecular platform is underway.

Data Integrator

Build data collections directly from the user interface, with no programming knowledge required. Any authorized user can log in to BioCatalyst and immediately begin connecting their sample inventory to data collected in REDCap projects or EPIC. The EPIC integration is especially distinctive: the system pulls only the data permitted by the user’s approved IRB, which ensures compliance for each individual biobank.
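For readers who want a concrete picture of the IRB-scoped linkage described above, here is a minimal sketch in Python. It is not BioCatalyst code: the function name, the record layout, and the field names (`participant_id`, `diagnosis`, `zip_code`) are all hypothetical, and the approved-field list stands in for whatever an IRB approval actually permits.

```python
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def link_samples_to_clinical(
    samples: List[Record],
    clinical_by_participant: Dict[str, Record],
    approved_fields: Set[str],
) -> List[Record]:
    """Attach clinical annotations to each sample, keeping only approved fields.

    All names here are illustrative assumptions, not a real BioCatalyst API.
    """
    linked = []
    for sample in samples:
        # A sample with no matching clinical record is kept, just unannotated.
        record = clinical_by_participant.get(sample["participant_id"], {})
        # Drop every clinical field that the approval does not cover.
        permitted = {k: v for k, v in record.items() if k in approved_fields}
        # Sample fields are merged last so inventory data is never overwritten.
        linked.append({**permitted, **sample})
    return linked


samples = [
    {"sample_id": "S1", "participant_id": "P1"},
    {"sample_id": "S2", "participant_id": "P2"},
]
clinical = {"P1": {"diagnosis": "T2D", "age": 54, "zip_code": "94305"}}
linked = link_samples_to_clinical(samples, clinical, {"diagnosis", "age"})
```

In this sketch the unapproved `zip_code` field never reaches the linked collection, which is the behavior the paragraph describes at a conceptual level.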

Search & Share Data Collections

Once the datasets are connected, users can filter and adjust views across all available annotations. The views are coded, so they can be shared compliantly with other users or groups. Users can search within a single data collection or across multiple data collections to find samples that match specific phenotypes. Because any individual biobank can create and share data collections, it becomes more practical to review existing specimen inventories for downstream analysis.
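The cross-collection search described above can be pictured with a small Python sketch. This is only an illustration under assumptions: the collection names, the `sample_id` key, and exact-match criteria are hypothetical, and the real application may filter quite differently.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def search_collections(
    collections: Dict[str, List[Record]],
    criteria: Dict[str, Any],
) -> List[Tuple[str, str]]:
    """Return (collection name, sample ID) pairs whose annotations match all criteria.

    Hypothetical helper for illustration; matching is simple equality.
    """
    matches = []
    for name, rows in collections.items():
        for row in rows:
            # A row matches only if every requested annotation has the requested value.
            if all(row.get(field) == value for field, value in criteria.items()):
                matches.append((name, row["sample_id"]))
    return matches


collections = {
    "biobank_a": [
        {"sample_id": "A1", "diagnosis": "T2D", "tissue": "plasma"},
        {"sample_id": "A2", "diagnosis": "CAD", "tissue": "plasma"},
    ],
    "biobank_b": [
        {"sample_id": "B1", "diagnosis": "T2D", "tissue": "plasma"},
    ],
}
hits = search_collections(collections, {"diagnosis": "T2D", "tissue": "plasma"})
```

Passing a dictionary with a single collection gives the single-collection search; passing several gives the cross-collection case.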

Analyze and Visualize

All data collections can be pivoted into tables and other visualizations within the application to quickly identify trends and review metrics. Data scientists can also export the data to multiple statistical programs through an automated pipeline; this mechanism allows high-throughput computing on local machines while ensuring that only authorized users view the data.
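As a rough illustration of what pivoting a data collection into a table means, the Python sketch below counts samples by two annotations. The function and the annotation names (`diagnosis`, `tissue`) are hypothetical and use only the standard library; they are not taken from BioCatalyst.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def pivot_counts(rows: List[Record], row_key: str, col_key: str) -> Dict[Any, Dict[Any, int]]:
    """Count samples for each (row_key, col_key) combination as a nested table.

    Illustrative helper only; combinations with no samples are filled with 0.
    """
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_values = sorted({rv for rv, _ in counts})
    col_values = sorted({cv for _, cv in counts})
    return {
        rv: {cv: counts.get((rv, cv), 0) for cv in col_values}
        for rv in row_values
    }


rows = [
    {"sample_id": "A1", "diagnosis": "T2D", "tissue": "plasma"},
    {"sample_id": "A2", "diagnosis": "T2D", "tissue": "serum"},
    {"sample_id": "A3", "diagnosis": "CAD", "tissue": "plasma"},
    {"sample_id": "A4", "diagnosis": "T2D", "tissue": "plasma"},
]
table = pivot_counts(rows, "diagnosis", "tissue")
```

A table like this is the kind of summary a user might review before exporting a collection for deeper statistical analysis.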