2022 - Deep Data Research Computing Center at Snyder Lab Uses AWS for Research in Precision Medicine Leveraging Multimodal Data
Highlighted Case Study in Amazon News
The Deep Data Research Computing Center (DDRCC) at Stanford University, one of several initiatives to emerge from the Snyder Lab at Stanford, is part of the Department of Genetics at Stanford Medicine in Palo Alto, California. Its goal is to create tools that bridge the gap between biology and computer science and help researchers in precision medicine deliver tangible medical solutions.
To facilitate precision medicine research, DDRCC created My Personal Health Dashboard (MyPHD), a secure, scalable, and interoperable health management system for consumers. Built on Amazon Web Services (AWS), MyPHD gives researchers efficient data acquisition, storage, and near-real-time analysis capabilities. The team also developed Stanford Data Ocean (SDO), the first serverless precision medicine educational solution, where researchers can educate, innovate, and collaborate over code and data. By building on AWS, DDRCC uses the elasticity, scalability, and security of the cloud to benefit both consumers and biologists and to advance the field of precision medicine.
Designing Solutions for Precision Medicine Research Using Multimodal Data
Precision medicine research relies on an individualized understanding of multimodal data (like genomic, microbiomic, and proteomic data) so that clinicians and researchers can personalize therapy for patients. The large amount of data derived from wearable sensors, electronic medical records, and molecular profiles adds another dimension. This increased scale and complexity raises new challenges around data availability, acquisition, storage, integration, and analysis. Therefore, it is imperative for researchers to have an agile and elastic data strategy. "Deep data is the future of medicine. We need it for monitoring health and for diagnostics, prognostics, and treatments, all at a personal level," says Dr. Michael Snyder, chair and professor of genetics at Stanford University.
DDRCC’s MyPHD provides a secure, comprehensive environment for biometric data analytics at massive scale. It can store, organize, and process complex health datasets and support near-real-time analysis and visualization at both the individual and cohort levels. This is designed to improve the accuracy of diagnoses and medical prescriptions and to advance precision medicine. To support large-scale analysis of participants’ data for individual health management, DDRCC scales MyPHD’s resources up or down with the volume of workloads. It also uses AWS security services as the foundation for its medical applications, which handle large volumes of highly sensitive personal data.
Precision medicine depends on integrating multimodal datasets to draw inferences. Typically, these datasets are large and siloed across disparate sources. For researchers, it is important to determine the right compute and storage configurations needed to apply complex computational algorithms to these large datasets. The DDRCC team developed SDO to help researchers efficiently allocate resources to experiment with code. Using SDO, researchers can explore important questions in precision medicine and scale innovative solutions. By running SDO workloads on AWS, DDRCC has achieved high scalability while meeting stringent security requirements.
Building Innovative Solutions on AWS for Multimodal Data Analysis
To improve biologists’ ability to complete vital health research, DDRCC uses Amazon SageMaker and Service Workbench on AWS. Using SageMaker, bioinformaticians can build, train, and deploy machine learning models for virtually any use case with fully managed infrastructure, tools, and workflows. The team uses Service Workbench on AWS to facilitate the secure, repeatable, and federated control of access to data, tooling, and compute power that researchers need. Researchers can securely access large datasets on Amazon Simple Storage Service (Amazon S3), an object storage service with industry-leading scalability, data availability, security, and performance.
DDRCC requires high scalability to process data from MyPHD and SDO and relies on Amazon Elastic Compute Cloud (Amazon EC2), a web service that provides secure, resizable compute capacity in the cloud. “Not only can we scale MyPHD and support different numbers of users, but we can also scale our algorithms based on the number of workloads,” says Dr. Arash Alavi, research and development lead of the DDRCC at Stanford University. To run preprocessing pipelines for large-scale genomics and transcriptomics applications, the team also uses Amazon Genomics CLI, an open-source tool for genomics and life science customers, and AWS Batch, a service for fully managed batch processing at virtually any scale. Amazon Genomics CLI simplifies and automates cloud infrastructure deployments, while AWS Batch makes it simple to run hundreds of thousands of batch computing jobs on AWS.
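As a rough illustration of how per-sample preprocessing can fan out on AWS Batch, the sketch below submits one array job with one child per sample, using the boto3 Batch client's `submit_job` call. This is not DDRCC's pipeline and bypasses Amazon Genomics CLI, which generates its own Batch infrastructure. The job queue, job definition, manifest URI, and the `SAMPLE_MANIFEST` variable name are hypothetical placeholders. The client is passed in so the logic can be exercised without AWS credentials.

```python
import os
from typing import Any, Mapping, Sequence

MAX_ARRAY_SIZE = 10_000  # AWS Batch array jobs allow at most 10,000 child jobs


def submit_preprocessing_job(batch: Any, job_name: str, job_queue: str,
                             job_definition: str, manifest_uri: str,
                             num_samples: int) -> str:
    """Submit one preprocessing job per sample and return the (parent) job ID.

    `batch` is expected to behave like boto3.client("batch").
    """
    if not 1 <= num_samples <= MAX_ARRAY_SIZE:
        raise ValueError(f"num_samples must be between 1 and {MAX_ARRAY_SIZE}")
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Tell every container where the (hypothetical) sample manifest lives.
        "containerOverrides": {
            "environment": [{"name": "SAMPLE_MANIFEST", "value": manifest_uri}]
        },
    }
    # One sample runs as a plain job; several samples fan out as an array job
    # (Batch requires an array size of at least 2).
    if num_samples > 1:
        request["arrayProperties"] = {"size": num_samples}
    return batch.submit_job(**request)["jobId"]


def sample_for_this_child(manifest_lines: Sequence[str],
                          env: Mapping[str, str] = os.environ) -> str:
    """Inside a container: pick this child's sample from the manifest."""
    # AWS Batch sets AWS_BATCH_JOB_ARRAY_INDEX (0-based) in array children;
    # a non-array job has no index, so it handles the first sample.
    index = int(env.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return manifest_lines[index].strip()
```

Using a single array job rather than thousands of separate `submit_job` calls keeps submission cheap and lets Batch schedule the children as compute capacity scales.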
DDRCC also uses Amazon Athena, an interactive query service, to analyze data stored in Amazon S3 using standard SQL. Because the service is highly elastic, researchers can query data collected by SDO and MyPHD on demand and move more quickly in their projects. Athena is also serverless, so there is no infrastructure for DDRCC to manage, and the team pays only for the queries it runs, which reduces costs. “The ability to scale resources dynamically based on the size of the workload, this pay-as-you-go model, is astonishing,” says Dr. Amir Bahmani, director of the DDRCC at Stanford University.
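The on-demand query pattern described above can be sketched with the boto3 Athena client: start a query, poll its state, then page through the results. This is a minimal sketch, not DDRCC's code. The database name, table, and S3 output location in the usage note are hypothetical, and the client is injected so the function can be tested without AWS access.

```python
from __future__ import annotations

import time
from typing import Any


def run_athena_query(athena: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list[list[str]]:
    """Run SQL on Athena and return all result rows as lists of strings.

    `athena` is expected to behave like boto3.client("athena").
    """
    # Athena writes result files to the given S3 location.
    start = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Results are paginated; for SELECT queries the first row holds the column names.
    rows: list[list[str]] = []
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL values come back without a VarCharValue key.
            rows.append([col.get("VarCharValue", "") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return rows
```

For example, with a hypothetical `wearables` database, one might call `run_athena_query(boto3.client("athena"), "SELECT participant_id, avg(heart_rate) FROM readings GROUP BY participant_id", "wearables", "s3://example-bucket/athena-results/")`. Because Athena bills by data scanned, partitioned and columnar tables keep such queries cheaper.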
Security is a major requirement for applications that handle medical data. DDRCC’s solutions do not use, store, or process protected health information, and all data in transit and at rest is completely encrypted and anonymized. To maintain a high level of security, DDRCC has adopted AWS services like Amazon Cognito, a service that lets teams add user sign-up, sign-in, and access control to web and mobile apps. “The security features that AWS provides include out-of-the-box logging, auditing, and monitoring, which we use to protect our data,” says Bahmani.
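The case study does not describe DDRCC's exact configuration, but one common way to enforce encryption in transit and at rest for data in Amazon S3 is sketched below: a bucket policy that denies any request not made over TLS (via the `aws:SecureTransport` condition key) and uploads that request server-side encryption with an AWS KMS key (SSE-KMS). The bucket name and key ID are hypothetical, and the S3 client is passed in. This covers encryption only; anonymizing the data itself is a separate step that this sketch does not attempt.

```python
import json
from typing import Any


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that rejects every S3 request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def harden_bucket(s3: Any, bucket: str) -> None:
    """Attach the TLS-only policy; `s3` is expected to behave like boto3.client("s3")."""
    policy = deny_insecure_transport_policy(bucket)
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))


def upload_encrypted(s3: Any, bucket: str, key: str, body: bytes, kms_key_id: str) -> None:
    """Store an object encrypted at rest with the given AWS KMS key (SSE-KMS)."""
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId=kms_key_id,
    )
```

Denying insecure transport at the bucket level means the guarantee holds even for clients that were never written with TLS in mind, rather than relying on each application to get it right.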
Collaborating on Precision Medicine
About Stanford Deep Data Research Computing Center
Stanford Deep Data Research Computing Center is in the Department of Genetics at Stanford Medicine in Palo Alto, California. The team works on design and development of systematic and intelligent solutions for large-scale biomedical applications.
Benefits of AWS
- Improves security of precision medicine solutions
- Achieves scalability of MyPHD for virtually any number of users
- Improves elasticity of the SDO for educational use
- Reduces costs with the pay-as-you-go model
- Improves adaptability for collaborative research