SCG Cluster

The SCG Cluster is an on-premises computing resource dedicated to -omics analyses. Designed specifically for genomics research, it provides the hardware, software, and regulatory compliance required for typical genomics analyses.

Hardware

Compute

More than 3,000 CPU cores across more than 70 servers are available to run user jobs. These servers have a minimum of 384 GB of RAM and provide at least 16 GB per core to accommodate the high memory demands of genomic pipelines. Several large-memory servers are offered for data-intensive processing: 2 servers with 48 cores and 1.5 TB of RAM each, and 10 servers with 48 cores and 1 TB of RAM each.
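The per-core figures above follow from simple division. The sketch below works them out for the two large-memory server classes and checks whether a request stays within the stated 16 GB-per-core floor. It assumes 1 TB = 1024 GB, and the names `LARGE_MEMORY_CLASSES`, `gb_per_core`, and `within_memory_floor` are hypothetical.

```python
TB_IN_GB = 1024       # assumption: binary terabytes; decimal (1000) shifts results slightly
MIN_GB_PER_CORE = 16  # stated floor for every server in the cluster

# Large-memory classes from the text: label -> (server count, cores each, RAM each in GB)
LARGE_MEMORY_CLASSES = {
    "1.5 TB": (2, 48, 1.5 * TB_IN_GB),
    "1 TB": (10, 48, 1 * TB_IN_GB),
}


def gb_per_core(cores: int, ram_gb: float) -> float:
    """Average RAM available per core on one server."""
    return ram_gb / cores


def within_memory_floor(cpus: int, mem_gb: float) -> bool:
    """True if a job asks for no more than 16 GB per requested core."""
    return mem_gb <= cpus * MIN_GB_PER_CORE


if __name__ == "__main__":
    for label, (count, cores, ram_gb) in LARGE_MEMORY_CLASSES.items():
        print(f"{label}: {count} servers, {gb_per_core(cores, ram_gb):.1f} GB per core")
    print(within_memory_floor(4, 64))   # True: 4 cores x 16 GB = 64 GB
    print(within_memory_floor(4, 100))  # False: exceeds 64 GB
```

On this arithmetic, the 1.5 TB servers offer 32 GB per core and the 1 TB servers about 21.3 GB per core, both above the 16 GB floor.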

Our cluster also includes a supercomputer for jobs that require both high memory and many simultaneous cores. This system has 360 cores, 10 terabytes (TB) of random-access memory (RAM), 4 NVIDIA Pascal GPUs, and 150 TB of local scratch storage.
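One way to think about when a job needs the supercomputer rather than a large-memory server is to pick the smallest single system whose cores and RAM both cover the request. The sketch below does that for the three node types whose full specifications are given above; it leaves out the standard servers because their per-node core counts are not stated. The node list, its ordering, and the 1 TB = 1024 GB conversion are assumptions, and actual placement is decided by Slurm, not by this hypothetical `smallest_fit` function.

```python
from typing import Optional

# (label, cores, RAM in GB), smallest first. Assumption: 1 TB = 1024 GB.
NODE_TYPES = [
    ("large-memory 1 TB", 48, 1 * 1024),
    ("large-memory 1.5 TB", 48, 1536),
    ("supercomputer", 360, 10 * 1024),
]


def smallest_fit(cores: int, mem_gb: float) -> Optional[str]:
    """Return the first (smallest) node type that can hold the whole request, or None."""
    for label, node_cores, node_mem_gb in NODE_TYPES:
        if cores <= node_cores and mem_gb <= node_mem_gb:
            return label
    return None


if __name__ == "__main__":
    print(smallest_fit(32, 800))   # large-memory 1 TB
    print(smallest_fit(48, 1200))  # large-memory 1.5 TB
    print(smallest_fit(96, 4000))  # supercomputer
    print(smallest_fit(400, 100))  # None: more cores than any single system listed
```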

Jobs are managed by the Slurm job scheduler to make efficient use of compute resources. Interactive sessions are available for work that requires direct user interaction.
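A Slurm batch job is a shell script whose `#SBATCH` lines request resources. The sketch below renders such a script from Python. `--job-name`, `--account`, `--partition`, `--cpus-per-task`, `--mem`, and `--time` are standard `sbatch` options, but the account and partition values in the example are hypothetical placeholders, not SCG's actual names, and `render_sbatch` is an illustrative helper.

```python
def render_sbatch(job_name: str, account: str, partition: str,
                  cpus: int, mem_gb: int, hours: int, command: str) -> str:
    """Build the text of a Slurm batch script requesting the given resources."""
    directives = [
        f"--job-name={job_name}",
        f"--account={account}",
        f"--partition={partition}",
        f"--cpus-per-task={cpus}",
        f"--mem={mem_gb}G",           # memory per node
        f"--time={hours:02d}:00:00",  # wall-clock limit as HH:MM:SS
    ]
    lines = ["#!/bin/bash"] + [f"#SBATCH {d}" for d in directives] + [command]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sbatch(
        job_name="align_sample1",
        account="my_lab",     # hypothetical account name
        partition="batch",    # hypothetical partition name
        cpus=8,
        mem_gb=128,           # 8 cores x 16 GB
        hours=4,
        command='bwa mem -t "$SLURM_CPUS_PER_TASK" ref.fa r1.fq r2.fq > sample1.sam',
    ))
```

The printed script can be saved to a file and submitted with `sbatch <file>`. For an interactive session, `srun --pty bash` asks Slurm for a shell on a compute node; resource options such as `--cpus-per-task` and `--mem` can be added as needed.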

Dedicated data mover nodes within the cluster handle large-volume data transfers. We support the Globus high-speed data transfer application for rapid file transfer within Stanford and around the world. Stanford researchers also have access to Box and Google Drive for securely sharing files; both are certified for use with PHI.
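When moving large datasets, a common practice is to confirm that each file arrived intact by comparing checksums computed at the source and the destination. Globus can verify file integrity on its own; the sketch below only illustrates the idea for files copied by other means, streaming each file in chunks so multi-gigabyte inputs need not fit in memory. The function names and the 8 MiB chunk size are arbitrary choices.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hex SHA-256 of a file, read in fixed-size chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(src: Path, dst: Path) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of(src) == sha256_of(dst)
```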

Storage

More than 7 petabytes of high-speed storage are available in the SCG cluster, a capacity which can grow as demand requires.  This storage is provided by the Stanford Research Computing Center's Oak Storage Service.

Network and Data Center

All system devices are interconnected via a network of at least 10 Gigabit Ethernet, with links of up to 100 Gigabits per second between some system components.
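The link speeds above set a lower bound on transfer time. As a rough worked example, assuming a link running at its full nominal rate with no protocol or storage overhead (real transfers are slower), moving 1 TB (10^12 bytes) takes about 800 seconds at 10 Gbit/s and about 80 seconds at 100 Gbit/s. The helper name is illustrative.

```python
def min_transfer_seconds(n_bytes: float, link_gbit_per_s: float) -> float:
    """Idealized transfer time: bits to send divided by link rate (no overhead)."""
    return n_bytes * 8 / (link_gbit_per_s * 1e9)


if __name__ == "__main__":
    one_tb = 1e12  # decimal terabyte
    for rate in (10, 100):
        print(f"{rate} Gbit/s: {min_transfer_seconds(one_tb, rate):.0f} s")
```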

The cluster is housed primarily in the Stanford Research Computing Facility (SRCF), a Tier 2 data center facility on campus. “Tier 2” means that the data center has the backup systems needed to provide a minimum of 99% uptime; the SRCF is also highly energy-efficient compared to similar facilities. Stanford’s data center runs a unified monitoring application that tracks data from temperature and under-floor pressure sensors and from electrical and emergency system components. This telemetry, combined with around-the-clock staffing, ensures that system issues receive prompt attention.
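To put the 99% uptime floor in concrete terms, the remaining 1% of a 365-day year (8,760 hours) allows up to about 87.6 hours of downtime. A short calculation, where the function name is illustrative and leap years are ignored:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def max_downtime_hours(uptime_fraction: float) -> float:
    """Annual downtime permitted by a given uptime fraction."""
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{max_downtime_hours(0.99):.1f} hours per year")  # 87.6 hours per year
```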

Software

Our cluster provides the software needed for modern genomic analyses. Our staff has installed over 800 freely available software packages, libraries, SDKs, and tools, such as:

  • BWA
  • Bowtie
  • GATK
  • SAMtools
  • Picard

This software repository also includes some commercial tools, such as the Sentieon suite, MATLAB, and Mathematica.  These software tools are available to all SCG cluster users.  We also have a team of bioinformaticians to assist with analyses.
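As an illustration of how the listed tools fit together, a common short-read workflow aligns paired-end reads with BWA-MEM and pipes the output into SAMtools to produce a coordinate-sorted BAM file. The sketch below only builds that command line; the file names and the `align_and_sort_cmd` helper are hypothetical, and in practice the command would run inside a Slurm job.

```python
import shlex


def align_and_sort_cmd(ref: str, reads1: str, reads2: str,
                       out_bam: str, threads: int = 8) -> str:
    """Shell pipeline: BWA-MEM alignment streamed into samtools sort."""
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    return (
        f"bwa mem -t {threads} {q(ref)} {q(reads1)} {q(reads2)}"
        f" | samtools sort -@ {threads} -o {q(out_bam)} -"
    )


if __name__ == "__main__":
    print(align_and_sort_cmd("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1.sorted.bam", 4))
```

The reference must first be indexed with `bwa index`. Because the command contains a pipe, running it from Python requires a shell, for example `subprocess.run(cmd, shell=True, check=True)`.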

Databases

KEGG

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

AlphaFold Protein Structure Database

The SCG Cluster has a mirror of the AlphaFold Protein Structure Database.  These data are made available by EMBL-EBI under a CC-BY-4.0 license.

System Administration and Security Compliance

System administration for the cluster is provided by a dedicated team of High Performance Computing (HPC) IT experts in Stanford’s Research Computing Center. Their responsibilities include, but are not limited to:

  • User and group Linux system management

  • Slurm scheduler configuration management

  • Cluster OS stack management and upgrades

  • Usage and service monitoring

  • Hardware health upkeep

  • Software and operating system patches as needed

The cluster meets the dbGaP best-practice requirements described in this document from the NIH. The Stanford IT team runs periodic internal audits to ensure continued compliance with these guidelines.