Snowflake uses a hybrid of shared-disk and shared-nothing architectures, which it calls a multi-cluster shared-data architecture.
Similar to shared-disk architectures:
- Snowflake uses a central data repository for persisted data accessible from all compute nodes
Similar to shared-nothing architectures (e.g. Hadoop, Spark):
- Snowflake processes queries using virtual warehouses
- virtual warehouses are massively parallel processing (MPP) compute clusters in which each node stores a portion of the data locally
Pros:
- data management simplicity (from shared-disk)
- performance and scale-out benefits (from shared-nothing)
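The shared-data idea can be sketched as a toy model: one central store holds the only persisted copy of the data (the shared-disk side), while several warehouses read from it but keep their own separate compute state (the shared-nothing side). The names `CentralStorage`, `Warehouse`, `etl_wh` and `bi_wh` are hypothetical and only illustrate the concept; they are not Snowflake internals.

```python
from dataclasses import dataclass, field


@dataclass
class CentralStorage:
    """One persisted copy of every table, readable by all warehouses (shared-disk idea)."""
    tables: dict = field(default_factory=dict)

    def write(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

    def read(self, table):
        return self.tables.get(table, [])


@dataclass
class Warehouse:
    """Independent compute: its own state, nothing shared with other warehouses (shared-nothing idea)."""
    name: str
    storage: CentralStorage
    queries_run: int = 0

    def total(self, table, column):
        # Every warehouse reads the same central data; only the counter is private.
        self.queries_run += 1
        return sum(row[column] for row in self.storage.read(table))


storage = CentralStorage()
storage.write("sales", [{"amount": 10}, {"amount": 32}])

etl = Warehouse("etl_wh", storage)
bi = Warehouse("bi_wh", storage)

# Both warehouses see the same single copy of the data...
print(etl.total("sales", "amount"), bi.total("sales", "amount"))  # 42 42

# ...but their compute state is kept separately.
etl.total("sales", "amount")
print(etl.queries_run, bi.queries_run)  # 2 1
```

Adding a third warehouse in this model copies no data; it simply points at the same `CentralStorage`, which is the "data management simplicity" half of the trade-off.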
Snowflake's architecture consists of three distinct layers:
- Database storage
- Query processing (compute)
- Cloud services (brain)
Database storage - compressed columnar storage
- data is stored in the cloud provider's object storage (AWS, Azure, GCP)
- data is stored compressed (as blobs) and encrypted with AES-256
- Snowflake manages all aspects of data storage (organization, file size, compression, metadata, statistics)
- optimized for OLAP (analytical) workloads
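To see why a compressed columnar layout suits analytical queries, here is a toy sketch (not Snowflake's internal storage format): rows are pivoted into one list per column, a low-cardinality column is run-length encoded, and an aggregate touches only the column it needs. The helpers `to_columns`, `rle_encode` and `rle_decode` are hypothetical, and run-length encoding is just one simple compression scheme chosen for illustration.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot row-oriented records into one list per column (columnar layout)."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...]."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


rows = [
    {"region": "EU", "amount": 10},
    {"region": "EU", "amount": 20},
    {"region": "EU", "amount": 5},
    {"region": "US", "amount": 7},
    {"region": "US", "amount": 8},
]

columns = to_columns(rows)
encoded_region = rle_encode(columns["region"])
print(encoded_region)  # [('EU', 3), ('US', 2)]

# An analytical query such as SUM(amount) reads only the "amount" column.
print(sum(columns["amount"]))  # 50
```

Repeated values sit next to each other once data is stored by column, which is what makes compression effective, and a query that aggregates one column can skip the others entirely.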
Query processing (compute) - "Muscle of the system"
- queries are processed using virtual warehouses
- warehouse = MPP compute cluster (multiple compute nodes)
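- each virtual warehouse is independent and does not share compute resources with other virtual warehouses, so one warehouse's workload does not affect the performance of another
- provides the resources for query execution: CPU, memory and temporary storage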
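The scatter/gather pattern behind MPP execution can be sketched in plain Python, assuming a thread pool stands in for the nodes of one warehouse: each "node" aggregates only its own slice of the data, then the partial results are combined. `partition` and `parallel_sum` are hypothetical names; a real warehouse distributes work across separate machines, and in CPython threads illustrate the data flow rather than a CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_nodes):
    """Split the data into n_nodes slices, one per (simulated) node."""
    return [values[i::n_nodes] for i in range(n_nodes)]


def parallel_sum(values, n_nodes=4):
    """Scatter: each node sums its own slice. Gather: combine the partial sums."""
    slices = partition(values, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(sum, slices))
    return sum(partials), partials


total, partials = parallel_sum(list(range(1, 101)), n_nodes=4)
print(total)     # 5050
print(partials)  # [1225, 1250, 1275, 1300]
```

Because each node works only on its own slice, adding nodes (a larger warehouse) shrinks the slice per node; only the small partial results travel to the final combine step.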
Cloud services - "Brain of the system"
- collection of services that coordinate and manage the other components
- these services also run on compute instances provisioned from the cloud provider
Services managed in this layer include:
- Authentication
- Infrastructure management
- Metadata management
- Query parsing and optimization
- Access control
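The division of labour in this layer can be illustrated with a toy coordinator that covers a subset of the services above (authentication, access control, metadata management): it checks who the user is and what they may read, looks up where the table's data lives, and only then hands a plan to a warehouse, without scanning any data itself. Everything here (`CloudServices`, `AccessDenied`, the token, grant and file tables) is hypothetical and does not reflect Snowflake's API.

```python
class AccessDenied(Exception):
    pass


class CloudServices:
    """Toy 'brain': decides whether and where a query runs; does not scan data itself."""

    def __init__(self, tokens, grants, metadata):
        self.tokens = tokens      # token -> user (authentication)
        self.grants = grants      # user -> set of readable tables (access control)
        self.metadata = metadata  # table -> list of storage files (metadata management)

    def authenticate(self, token):
        user = self.tokens.get(token)
        if user is None:
            raise AccessDenied("invalid token")
        return user

    def plan(self, token, table, warehouse):
        user = self.authenticate(token)
        if table not in self.grants.get(user, set()):
            raise AccessDenied(f"{user} may not read {table}")
        # The plan tells the chosen warehouse which files to scan.
        return {"user": user, "warehouse": warehouse, "files": self.metadata.get(table, [])}


services = CloudServices(
    tokens={"t-123": "alice"},
    grants={"alice": {"sales"}},
    metadata={"sales": ["sales/part-0", "sales/part-1"]},
)
print(services.plan("t-123", "sales", "bi_wh"))

try:
    services.plan("t-123", "payroll", "bi_wh")
except AccessDenied as err:
    print("denied:", err)  # denied: alice may not read payroll
```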
