Snowflake Architecture

Overview

Snowflake uses a hybrid of shared-disk and shared-nothing architectures. It is called a multi-cluster, shared-data architecture.

  • Similar to shared-disk architectures:

    • Snowflake uses a central data repository for persisted data accessible from all compute nodes
  • Similar to shared-nothing architectures (e.g. Hadoop, Spark):

    • Snowflake processes queries using virtual warehouses
    • Uses massively parallel processing (MPP) compute clusters, where each node in a cluster stores a portion of the data set locally.
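To make the hybrid concrete, here is a toy Python model. It is a sketch, not how Snowflake is actually implemented, and `CentralStorage`, `VirtualWarehouse` and `ComputeNode` are hypothetical names. Two warehouses read the same centrally persisted data (the shared-disk side), while each one spreads that data across its own nodes (the shared-nothing side).

```python
# Toy model of a multi-cluster, shared-data layout. All class names are
# hypothetical; Snowflake does not expose its internals this way.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CentralStorage:
    # Shared-disk aspect: a single persisted copy of the data that every
    # warehouse can read.
    partitions: Dict[int, List[int]]


@dataclass
class ComputeNode:
    # Shared-nothing aspect: each node holds only its own portion locally.
    local_cache: Dict[int, List[int]] = field(default_factory=dict)


class VirtualWarehouse:
    def __init__(self, name: str, storage: CentralStorage, num_nodes: int) -> None:
        self.name = name
        self.storage = storage
        self.nodes = [ComputeNode() for _ in range(num_nodes)]

    def load(self) -> None:
        # Spread partitions round-robin so each node caches a slice of the data.
        for i, pid in enumerate(sorted(self.storage.partitions)):
            node = self.nodes[i % len(self.nodes)]
            node.local_cache[pid] = self.storage.partitions[pid]

    def total(self) -> int:
        # Each node aggregates its local slice; the partial sums are then combined.
        return sum(
            sum(sum(rows) for rows in node.local_cache.values())
            for node in self.nodes
        )


storage = CentralStorage({0: [1, 2], 1: [3, 4], 2: [5, 6]})
etl = VirtualWarehouse("etl_wh", storage, num_nodes=2)
bi = VirtualWarehouse("bi_wh", storage, num_nodes=3)
etl.load()
bi.load()
print(etl.total(), bi.total())  # 21 21
```

Both warehouses return the same total because they read the same persisted data, even though they split it across a different number of nodes.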

Pros

Combines the data management simplicity of a shared-disk architecture with the performance and scale-out benefits of a shared-nothing architecture.

Snowflake Architecture Layers

Snowflake architecture consists of three distinct layers:

  1. Database storage
  2. Query processing (compute)
  3. Cloud services (brain)

Database storage - compressed columnar storage

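  • data is stored in the storage service of the underlying cloud provider (AWS, Azure, GCP)
  • data is stored compressed (as blobs) and encrypted with AES-256
  • Snowflake manages all aspects of storage (organization, file size, structure, compression, metadata, statistics)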
  • optimized for OLAP / analytical purposes
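Why columnar storage suits analytical queries can be sketched in a few lines of Python. This is only an illustration: JSON plus `zlib` stand in for Snowflake's own internal compressed columnar format, the AES-256 encryption step is left out (it would need a third-party crypto library), and `to_columns`, `compress_columns` and `read_column` are hypothetical helpers.

```python
import json
import zlib

# Hypothetical sample table in row-oriented form.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "EU", "amount": 12.5},
    {"id": 3, "region": "US", "amount": 7.25},
]


def to_columns(records):
    # Pivot row-oriented records into one list of values per column.
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return columns


def compress_columns(columns):
    # Compress each column as its own blob; values of the same type sit
    # together, which generally compresses better than mixed rows.
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(blobs, name):
    # An analytical query touching one column only decompresses that column.
    return json.loads(zlib.decompress(blobs[name]).decode("utf-8"))


blobs = compress_columns(to_columns(rows))
amounts = read_column(blobs, "amount")
print(sum(amounts))  # 29.75
```

A query such as "total amount" never has to decompress the `id` or `region` blobs, which is the main reason columnar layouts favor OLAP workloads.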

Query processing (compute) - "Muscle of the system"

  • queries are processed using virtual warehouses
  • warehouse = MPP compute cluster (multiple compute nodes)
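  • each virtual warehouse is independent: it does not share compute resources with other virtual warehouses, so one warehouse has no impact on the performance of the others
  • provides the resources needed to run queries: CPU, memory and temporary storage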
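A rough Python sketch of how a warehouse fans a query out over its nodes and merges partial results. The assumptions here: a thread pool stands in for the warehouse's compute nodes (Python threads illustrate the structure, not real speedup), and `Warehouse`, `run_sum` and the warehouse names are hypothetical. Each warehouse creates its own pool, mirroring the point that warehouses do not share compute resources.

```python
from concurrent.futures import ThreadPoolExecutor


class Warehouse:
    """Hypothetical model: a warehouse owns its own pool of workers ("nodes")."""

    def __init__(self, name, num_nodes):
        self.name = name
        self.num_nodes = num_nodes

    def run_sum(self, partitions):
        # Each warehouse gets a private pool; nothing is shared with other warehouses.
        with ThreadPoolExecutor(max_workers=self.num_nodes) as pool:
            # Scatter: every partition is summed by one of this warehouse's workers.
            partials = list(pool.map(sum, partitions))
        # Gather: merge the partial results into the final answer.
        return sum(partials)


partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
small = Warehouse("reporting_wh", num_nodes=1)
large = Warehouse("etl_wh", num_nodes=4)
print(small.run_sum(partitions), large.run_sum(partitions))  # 44850 44850
```

The result does not depend on warehouse size; only how the work is divided changes, which is why a warehouse can be resized without touching the stored data.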

Cloud services - "Brain of the system"

  • collection of services that coordinate activities across Snowflake and manage its components
  • these services also run on compute instances provisioned by Snowflake from the cloud provider

Services managed in this layer include:

  • Authentication
  • Infrastructure management
  • Metadata management
  • Query parsing and optimization
  • Access control
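The following minimal Python sketch shows how several of these services can chain together before any warehouse starts work. It assumes a pre-parsed query (SQL parsing is skipped), and the credential store, grants, per-partition min/max statistics and `plan_query` are all hypothetical stand-ins. Metadata is used to prune partitions that cannot match the filter, and the surviving partition IDs are what would be handed to a virtual warehouse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    partition_id: int
    min_amount: float
    max_amount: float


USERS = {"analyst": "secret"}       # hypothetical credential store
GRANTS = {("analyst", "sales")}     # hypothetical (user, table) read privileges
METADATA = {                        # hypothetical per-partition min/max statistics
    "sales": [
        PartitionMeta(0, 0, 50),
        PartitionMeta(1, 51, 100),
        PartitionMeta(2, 101, 500),
    ],
}


def plan_query(user, password, table, min_amount):
    """Plan a pre-parsed query: SELECT ... FROM table WHERE amount >= min_amount."""
    # 1. Authentication
    if USERS.get(user) != password:
        raise PermissionError("authentication failed")
    # 2. Access control
    if (user, table) not in GRANTS:
        raise PermissionError(f"{user} may not read {table}")
    # 3. Optimization via metadata: skip partitions whose max is below the filter
    return [p.partition_id for p in METADATA[table] if p.max_amount >= min_amount]


print(plan_query("analyst", "secret", "sales", 60))  # [1, 2]
```

Because this planning work only touches metadata, it can be done in the services layer without reading any table data.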