53 changes: 53 additions & 0 deletions .github/workflows/terraform.yml
@@ -0,0 +1,53 @@
# Validate the Terraform module: fmt, validate, and unit tests on every PR.
# Tests use local fixtures only — no cloud credentials, no network.
name: terraform

on:
  pull_request:
    paths:
      - "terraform/**"
      - ".github/workflows/terraform.yml"
  push:
    branches: [main]
    paths:
      - "terraform/**"
      - ".github/workflows/terraform.yml"

permissions:
  contents: read

env:
  TF_VERSION: "1.10.0"

jobs:
  fmt:
    name: terraform fmt
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - run: terraform -chdir=terraform fmt -check -recursive

  validate:
    name: terraform validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - run: terraform -chdir=terraform init -backend=false
      - run: terraform -chdir=terraform validate

  test:
    name: terraform test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}
      - run: terraform -chdir=terraform init -backend=false
      - run: terraform -chdir=terraform test
9 changes: 9 additions & 0 deletions .gitignore
@@ -52,6 +52,15 @@ coverage.xml
*.tmp
.cache/

# Terraform
.terraform/
.terraform.lock.hcl
*.tfstate
*.tfstate.*
*.tfplan
crash.log
crash.*.log

# md files
GITHUB-PAGES-SETUP.md
gh-commands.md
5 changes: 5 additions & 0 deletions README.md
@@ -13,6 +13,7 @@ A Python utility that retrieves, processes, and organizes the official [Databric
- Creates individual text files per cloud and type (e.g. `aws.txt`, `azure-outbound.txt`, `gcp.txt`)
- **Per-region feeds** at `<cloud>-<region>.txt` (e.g. `aws-us-east-1.txt`, `azure-eastus.txt`) — emitted only when the region has ≥1 CIDR, so consumers can scope firewall rules to their actual workspace regions without parsing JSON
- Format compatible with **Palo Alto Networks (PA)** devices (one CIDR per line)
- **Terraform module** at [`terraform/`](terraform/) — exposes the per-region CIDR list as a sorted, deduplicated output you can wire into any TF resource (managed prefix list, IP group, storage account network rules, Cloud SQL authorized networks, etc.). No new compute infrastructure required.
- Maintains a history of JSON files
- Generates a user-friendly web interface to browse the data

@@ -84,6 +85,10 @@ For production-grade guidance on automating firewall rule updates across AWS, Az

**[→ Firewall Automation Guide](docs/firewall-automation-guide.md)**

For Terraform-heavy shops that prefer to wire CIDRs directly into their existing IaC (no Lambda/Function App needed), see:

**[→ Terraform Module](terraform/)**

For full CLI options, run `python extract-databricks-ips.py --help`.

## Disclaimer
42 changes: 13 additions & 29 deletions docs/firewall-automation-guide.md
@@ -905,60 +905,44 @@ Reference `databricks-aws-ips` (or the relevant EDL) as the **Source** in your S

## GitOps / Terraform

-For teams that require **PR-based approval** before production rule changes, or multi-cloud consistency from a single pipeline.
+For teams that require **PR-based approval** before production rule changes, or multi-cloud consistency from a single pipeline. Use the published Terraform module — it owns the CIDR sourcing (per-region scoping, dedup, validation, fail-closed guards); you write the target resources in your own repo with whatever provider versions you already use.

 ```mermaid
 flowchart LR
-A["databricksIPranges repo\nWeekly GitHub Action\nupdates output files"] -->|webhook or scheduled poll| B["IaC Repo\nterraform/"]
-B --> C["PR auto-created\nShows exact CIDR diff"]
+A["databricksIPranges repo\nWeekly GitHub Action\nupdates per-region feeds"] -->|module reads via tag pin| B["Your IaC Repo\nterraform/"]
+B --> C["PR shows exact CIDR diff\non bump of ?ref="]
 C --> D{"Environment?"}
 D -- dev/staging --> E["Auto-merge\nterraform apply"]
 D -- prod --> F["Security team\napproves PR"]
 F --> G["terraform apply\nProd"]
 E --> H["AWS\nManaged Prefix List"]
 G --> H
-E --> I["Azure\nIP Group"]
+E --> I["Azure\nIP Group + Storage Account"]
 G --> I
-E --> J["GCP\nFirewall Policy"]
+E --> J["GCP\nFirewall Policy + Cloud SQL"]
 G --> J
 ```

 ```hcl
-# variables.tf
-variable "cloud" { default = "aws" } # aws | azure | gcp
-
-# data.tf — fetch pre-generated IP list from GitHub Pages at plan time
-# Note: data "external" does NOT work here — it requires a flat JSON object,
-# but the script returns an array. Use data "http" against the .txt file instead.
-data "http" "databricks_ips" {
-  url = "https://bhavink.github.io/databricksIPranges/output/${var.cloud}.txt"
+module "dbx_ips" {
+  source  = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag in production
+  cloud   = "aws"
+  regions = ["us-east-1"]
 }

-locals {
-  cidr_list = [
-    for line in split("\n", data.http.databricks_ips.response_body) :
-    trimspace(line)
-    if trimspace(line) != "" && !startswith(trimspace(line), "#")
-  ]
-}
-
-# AWS — Managed Prefix List
 resource "aws_ec2_managed_prefix_list" "databricks" {
-  name           = "databricks-${var.cloud}"
+  name           = "databricks-aws-us-east-1"
   address_family = "IPv4"
   max_entries    = 200

   dynamic "entry" {
-    for_each = toset(local.cidr_list)
-    content {
-      cidr        = entry.value
-      description = "Databricks ${var.cloud}"
-    }
+    for_each = toset(module.dbx_ips.cidrs)
+    content { cidr = entry.value }
   }
 }
 ```

-> Running `terraform plan` on a PR shows exactly which CIDRs were added or removed — reviewable, auditable, rollback = `git revert` + re-apply.
+> Running `terraform plan` on a PR shows exactly which CIDRs were added or removed — reviewable, auditable, rollback = `git revert` + re-apply. Module fails closed on empty/corrupted feeds (won't silently clear your rules). Full inputs/outputs, per-cloud examples, and debugging are in [terraform/README.md](../terraform/README.md).

---

226 changes: 226 additions & 0 deletions terraform/README.md
@@ -0,0 +1,226 @@
# Terraform Module — Databricks IP Ranges

A small, focused Terraform module that exposes Databricks CIDR ranges (per cloud, per region) as a sorted, deduplicated list. Wire the output into any TF resource you already write — managed prefix lists, IP groups, storage account network rules, Cloud SQL authorized networks, Cloud Armor policies, anything.

This module owns CIDR sourcing. It does **not** write target resources for you. That's deliberate — keeps the module ~50 lines, works with any provider version, and avoids carrying maintenance for N target types.

---

## Quickstart

```hcl
module "dbx_ips" {
  source  = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production
  cloud   = "azure"
  regions = ["eastus"]
}

resource "azurerm_storage_account_network_rules" "data" {
  storage_account_id = azurerm_storage_account.data.id
  default_action     = "Deny"
  bypass             = ["AzureServices"]
  ip_rules           = module.dbx_ips.cidrs
}
```

> **Always pin `?ref=`** to a tag or commit SHA in production — see [Stability](#stability--pinning) below.

---

## Inputs

| Name | Type | Default | Description |
|---|---|---|---|
| `cloud` | string | _(required)_ | `aws`, `azure`, or `gcp` |
| `regions` | list(string) | `[]` | Region names. Empty = use all-cloud feed (broader, not recommended for production) |
| `source_base_url` | string | `https://bhavink.github.io/databricksIPranges/output` | Base URL serving the per-region `.txt` feeds. Override for forks or self-hosted mirrors |
| `source_files` | list(string) | `[]` | Local CIDR-per-line file paths. Non-empty = airgapped/vendored mode (no network) |
| `min_cidr_count` | number | `1` | Refuse to apply below this. Guards against feed-empty lockouts. Set `0` to disable |
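
For a fork or an internal mirror, the optional inputs above might be combined as in the following sketch. The mirror URL is a placeholder rather than a real endpoint, and the `min_cidr_count` value is only an illustrative threshold; the input names themselves come from the table.

```hcl
module "dbx_ips" {
  source = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production

  cloud   = "aws"
  regions = ["us-east-1", "eu-west-1"]

  # Hypothetical mirror; assumed to serve the same <cloud>-<region>.txt files.
  source_base_url = "https://mirror.example.internal/databricks-ip-ranges/output"

  # Illustrative threshold: refuse to proceed if fewer than 5 CIDRs resolve.
  min_cidr_count = 5
}
```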

## Outputs

| Name | Type | Description |
|---|---|---|
| `cidrs` | list(string) | Sorted, deduplicated CIDRs |
| `cidr_count` | number | `length(cidrs)` |
| `source` | list(string) | URLs or local file paths actually read |

---

## Examples

### AWS — Managed Prefix List (one region)

```hcl
module "dbx_ips" {
  source  = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production
  cloud   = "aws"
  regions = ["us-east-1"]
}

resource "aws_ec2_managed_prefix_list" "databricks" {
  name           = "databricks-aws-us-east-1"
  address_family = "IPv4"
  max_entries    = 200

  dynamic "entry" {
    for_each = toset(module.dbx_ips.cidrs)
    content { cidr = entry.value }
  }
}
```

### Azure — IP Group + Storage Account (multi-region)

```hcl
module "dbx_ips" {
  source  = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production
  cloud   = "azure"
  regions = ["eastus", "westus2"]
}

resource "azurerm_ip_group" "databricks" {
  name                = "databricks-ip-ranges"
  location            = "eastus"
  resource_group_name = azurerm_resource_group.network.name
  cidrs               = module.dbx_ips.cidrs
}

resource "azurerm_storage_account_network_rules" "data" {
  storage_account_id = azurerm_storage_account.data.id
  default_action     = "Deny"
  bypass             = ["AzureServices"]
  ip_rules           = module.dbx_ips.cidrs # IP Groups can't be referenced here
}
```

### GCP — Cloud SQL authorized networks

```hcl
module "dbx_ips" {
  source  = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production
  cloud   = "gcp"
  regions = ["us-central1"]
}

resource "google_sql_database_instance" "this" {
  name             = "..."
  database_version = "POSTGRES_16"
  settings {
    ip_configuration {
      dynamic "authorized_networks" {
        for_each = toset(module.dbx_ips.cidrs)
        content {
          name  = "databricks-${replace(authorized_networks.value, "/", "-")}"
          value = authorized_networks.value
        }
      }
    }
  }
}
```

### Airgapped — vendor the feed

Commit the per-region file into your own repo, point the module at the local path:

```hcl
module "dbx_ips" {
  source       = "github.com/bhavink/databricksIPranges//terraform?ref=main" # pin to a tag or commit SHA in production
  cloud        = "azure"
  source_files = ["${path.module}/vendored/azure-eastus.txt"]
}
```

A periodic job in your repo (e.g. Renovate, a scheduled GH Action) updates the vendored file via PR. Your TF apply only sees changes when that PR merges.

---

## Stability — pinning

| Strategy | `source` | When CIDRs change |
|---|---|---|
| **Tag** _(recommended)_ | `?ref=v2026.05.05` | When you bump the tag |
| **Commit SHA** _(strictest)_ | `?ref=a1b2c3d` | When you bump the SHA |
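| **Branch** _(don't)_ | `?ref=main` | Whenever `terraform init` re-resolves the branch (every fresh CI checkout), so unreviewed CIDR changes can slip in |

**Why pin:** Terraform resolves a module `source` during `terraform init`, not during `plan`. An unpinned `main` is therefore re-fetched on every fresh init (typically every CI run) and can surface CIDR diffs you haven't reviewed. Pinning makes the bump an explicit PR in your repo.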
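
As a sketch of the recommended tag strategy, the module call carries the tag in `?ref=`. The tag name below is copied from the table purely as an illustration; check the repository's published tags before relying on it.

```hcl
module "dbx_ips" {
  # Illustrative tag taken from the table above; substitute a tag that actually exists.
  source  = "github.com/bhavink/databricksIPranges//terraform?ref=v2026.05.05"
  cloud   = "aws"
  regions = ["us-east-1"]
}
```

After changing `?ref=`, re-run `terraform init` so the new module revision is downloaded before the next plan.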

---

## Debugging

The module emits diagnostic outputs every run:

```bash
terraform output cidr_count # how many CIDRs landed
terraform output source # URLs or file paths actually read
terraform output cidrs | head # spot-check first few
```
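
`terraform output` only reads outputs declared in the root module, so the commands above assume the calling configuration re-exports the module's values. A minimal sketch, assuming the module is instantiated as `module.dbx_ips` as in the examples; the root output names are chosen to match the commands and are otherwise arbitrary.

```hcl
output "cidr_count" {
  description = "Number of CIDRs resolved by the module."
  value       = module.dbx_ips.cidr_count
}

output "source" {
  description = "URLs or local file paths the module read."
  value       = module.dbx_ips.source
}

output "cidrs" {
  description = "Sorted, deduplicated CIDR list."
  value       = module.dbx_ips.cidrs
}
```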

### Common errors

| Error | Cause | Fix |
|---|---|---|
| `Failed to fetch ... — HTTP 404` | Wrong region name or wrong cloud | Check `<source_base_url>/` for valid feeds |
| `Resolved 0 CIDRs ... need at least 1` | Empty/missing feed, typo'd region | Verify region name; or set `min_cidr_count = 0` if intentional |
| `Feed contained non-CIDR lines` | URL serves HTML/JSON, not text | Verify `source_base_url` points at the `output/` directory, not the JSON endpoint |
| `cloud must be one of: aws, azure, gcp` | Typo on `cloud` input | Use lowercase, exact match |
| `regions must contain only lowercase letters, digits, and hyphens` | Region has spaces, uppercase, or other chars | Use the exact region name from `<source_base_url>/` |

### Deeper diagnostics

```bash
TF_LOG=DEBUG terraform plan
```

Use this only for provider-level issues (TLS errors, proxy/DNS, IPv6 routing). Most user-facing errors are caught by validation/precondition messages above.

---

## Testing

Local:

```bash
cd terraform
terraform fmt -check -recursive
terraform init -backend=false
terraform validate
terraform test
```

CI runs the same on every PR touching `terraform/` — see `.github/workflows/terraform.yml`.

Coverage:

| Behaviour | Test |
|---|---|
| Single-file happy path | `happy_path_single_file` |
| Multi-file union | `multi_file_union` |
| Comment + blank line stripping | `strips_comments_and_blanks` |
| Deduplication | `deduplicates` |
| Cloud input validation | `rejects_invalid_cloud` |
| Region format validation | `rejects_invalid_region_format` |
| Lockout guard (`min_cidr_count`) | `rejects_below_min_cidr_count` |
| Lockout guard disabled | `min_cidr_count_zero_allows_empty` |
| Non-CIDR content detection | `rejects_non_cidr_content` |

Tests use `source_files` against committed fixtures — no network required, runs in seconds.
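
For readers unfamiliar with `terraform test`, the following sketch shows roughly what two of the listed cases could look like. It is not the repository's actual test file: the file name, the fixture path, and the invalid cloud value are assumptions made for illustration, and it requires Terraform 1.6 or later.

```hcl
# tests/validation.tftest.hcl (hypothetical file name and fixture path)

run "rejects_invalid_cloud" {
  command = plan

  variables {
    cloud        = "oracle" # not one of aws, azure, gcp
    source_files = ["./tests/fixtures/aws-us-east-1.txt"]
  }

  # The plan is expected to fail on the validation attached to var.cloud.
  expect_failures = [var.cloud]
}

run "happy_path_single_file" {
  command = plan

  variables {
    cloud        = "aws"
    source_files = ["./tests/fixtures/aws-us-east-1.txt"]
  }

  assert {
    condition     = output.cidr_count > 0
    error_message = "Expected at least one CIDR from the fixture file."
  }
}
```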

---

## What this module deliberately does NOT do

- **Write target resources for you.** You write `aws_ec2_managed_prefix_list`, `azurerm_storage_account_network_rules`, etc. — that's where provider-specific limits and quirks live (rule caps, IPv4-only constraints, naming rules). Examples above show the patterns.
- **Validate cloud-provider caps** (AWS prefix list 200 entries, Azure storage account 400 IPs, etc.). Your resource block is the right place to fail on those.
- **Filter inbound vs outbound.** The published feeds already combine both. Use `source_files` against `<cloud>-<region>-inbound.txt` / `-outbound.txt` from your own fork if you need split feeds.
- **Refresh CIDRs automatically.** Pin a ref. Bump it via PR when you want to update.
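
Because cap checks are left to the caller, one way to fail early is a `precondition` on the resource that consumes the list. A minimal sketch, assuming Terraform 1.2 or later and the `max_entries = 200` value used in the AWS example; the local name `prefix_list_max_entries` is hypothetical.

```hcl
locals {
  prefix_list_max_entries = 200 # hypothetical local; mirrors the AWS example above
}

resource "aws_ec2_managed_prefix_list" "databricks" {
  name           = "databricks-aws-us-east-1"
  address_family = "IPv4"
  max_entries    = local.prefix_list_max_entries

  dynamic "entry" {
    for_each = toset(module.dbx_ips.cidrs)
    content { cidr = entry.value }
  }

  lifecycle {
    # Stop the plan with a readable message instead of a provider-side API error.
    precondition {
      condition     = module.dbx_ips.cidr_count <= local.prefix_list_max_entries
      error_message = "Databricks feed has ${module.dbx_ips.cidr_count} CIDRs, more than max_entries (${local.prefix_list_max_entries})."
    }
  }
}
```

The same pattern should carry over to other targets by swapping the threshold for whatever limit applies to that resource.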

---

## Stability guarantees

- Inputs and outputs are stable. New optional inputs may be added; existing inputs and output shapes will not change without a major version bump.
- The published feed format is `<cidr>\n<cidr>\n` (one CIDR per line, optional `#` comments and blank lines tolerated). Changing this is a breaking change for any consumer, not just this module — it would not be done lightly.
- The module fails closed: empty feed, non-CIDR content, or fetch failure all halt the apply rather than silently emitting garbage downstream.
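
The module's source is not shown here, so the following is only a sketch of how fail-closed behaviour of this kind can be expressed in Terraform, not the module's actual implementation. It assumes Terraform 1.3 or later (for `startswith`), and the variable `feed_path` is a hypothetical stand-in for the real inputs.

```hcl
variable "feed_path" {
  type        = string
  description = "Hypothetical input: path to a CIDR-per-line file."
}

variable "min_cidr_count" {
  type    = number
  default = 1
}

locals {
  # Trim each line, drop blanks and '#' comments, then sort and deduplicate.
  lines = [for l in split("\n", file(var.feed_path)) : trimspace(l)]
  cidrs = sort(distinct([
    for l in local.lines : l
    if l != "" && !startswith(l, "#")
  ]))

  # cidrhost() errors on anything that is not a valid CIDR, so can() flags bad lines.
  invalid = [for c in local.cidrs : c if !can(cidrhost(c, 0))]
}

output "cidrs" {
  value = local.cidrs

  # Either failed precondition halts the plan instead of emitting a bad list.
  precondition {
    condition     = length(local.invalid) == 0
    error_message = "Feed contained non-CIDR lines."
  }

  precondition {
    condition     = length(local.cidrs) >= var.min_cidr_count
    error_message = "Resolved ${length(local.cidrs)} CIDRs, need at least ${var.min_cidr_count}."
  }
}
```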