Skip to content

Salkimmich/workload_identity

Workload Identity Project

Welcome to the Workload Identity project! This repository contains a comprehensive set of documentation and guides to help you understand, implement, and manage workload identity in cloud-native environments.

Documentation Overview

Below is a list of the key README files and guides available in this repository:

  1. Architecture Guide: A detailed overview of the system architecture, components, and their interactions. This guide is essential for understanding the design principles and implementation details of the workload identity system.

  2. Developer Guide: Instructions and best practices for developers looking to integrate or extend the workload identity system. This guide includes API references, implementation tips, and code examples.

  3. Compliance Guide: Guidelines and requirements for ensuring compliance with security standards and regulations. This guide covers metrics, thresholds, and automated compliance checks.

  4. Deployment Guide: Step-by-step instructions for deploying the workload identity system in various environments, including Kubernetes and cloud providers.

  5. Monitoring Guide: Information on how to monitor the workload identity system, including metrics collection, alerting, and performance optimization.

  6. Security Best Practices: Best practices for securing the workload identity system, including encryption, access control, and audit logging.

  7. API Reference Guide: Detailed documentation of the APIs provided by the workload identity system, including endpoints, parameters, and response formats.

  8. Migration Guide: Instructions for migrating from existing identity systems to the workload identity system, including compatibility considerations and migration strategies.

  9. Troubleshooting Guide: Common issues and solutions for troubleshooting the workload identity system, including diagnostic tools and debugging tips.

  10. Integration Guide: Guidelines for integrating the workload identity system with other services and platforms, including cloud providers and service meshes.

  11. Agentic Identity Guide: Workload identity for AI agents and autonomous systems. SPIFFE ID patterns for agent roles, JIT access provisioning, MCP gateway enforcement, and migration from API key authentication.

  12. Post-Quantum Cryptography Migration Guide: Migration planning for NIST's 2024 PQC standards. Hybrid deployment architecture and compliance implications under CRA, EU AI Act, NIS2, and DORA.

  13. CI/CD OIDC Federation Guide: Secretless CI/CD with GitHub Actions and GitLab. Working configurations for AWS, GCP, and Azure.

  14. TPM 2.0 Node Attestation: Hardware-rooted attestation for physical infrastructure, edge, and air-gapped environments.

  15. Resources: Curated references organized by use case.

Getting Started

To get started with the Workload Identity project, please refer to the Architecture Guide for an overview of the system design and the Developer Guide for implementation details.

Contributing

We welcome contributions! Please see the Contributing Guide for more information on how to contribute to this project.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Contact

For any questions or feedback, please contact the maintainer at salkimmich.

Workload Identity System

Introduction

This repository implements a comprehensive workload identity system designed for modern cloud-native environments. It provides a robust foundation for secure service-to-service communication, identity management, and access control in distributed systems. The system is built on the principles of zero-trust security, enabling organizations to implement strong authentication and authorization mechanisms across their infrastructure.

Why Workload Identity?

In today's cloud-native architectures, identity is no longer a human-first problem. By 2025, enterprise environments averaged 45 non-human identities for every human identity - a ratio growing fast as AI agents, CI/CD pipelines, serverless functions, and automated workflows take on more operational work.

Workloads must authenticate and authorize dynamically, with provable integrity and no static secrets. Workload identity assigns each non-human actor a unique, verifiable identifier (a SPIFFE ID), enabling secure, policy-driven communication at scale.

For AI agents specifically, see the Agentic Identity Guide.

SPIRE Implementation

The project uses SPIRE (SPIFFE Runtime Environment) as its core workload identity provider. The implementation includes:

Core Components

  1. SPIRE Server

    • Deployed as a Kubernetes Deployment
    • High availability configuration with rolling updates
    • Comprehensive health checks and monitoring
    • Secure secret management
    • Resource limits and requests configured
    • Security context with non-root user execution
  2. SPIRE Agent

    • Deployed as a Kubernetes DaemonSet
    • Rolling update strategy for controlled updates
    • Node attestation and workload registration
    • Health checks and monitoring
    • Resource limits and requests configured
    • Security context with non-root user execution
  3. Certificate Management

    • Automated certificate generation and rotation
    • CRL (Certificate Revocation List) support
    • Backup and disaster recovery procedures
    • Secure storage of certificates and keys

Security Features

  1. Certificate Security

    • Strong key usage constraints
    • Extended key usage validation
    • Certificate revocation support
    • Automated backup of certificates
  2. Runtime Security

    • Non-root user execution
    • Read-only root filesystem
    • Dropped capabilities
    • Prevented privilege escalation
  3. Operational Security

    • Comprehensive health checks
    • Resource limits and requests
    • Graceful termination
    • Secure volume mounts

Monitoring and Operations

  1. Health Checks

    • Startup probes for slow-starting containers
    • Liveness probes for container health
    • Readiness probes for service availability
    • Configurable timeouts and thresholds
  2. Resource Management

    • CPU and memory limits
    • Resource requests for scheduling
    • Graceful termination periods
    • Update strategies for zero-downtime
  3. Backup and Recovery

    • Automated certificate backups
    • Timestamped backup directories
    • Secure backup storage
    • Cleanup procedures

Documentation

Core Documentation

Migration and Compliance

Repository Structure

.
├── docs/                      # Documentation
│   ├── architecture_guide.md  # System architecture and design
│   ├── security_best_practices.md # Security guidelines
│   ├── developer_guide.md    # Development workflows
│   ├── deployment_guide.md   # Deployment procedures
│   ├── monitoring_guide.md   # Monitoring and observability
│   ├── disaster_recovery_guide.md # Recovery procedures
│   ├── api_reference.md      # API documentation
│   ├── migration_guide.md    # Migration procedures
│   ├── compliance_guide.md   # Compliance requirements
│   ├── troubleshooting_guide.md # Common issues
│   ├── pki_guide.md         # PKI concepts and usage
│   └── pki_concepts_detailed.md # Detailed PKI documentation
├── tests/                    # Test infrastructure
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   ├── e2e/                # End-to-end tests
│   ├── security/           # Security tests
│   └── fixtures/           # Test fixtures and mocks
├── infrastructure/           # Infrastructure as Code
│   └── kubernetes/          # Kubernetes manifests
│       ├── spire/          # SPIRE server and agent configs
│       ├── monitoring/     # Monitoring stack configs
│       └── networking/     # Network policies and configs
├── core/                   # Core libraries and utilities
│   ├── identity/          # Identity management
│   ├── security/          # Security utilities
│   └── integration/       # Integration components
├── services/              # Service implementations
│   ├── identity-provider/ # Identity provider service
│   ├── policy-engine/     # Policy evaluation service
│   └── federation/        # Federation service
└── examples/              # Example applications
    ├── kubernetes/        # Kubernetes examples
    ├── cloud/            # Cloud provider examples
    └── service-mesh/     # Service mesh examples

Key Directories Explained

Documentation (docs/)

The documentation directory contains comprehensive guides covering all aspects of the system:

  • Architecture and design decisions
  • Security best practices and guidelines
  • Development workflows and standards
  • Deployment procedures and configurations
  • Monitoring and observability setup
  • Disaster recovery procedures
  • API reference and usage examples
  • Migration procedures from existing systems
  • Compliance requirements and implementation
  • Troubleshooting guides
  • PKI concepts and detailed documentation

Testing (tests/)

The testing directory contains a comprehensive test suite:

  • Unit tests for individual components
  • Integration tests for component interactions
  • End-to-end tests for complete workflows
  • Security tests including penetration and fuzzing tests
  • Test fixtures and mocks for various components

Key features:

  • Cloud provider test fixtures (AWS, GCP, Azure)
  • Kubernetes test configurations
  • Certificate generation and management
  • Mock implementations for external services

Infrastructure (infrastructure/)

The infrastructure directory contains all infrastructure-as-code configurations:

  • Kubernetes manifests for SPIRE deployment
  • Monitoring stack configurations
  • Network policies and security configurations
  • (Planned) Terraform configurations for cloud resources
  • (Planned) Ansible playbooks for automation

Core Libraries (core/)

The core directory contains the fundamental libraries and utilities:

  • Identity management components
  • Security utilities and helpers
  • Integration components for various platforms
  • Common utilities and shared code

Services (services/)

The services directory contains the main service implementations:

  • Identity provider service for workload identity
  • Policy engine for access control
  • Federation service for cross-domain trust
  • Additional supporting services

Examples (examples/)

The examples directory contains practical examples and templates:

  • Kubernetes deployment examples
  • Cloud provider integration examples
  • Service mesh integration examples
  • Common use case implementations

Key Features

Identity Management

The system provides robust identity management capabilities through a distributed architecture that ensures secure and scalable workload identity provisioning. Each workload receives a unique SPIFFE ID that is cryptographically verifiable and tied to its runtime environment.

Special Considerations:

  • Ephemeral workloads require identity at runtime—not via pre-provisioned secrets
  • Trust is anchored in secure enclaves, TPMs, or cloud-native CAs
  • The system is designed for rapid scaling and horizontal expansion
  • Supports both X.509 certificates and JWTs
  • Enables decentralized issuance with centralized governance
  • Federation-ready: supports trust bundles and multi-domain identity
graph TD
    A[Workload] -->|Attestation| B[SPIRE Agent]
    B -->|Identity Request| C[SPIRE Server]
    C -->|Validate| D[Node Attestation]
    C -->|Issue| E[SPIFFE ID]
    E -->|Bind| F[Workload Identity]
    F -->|Use| G[Service Communication]
Loading

Key components:

  • Automatic identity provisioning based on workload attributes
  • Certificate-based authentication using X.509 certificates
  • Role-based access control with fine-grained permissions
  • Support for multiple identity providers through federation

Security

The security architecture implements a zero-trust model where every service interaction requires mutual authentication and authorization. The system uses mTLS for secure communication and implements robust key management practices.

Security Features:

  • End-to-end mTLS with SPIFFE-based identities
  • Continuous credential rotation (5–15 minute TTLs)
  • Just-in-time identity provisioning and revocation
  • Optional confidential computing integration (e.g. Intel SGX, AMD SEV)
  • Real-time attestation: validate integrity of code and runtime environment

Temporal Governance:

  • Trust is not just who a workload is—but when it is valid
  • Time-bound credentials ensure credentials expire quickly, reducing lateral movement risk
  • Policies can enforce access windows and runtime conditions via engines like OPA
graph LR
    A[Service A] -->|mTLS| B[Service B]
    A -->|Verify| C[Certificate]
    B -->|Verify| D[Certificate]
    C -->|Validate| E[Trust Chain]
    D -->|Validate| E
    E -->|Check| F[Policy Engine]
Loading

Security features:

  • Mutual TLS (mTLS) for all service-to-service communication
  • Zero-trust model implementation with continuous verification
  • Secure key management with automatic rotation
  • Hardware security module (HSM) support for key storage

Integration

The system provides seamless integration with modern cloud-native platforms and tools, enabling organizations to implement workload identity across their entire infrastructure stack.

Integration capabilities:

  • Kubernetes: Uses native ServiceAccount tokens for OIDC federation
  • Cloud IAM: Compatible with AWS IRSA, GCP Workload Identity, Azure Federated Identity Credentials
  • Service Mesh: Works with Istio, Linkerd, and custom mTLS setups
  • CI/CD: Secure ephemeral identity for GitHub Actions, GitLab, Jenkins, and more
  • Secrets Managers: Authenticate to systems like Vault using ambient SPIFFE identity
graph TD
    A[Workload Identity] -->|Integrate| B[Kubernetes]
    A -->|Connect| C[Service Mesh]
    A -->|Federate| D[Cloud Providers]
    A -->|Automate| E[CI/CD]
    B -->|Use| F[Service Accounts]
    C -->|Use| G[mTLS]
    D -->|Use| H[Cloud IAM]
    E -->|Use| I[Pipeline Security]
Loading

Getting Started

Prerequisites

  • Kubernetes cluster (v1.24+)
  • Helm v3.7+
  • kubectl
  • SPIRE v1.13.3+ (see SPIRE releases)
  • Access to container registry

Quick Start

  1. Review the Architecture Guide
  2. Follow the Deployment Guide
  3. Consult the Developer Guide
  4. Refer to the Security Best Practices

Development Workflow

1. Local Development Setup

# Clone the repository
git clone https://github.com/your-org/workload-identity.git
cd workload-identity

# Set up development environment
make setup-dev

# Start local development cluster
make start-local-cluster

# Run tests
make test

2. Component Development

The system is organized into several key components:

  1. Identity Provider

    • Located in services/identity-provider/
    • Handles workload identity issuance and validation
    • Implements SPIFFE/SPIRE integration
    • Manages trust relationships
  2. Policy Engine

    • Located in services/policy-engine/
    • Evaluates access control policies
    • Integrates with OPA for policy decisions
    • Manages policy lifecycle
  3. Federation Service

    • Located in services/federation/
    • Handles cross-domain trust
    • Manages trust bundle exchange
    • Implements federation protocols

3. Testing Strategy

  • Unit tests for core components
  • Integration tests for service interactions
  • End-to-end tests for complete workflows
  • Security tests for trust relationships
  • Performance tests for scalability

4. Deployment Process

  1. Build components:

    make build
  2. Run tests:

    make test
  3. Build containers:

    make docker-build
  4. Deploy to cluster:

    make deploy

5. Monitoring and Debugging

  • Use the monitoring stack for observability
  • Check logs using the logging system
  • Use tracing for request flows
  • Monitor metrics for performance

6. Contributing

  1. Create a feature branch
  2. Make your changes
  3. Run tests and linting
  4. Submit a pull request
  5. Address review comments
  6. Merge after approval

For more details, see the Developer Guide.

Basic Usage Example

apiVersion: workload-identity/v1
kind: WorkloadIdentity
metadata:
  name: my-service
spec:
  serviceAccount: my-service-account
  identityProvider: kubernetes
  policies:
    - name: service-access
      rules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["get", "list"]

Migration

The system supports migration from various identity systems:

  • OIDC to Workload Identity
  • SAML to Workload Identity
  • Custom identity system migration

See the Migration Guide for detailed procedures.

Compliance

Supports major compliance frameworks:

  • ISO 27001
  • SOC 2
  • GDPR
  • HIPAA

See the Compliance Guide for implementation details.

Monitoring and Observability

The system provides comprehensive monitoring and observability capabilities through a modern observability stack:

Monitoring Stack

monitoring:
  metrics:
    prometheus:
      enabled: true
      retention: 15d
      scrape_interval: 15s
  logging:
    loki:
      enabled: true
      retention: 30d
  tracing:
    jaeger:
      enabled: true
      sampling_rate: 0.1

Key Metrics

  1. Identity Metrics

    • Identity issuance rate
    • Identity validation success/failure
    • Certificate rotation events
    • Trust bundle updates
  2. Security Metrics

    • Authentication attempts
    • Authorization decisions
    • Policy evaluation latency
    • Security violations
  3. Performance Metrics

    • Request latency
    • Error rates
    • Resource utilization
    • Cache hit rates
  4. Federation Metrics

    • Cross-domain trust operations
    • Trust bundle exchange events
    • Federation health status
    • Federation latency

Logging

  • Structured logging in JSON format
  • Log levels: DEBUG, INFO, WARN, ERROR
  • Log rotation and retention
  • Log aggregation and analysis

Tracing

  • Distributed tracing with OpenTelemetry
  • Request flow visualization
  • Latency analysis
  • Error tracking

Dashboards

  1. System Overview

    • System health
    • Resource utilization
    • Error rates
    • Performance metrics
  2. Security Dashboard

    • Authentication metrics
    • Authorization decisions
    • Security events
    • Policy evaluations
  3. Federation Dashboard

    • Trust relationships
    • Federation health
    • Cross-domain operations
    • Trust bundle status

Alerting

  1. Critical Alerts

    • System failures
    • Security violations
    • Trust chain issues
    • Federation failures
  2. Warning Alerts

    • High latency
    • Error rate spikes
    • Resource constraints
    • Certificate expiration
  3. Info Alerts

    • Configuration changes
    • Policy updates
    • Trust bundle updates
    • Federation events

For detailed setup and configuration, see the Monitoring Guide.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

  • SPIFFE/SPIRE community
  • Open source contributors
  • Security researchers
  • Early adopters and testers

Future Considerations

Workload identity is the foundation of runtime trust for autonomous systems. As software shifts from static services to AI agents that reason, delegate, and act across trust domains, the primitives in this repository become more important, not less.

The Agentic Identity Guide covers how to extend this architecture to cover AI agents today. The PQC Migration Guide covers preparing this infrastructure for the post-quantum transition NIST formalized in 2024.

These are not future problems. They are current engineering requirements.

Senior Engineer Review Guide

Critical Files to Review

1. Core Implementation Files

  • core/identity/manager.go - Core identity management logic
  • core/security/crypto.go - Cryptographic operations and key management
  • core/integration/kubernetes.go - Kubernetes integration implementation
  • services/identity-provider/main.go - Identity provider service implementation
  • services/policy-engine/evaluator.go - Policy evaluation engine
  • infrastructure/kubernetes/spire/server.yaml - SPIRE server configuration
  • infrastructure/kubernetes/spire/agent.yaml - SPIRE agent configuration

2. Security-Critical Files

  • core/security/tls.go - TLS configuration and certificate management
  • core/security/auth.go - Authentication and authorization logic
  • core/security/audit.go - Audit logging implementation
  • infrastructure/kubernetes/spire/secrets.yaml - Secret management
  • infrastructure/kubernetes/spire/network-policy.yaml - Network security policies

3. Configuration Files

  • infrastructure/kubernetes/spire/configmap.yaml - Core configuration
  • infrastructure/kubernetes/spire/trust-bundle.yaml - Trust domain configuration
  • infrastructure/kubernetes/spire/federation.yaml - Federation settings
  • infrastructure/kubernetes/monitoring/prometheus.yaml - Monitoring configuration

Essential Documentation Review

1. Architecture and Design

2. Operational Documentation

Review Checklist

1. Security Architecture

  • Zero Trust principles implementation
  • Cryptographic implementation review
  • Authentication and authorization flows
  • Secret management approach
  • Network security controls
  • Audit logging implementation

2. Scalability and Performance

  • Horizontal scaling capabilities
  • Performance bottlenecks
  • Resource utilization
  • Caching strategies
  • Rate limiting implementation

3. Reliability and Resilience

  • High availability design
  • Fault tolerance mechanisms
  • Disaster recovery procedures
  • Backup and restore capabilities
  • State management approach

4. Integration Points

  • Kubernetes integration
  • Service mesh compatibility
  • Cloud provider integration
  • External identity providers
  • Monitoring and logging systems

5. Operational Considerations

  • Deployment strategies
  • Configuration management
  • Monitoring and alerting
  • Logging and tracing
  • Maintenance procedures

Key Areas of Focus

1. Security Implementation

  • Review cryptographic implementations for best practices
  • Verify secure key management and rotation
  • Check authentication and authorization flows
  • Validate audit logging and monitoring
  • Assess network security controls

2. Architecture Design

  • Evaluate component interactions
  • Review scalability considerations
  • Check fault tolerance mechanisms
  • Verify high availability design
  • Assess integration patterns

3. Operational Excellence

  • Review deployment procedures
  • Check monitoring implementation
  • Verify backup and recovery
  • Assess maintenance procedures
  • Validate configuration management

4. Compliance and Standards

  • Review security standards compliance
  • Check audit requirements
  • Verify documentation completeness
  • Assess policy implementation
  • Validate control effectiveness

Post-Review Actions

1. Documentation Updates

  • Update architecture documentation
  • Enhance security guidelines
  • Improve operational procedures
  • Add troubleshooting guides
  • Update compliance documentation

2. Implementation Improvements

  • Address security findings
  • Enhance scalability features
  • Improve monitoring capabilities
  • Strengthen disaster recovery
  • Optimize performance

3. Operational Enhancements

  • Improve deployment procedures
  • Enhance monitoring setup
  • Strengthen backup procedures
  • Optimize maintenance tasks
  • Update security controls

Additional Resources

1. Standards and Best Practices

2. Reference Implementations

3. Tools and Utilities

About

Notes on the basics of Workload Identity with SPIFFE and SPIRE

Resources

License

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors