Welcome to the Workload Identity project! This repository contains a comprehensive set of documentation and guides to help you understand, implement, and manage workload identity in cloud-native environments.
Below is a list of the key README files and guides available in this repository:
- Architecture Guide: A detailed overview of the system architecture, components, and their interactions. This guide is essential for understanding the design principles and implementation details of the workload identity system.
- Developer Guide: Instructions and best practices for developers looking to integrate or extend the workload identity system. This guide includes API references, implementation tips, and code examples.
- Compliance Guide: Guidelines and requirements for ensuring compliance with security standards and regulations. This guide covers metrics, thresholds, and automated compliance checks.
- Deployment Guide: Step-by-step instructions for deploying the workload identity system in various environments, including Kubernetes and cloud providers.
- Monitoring Guide: Information on how to monitor the workload identity system, including metrics collection, alerting, and performance optimization.
- Security Best Practices: Best practices for securing the workload identity system, including encryption, access control, and audit logging.
- API Reference Guide: Detailed documentation of the APIs provided by the workload identity system, including endpoints, parameters, and response formats.
- Migration Guide: Instructions for migrating from existing identity systems to the workload identity system, including compatibility considerations and migration strategies.
- Troubleshooting Guide: Common issues and solutions for troubleshooting the workload identity system, including diagnostic tools and debugging tips.
- Integration Guide: Guidelines for integrating the workload identity system with other services and platforms, including cloud providers and service meshes.
- Agentic Identity Guide: Workload identity for AI agents and autonomous systems. SPIFFE ID patterns for agent roles, JIT access provisioning, MCP gateway enforcement, and migration from API key authentication.
- Post-Quantum Cryptography Migration Guide: Migration planning for NIST's 2024 PQC standards. Hybrid deployment architecture and compliance implications under CRA, EU AI Act, NIS2, and DORA.
- CI/CD OIDC Federation Guide: Secretless CI/CD with GitHub Actions and GitLab. Working configurations for AWS, GCP, and Azure.
- TPM 2.0 Node Attestation: Hardware-rooted attestation for physical infrastructure, edge, and air-gapped environments.
- Resources: Curated references organized by use case.
To get started with the Workload Identity project, please refer to the Architecture Guide for an overview of the system design and the Developer Guide for implementation details.
We welcome contributions! Please see the Contributing Guide for more information on how to contribute to this project.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
For any questions or feedback, please contact the maintainer at salkimmich.
This repository implements a comprehensive workload identity system designed for modern cloud-native environments. It provides a robust foundation for secure service-to-service communication, identity management, and access control in distributed systems. The system is built on the principles of zero-trust security, enabling organizations to implement strong authentication and authorization mechanisms across their infrastructure.
In today's cloud-native architectures, identity is no longer a human-first problem. By 2025, enterprise environments averaged 45 non-human identities for every human identity - a ratio growing fast as AI agents, CI/CD pipelines, serverless functions, and automated workflows take on more operational work.
Workloads must authenticate and authorize dynamically, with provable integrity and no static secrets. Workload identity assigns each non-human actor a unique, verifiable identifier (a SPIFFE ID), enabling secure, policy-driven communication at scale.
For AI agents specifically, see the Agentic Identity Guide.
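To make the idea of a unique, verifiable identifier concrete, the following is a minimal Python sketch of parsing and validating a SPIFFE ID of the form `spiffe://<trust-domain>/<path>`. It is illustrative only and is not taken from this repository: the function name `parse_spiffe_id` and the `SpiffeId` type are hypothetical, and the checks are a strict reading of the SPIFFE ID format rather than a complete implementation of the specification.

```python
import re
from typing import NamedTuple

# Allowed characters: lowercase letters, digits, dots, dashes and underscores
# for the trust domain; letters, digits, dots, dashes and underscores for
# each path segment.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_PATH_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


class SpiffeId(NamedTuple):  # hypothetical helper type for this sketch
    trust_domain: str
    path: str  # empty string, or "/segment/segment..."


def parse_spiffe_id(value: str) -> SpiffeId:
    """Parse and validate a SPIFFE ID, raising ValueError if it is malformed."""
    prefix = "spiffe://"
    if not value.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    if len(value.encode("utf-8")) > 2048:
        raise ValueError("SPIFFE ID exceeds 2048 bytes")
    trust_domain, slash, path = value[len(prefix):].partition("/")
    if len(trust_domain) > 255 or not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if not slash:
        return SpiffeId(trust_domain, "")
    # Empty segments (including a trailing slash), "." and ".." are rejected,
    # as is anything that looks like a query string or fragment.
    for segment in path.split("/"):
        if segment in (".", "..") or not _PATH_SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid path segment: {segment!r}")
    return SpiffeId(trust_domain, "/" + path)
```

For example, `parse_spiffe_id("spiffe://example.org/ns/prod/sa/api")` yields the trust domain `example.org` and the path `/ns/prod/sa/api`; the trust domain and path layout here are made up for illustration.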
The project uses SPIRE (SPIFFE Runtime Environment) as its core workload identity provider. The implementation includes:
- SPIRE Server
  - Deployed as a Kubernetes Deployment
  - High availability configuration with rolling updates
  - Comprehensive health checks and monitoring
  - Secure secret management
  - Resource limits and requests configured
  - Security context with non-root user execution
- SPIRE Agent
  - Deployed as a Kubernetes DaemonSet
  - Rolling update strategy for controlled updates
  - Node attestation and workload registration
  - Health checks and monitoring
  - Resource limits and requests configured
  - Security context with non-root user execution
- Certificate Management
  - Automated certificate generation and rotation
  - CRL (Certificate Revocation List) support
  - Backup and disaster recovery procedures
  - Secure storage of certificates and keys
- Certificate Security
  - Strong key usage constraints
  - Extended key usage validation
  - Certificate revocation support
  - Automated backup of certificates
- Runtime Security
  - Non-root user execution
  - Read-only root filesystem
  - Dropped capabilities
  - Prevented privilege escalation
- Operational Security
  - Comprehensive health checks
  - Resource limits and requests
  - Graceful termination
  - Secure volume mounts
- Health Checks
  - Startup probes for slow-starting containers
  - Liveness probes for container health
  - Readiness probes for service availability
  - Configurable timeouts and thresholds
- Resource Management
  - CPU and memory limits
  - Resource requests for scheduling
  - Graceful termination periods
  - Update strategies for zero-downtime
- Backup and Recovery
  - Automated certificate backups
  - Timestamped backup directories
  - Secure backup storage
  - Cleanup procedures
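The backup items above (timestamped directories plus cleanup) can be sketched roughly as follows. This is a hypothetical Python illustration, not the repository's backup tooling: the directory naming scheme, the function names, and the retention period are all assumptions, and the sketch does not address encryption or access control for the stored copies.

```python
import shutil
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Assumed naming scheme: UTC timestamps that sort chronologically as text.
STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def create_backup(source: Path, backup_root: Path, now: datetime) -> Path:
    """Copy `source` into a new directory named after `now` (timezone-aware)."""
    target = backup_root / now.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    shutil.copytree(source, target)  # fails if the target already exists
    return target


def prune_backups(backup_root: Path, now: datetime, keep_for: timedelta) -> list[str]:
    """Delete backup directories older than `keep_for`; return the removed names."""
    removed = []
    for entry in sorted(backup_root.iterdir()):
        try:
            stamp = datetime.strptime(entry.name, STAMP_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a directory created by create_backup; leave it alone
        if entry.is_dir() and now - stamp > keep_for:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Passing `now` explicitly, rather than reading the clock inside the functions, keeps the retention logic easy to test; a scheduled job would pass `datetime.now(timezone.utc)`.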
- Architecture Guide - System architecture and component interactions
- Security Best Practices - Security guidelines and implementation details
- Developer Guide - Development workflows and best practices
- Deployment Guide - Deployment procedures and configurations
- Monitoring Guide - Monitoring setup and observability practices
- Disaster Recovery Guide - Recovery procedures and backup strategies
- API Reference Guide - API documentation and usage examples
- Migration Guide - Migration procedures from existing systems
- Compliance Guide - Compliance requirements and implementation details
- Troubleshooting Guide - Common issues and solutions
.
├── docs/                               # Documentation
│   ├── architecture_guide.md           # System architecture and design
│   ├── security_best_practices.md      # Security guidelines
│   ├── developer_guide.md              # Development workflows
│   ├── deployment_guide.md             # Deployment procedures
│   ├── monitoring_guide.md             # Monitoring and observability
│   ├── disaster_recovery_guide.md      # Recovery procedures
│   ├── api_reference.md                # API documentation
│   ├── migration_guide.md              # Migration procedures
│   ├── compliance_guide.md             # Compliance requirements
│   ├── troubleshooting_guide.md        # Common issues
│   ├── pki_guide.md                    # PKI concepts and usage
│   └── pki_concepts_detailed.md        # Detailed PKI documentation
├── tests/                              # Test infrastructure
│   ├── unit/                           # Unit tests
│   ├── integration/                    # Integration tests
│   ├── e2e/                            # End-to-end tests
│   ├── security/                       # Security tests
│   └── fixtures/                       # Test fixtures and mocks
├── infrastructure/                     # Infrastructure as Code
│   └── kubernetes/                     # Kubernetes manifests
│       ├── spire/                      # SPIRE server and agent configs
│       ├── monitoring/                 # Monitoring stack configs
│       └── networking/                 # Network policies and configs
├── core/                               # Core libraries and utilities
│   ├── identity/                       # Identity management
│   ├── security/                       # Security utilities
│   └── integration/                    # Integration components
├── services/                           # Service implementations
│   ├── identity-provider/              # Identity provider service
│   ├── policy-engine/                  # Policy evaluation service
│   └── federation/                     # Federation service
└── examples/                           # Example applications
    ├── kubernetes/                     # Kubernetes examples
    ├── cloud/                          # Cloud provider examples
    └── service-mesh/                   # Service mesh examples
The documentation directory contains comprehensive guides covering all aspects of the system:
- Architecture and design decisions
- Security best practices and guidelines
- Development workflows and standards
- Deployment procedures and configurations
- Monitoring and observability setup
- Disaster recovery procedures
- API reference and usage examples
- Migration procedures from existing systems
- Compliance requirements and implementation
- Troubleshooting guides
- PKI concepts and detailed documentation
The testing directory contains a comprehensive test suite:
- Unit tests for individual components
- Integration tests for component interactions
- End-to-end tests for complete workflows
- Security tests including penetration and fuzzing tests
- Test fixtures and mocks for various components
Key features:
- Cloud provider test fixtures (AWS, GCP, Azure)
- Kubernetes test configurations
- Certificate generation and management
- Mock implementations for external services
The infrastructure directory contains all infrastructure-as-code configurations:
- Kubernetes manifests for SPIRE deployment
- Monitoring stack configurations
- Network policies and security configurations
- (Planned) Terraform configurations for cloud resources
- (Planned) Ansible playbooks for automation
The core directory contains the fundamental libraries and utilities:
- Identity management components
- Security utilities and helpers
- Integration components for various platforms
- Common utilities and shared code
The services directory contains the main service implementations:
- Identity provider service for workload identity
- Policy engine for access control
- Federation service for cross-domain trust
- Additional supporting services
The examples directory contains practical examples and templates:
- Kubernetes deployment examples
- Cloud provider integration examples
- Service mesh integration examples
- Common use case implementations
The system provides robust identity management capabilities through a distributed architecture that ensures secure and scalable workload identity provisioning. Each workload receives a unique SPIFFE ID that is cryptographically verifiable and tied to its runtime environment.
Special Considerations:
- Ephemeral workloads require identity at runtime—not via pre-provisioned secrets
- Trust is anchored in secure enclaves, TPMs, or cloud-native CAs
- The system is designed for rapid scaling and horizontal expansion
- Supports both X.509 certificates and JWTs
- Enables decentralized issuance with centralized governance
- Federation-ready: supports trust bundles and multi-domain identity
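As a rough illustration of the federation point above, the sketch below shows how a verifier might pick the trust bundle for a peer based on the trust domain in its SPIFFE ID. It is a hypothetical Python example: `select_bundle`, `UntrustedDomainError`, and the representation of a bundle as a list of CA certificate strings are assumptions made for illustration, not this project's API.

```python
from urllib.parse import urlsplit


class UntrustedDomainError(Exception):
    """Raised when no trust bundle is held for a peer's trust domain."""


def trust_domain_of(spiffe_id: str) -> str:
    """Return the trust domain (the authority component) of a SPIFFE ID."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parts.netloc


def select_bundle(spiffe_id: str, bundles: dict[str, list[str]]) -> list[str]:
    """Return the CA certificates that `spiffe_id` must chain to.

    `bundles` maps each trusted domain (local or federated) to its bundle.
    An identity is only ever checked against the bundle of its own domain.
    """
    domain = trust_domain_of(spiffe_id)
    try:
        return bundles[domain]
    except KeyError:
        raise UntrustedDomainError(domain) from None
```

Keeping one bundle per trust domain, instead of one merged pool of CAs, prevents a federated domain's CA from vouching for identities in the local domain.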
graph TD
A[Workload] -->|Attestation| B[SPIRE Agent]
B -->|Identity Request| C[SPIRE Server]
C -->|Validate| D[Node Attestation]
C -->|Issue| E[SPIFFE ID]
E -->|Bind| F[Workload Identity]
F -->|Use| G[Service Communication]
Key components:
- Automatic identity provisioning based on workload attributes
- Certificate-based authentication using X.509 certificates
- Role-based access control with fine-grained permissions
- Support for multiple identity providers through federation
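Provisioning "based on workload attributes" can be pictured as selector matching: attestation discovers a set of selectors for a running workload, and a registration entry applies when every selector it lists is among them. The Python sketch below models that rule in isolation. The `RegistrationEntry` class and `matching_identities` function are hypothetical, and the selector strings and SPIFFE IDs are example values, not entries from this repository.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RegistrationEntry:  # hypothetical model of a registration entry
    spiffe_id: str
    selectors: frozenset[str]


def matching_identities(
    entries: Iterable[RegistrationEntry], workload_selectors: Iterable[str]
) -> list[str]:
    """Return the SPIFFE IDs whose selectors are all present on the workload."""
    discovered = set(workload_selectors)
    return [
        entry.spiffe_id
        for entry in entries
        # An entry with no selectors would match everything, so it is skipped.
        if entry.selectors and entry.selectors <= discovered
    ]


entries = [
    RegistrationEntry(
        "spiffe://example.org/payments/api",
        frozenset({"k8s:ns:payments", "k8s:sa:api"}),
    ),
    RegistrationEntry(
        "spiffe://example.org/payments/worker",
        frozenset({"k8s:ns:payments", "k8s:sa:worker"}),
    ),
]
```

A workload attested with the selectors `k8s:ns:payments` and `k8s:sa:api` (plus any extra ones) would match only the first entry.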
The security architecture implements a zero-trust model where every service interaction requires mutual authentication and authorization. The system uses mTLS for secure communication and implements robust key management practices.
Security Features:
- End-to-end mTLS with SPIFFE-based identities
- Continuous credential rotation (5–15 minute TTLs)
- Just-in-time identity provisioning and revocation
- Optional confidential computing integration (e.g. Intel SGX, AMD SEV)
- Real-time attestation: validate integrity of code and runtime environment
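With TTLs in the 5 to 15 minute range, a workload has to renew its credential well before expiry. The sketch below computes a renewal point as a fraction of the credential's lifetime. Renewing at the halfway mark is an assumption chosen for illustration, as are the function names; the text above only states the TTL range.

```python
from datetime import datetime, timedelta, timezone


def rotation_time(not_before: datetime, not_after: datetime, fraction: float = 0.5) -> datetime:
    """Return the moment at which a credential should be renewed.

    `fraction` is the share of the lifetime allowed to elapse first
    (0.5 means renew at the halfway point; an assumed default).
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    return not_before + (not_after - not_before) * fraction


def needs_rotation(now: datetime, not_before: datetime, not_after: datetime,
                   fraction: float = 0.5) -> bool:
    """True once `now` has reached the renewal point."""
    return now >= rotation_time(not_before, not_after, fraction)


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(minutes=10)  # a TTL inside the 5-15 minute range
```

Renewing early leaves room for retries: with a 10 minute TTL and the halfway default, a failed renewal still has 5 minutes before the old credential expires.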
Temporal Governance:
- Trust depends not only on who a workload is but also on when its identity is valid
- Short-lived credentials expire quickly, which reduces lateral movement risk
- Policies can enforce access windows and runtime conditions via engines like OPA
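The kind of time-based decision described above can be sketched in a few lines. In a deployment that uses OPA, this logic would normally be written as policy and evaluated by the engine; the Python below only illustrates the decision itself. `AccessWindow`, `is_allowed`, and the choice to express windows in UTC are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone


@dataclass(frozen=True)
class AccessWindow:  # hypothetical daily window, expressed in UTC
    start: time  # inclusive
    end: time    # exclusive

    def contains(self, moment: time) -> bool:
        if self.start <= self.end:
            return self.start <= moment < self.end
        # The window spans midnight, e.g. 22:00 to 02:00.
        return moment >= self.start or moment < self.end


def is_allowed(now: datetime, credential_expiry: datetime, window: AccessWindow) -> bool:
    """Allow access only with an unexpired credential inside the window.

    Both datetimes must be timezone-aware.
    """
    if now >= credential_expiry:
        return False
    return window.contains(now.astimezone(timezone.utc).time())


business_hours = AccessWindow(time(8, 0), time(18, 0))
overnight = AccessWindow(time(22, 0), time(2, 0))
```

Both conditions must hold: a valid credential outside the window is rejected, and so is an expired credential inside it.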
graph LR
A[Service A] -->|mTLS| B[Service B]
A -->|Verify| C[Certificate]
B -->|Verify| D[Certificate]
C -->|Validate| E[Trust Chain]
D -->|Validate| E
E -->|Check| F[Policy Engine]
Security features:
- Mutual TLS (mTLS) for all service-to-service communication
- Zero-trust model implementation with continuous verification
- Secure key management with automatic rotation
- Hardware security module (HSM) support for key storage
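On the server side, mutual TLS comes down to two steps: require a client certificate that chains to a trusted CA, then read the caller's identity from that certificate. The sketch below shows both with Python's standard `ssl` module. It is an illustration under stated assumptions, not this project's implementation: the function names are hypothetical, the file paths are supplied by the caller, and the rule that the certificate carries exactly one URI SAN holding the SPIFFE ID is applied as a simplified check.

```python
import ssl
from typing import Optional


def server_context(cert_file: Optional[str] = None, key_file: Optional[str] = None,
                   ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that rejects clients without a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part of mTLS
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust bundle for client certs
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def spiffe_id_from_peer(peer_cert: dict) -> str:
    """Extract the SPIFFE ID from the dict returned by SSLSocket.getpeercert()."""
    uris = [value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "URI"]
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        raise ValueError("expected exactly one URI SAN containing a SPIFFE ID")
    return uris[0]
```

After the handshake, a server would call `spiffe_id_from_peer(conn.getpeercert())` and pass the result to its authorization step. Because certificates here are short-lived, a real service also needs to reload them as they rotate, which this sketch leaves out.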
The system provides seamless integration with modern cloud-native platforms and tools, enabling organizations to implement workload identity across their entire infrastructure stack.
Integration capabilities:
- Kubernetes: Uses native ServiceAccount tokens for OIDC federation
- Cloud IAM: Compatible with AWS IRSA, GCP Workload Identity, Azure Federated Identity Credentials
- Service Mesh: Works with Istio, Linkerd, and custom mTLS setups
- CI/CD: Secure ephemeral identity for GitHub Actions, GitLab, Jenkins, and more
- Secrets Managers: Authenticate to systems like Vault using ambient SPIFFE identity
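For the Kubernetes and cloud IAM items above, federation typically rests on matching claims in a ServiceAccount token against what the cloud-side trust configuration expects. The sketch below shows that comparison in Python. It assumes the token's signature has already been verified against the cluster's OIDC issuer keys, which is not shown; the function names, namespace, and account name are hypothetical.

```python
def service_account_subject(namespace: str, name: str) -> str:
    """Build the `sub` claim Kubernetes uses for a ServiceAccount token."""
    return f"system:serviceaccount:{namespace}:{name}"


def claims_match(claims: dict, subject: str, audience: str, now: float) -> bool:
    """Check subject, audience and expiry on claims from a verified token.

    `now` is a Unix timestamp. Signature verification is assumed to have
    happened before this function is called.
    """
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return (
        claims.get("sub") == subject
        and audience in audiences
        and float(claims.get("exp", 0)) > now
    )


expected_sub = service_account_subject("payments", "api")  # example values
```

Pinning both `sub` and `aud` matters: checking only the issuer would let any ServiceAccount in the cluster assume the cloud role. With AWS IRSA, for instance, the audience is usually `sts.amazonaws.com`.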
graph TD
A[Workload Identity] -->|Integrate| B[Kubernetes]
A -->|Connect| C[Service Mesh]
A -->|Federate| D[Cloud Providers]
A -->|Automate| E[CI/CD]
B -->|Use| F[Service Accounts]
C -->|Use| G[mTLS]
D -->|Use| H[Cloud IAM]
E -->|Use| I[Pipeline Security]
- Kubernetes cluster (v1.24+)
- Helm v3.7+
- kubectl
- SPIRE v1.13.3+ (see SPIRE releases)
- Access to container registry
- Review the Architecture Guide
- Follow the Deployment Guide
- Consult the Developer Guide
- Refer to the Security Best Practices
# Clone the repository
git clone https://github.com/your-org/workload-identity.git
cd workload-identity
# Set up development environment
make setup-dev
# Start local development cluster
make start-local-cluster
# Run tests
make test
The system is organized into several key components:
- Identity Provider
  - Located in services/identity-provider/
  - Handles workload identity issuance and validation
  - Implements SPIFFE/SPIRE integration
  - Manages trust relationships
- Policy Engine
  - Located in services/policy-engine/
  - Evaluates access control policies
  - Integrates with OPA for policy decisions
  - Manages policy lifecycle
- Federation Service
  - Located in services/federation/
  - Handles cross-domain trust
  - Manages trust bundle exchange
  - Implements federation protocols
- Unit tests for core components
- Integration tests for service interactions
- End-to-end tests for complete workflows
- Security tests for trust relationships
- Performance tests for scalability
- Build components:
  make build
- Run tests:
  make test
- Build containers:
  make docker-build
- Deploy to cluster:
  make deploy
- Use the monitoring stack for observability
- Check logs using the logging system
- Use tracing for request flows
- Monitor metrics for performance
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
- Address review comments
- Merge after approval
For more details, see the Developer Guide.
apiVersion: workload-identity/v1
kind: WorkloadIdentity
metadata:
  name: my-service
spec:
  serviceAccount: my-service-account
  identityProvider: kubernetes
  policies:
    - name: service-access
      rules:
        - apiGroups: ["*"]
          resources: ["*"]
          verbs: ["get", "list"]
The system supports migration from various identity systems:
- OIDC to Workload Identity
- SAML to Workload Identity
- Custom identity system migration
See the Migration Guide for detailed procedures.
Supports major compliance frameworks:
- ISO 27001
- SOC 2
- GDPR
- HIPAA
See the Compliance Guide for implementation details.
The system provides comprehensive monitoring and observability capabilities through a modern observability stack:
monitoring:
  metrics:
    prometheus:
      enabled: true
      retention: 15d
      scrape_interval: 15s
  logging:
    loki:
      enabled: true
      retention: 30d
  tracing:
    jaeger:
      enabled: true
      sampling_rate: 0.1
- Identity Metrics
  - Identity issuance rate
  - Identity validation success/failure
  - Certificate rotation events
  - Trust bundle updates
- Security Metrics
  - Authentication attempts
  - Authorization decisions
  - Policy evaluation latency
  - Security violations
- Performance Metrics
  - Request latency
  - Error rates
  - Resource utilization
  - Cache hit rates
- Federation Metrics
  - Cross-domain trust operations
  - Trust bundle exchange events
  - Federation health status
  - Federation latency
- Structured logging in JSON format
- Log levels: DEBUG, INFO, WARN, ERROR
- Log rotation and retention
- Log aggregation and analysis
- Distributed tracing with OpenTelemetry
- Request flow visualization
- Latency analysis
- Error tracking
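The structured logging items above (JSON output with the levels DEBUG, INFO, WARN, ERROR) could look roughly like this using Python's standard `logging` module. The field names (`timestamp`, `level`, `logger`, `message`) and the helper `get_logger` are assumptions for illustration; the project's actual log schema may differ.

```python
import json
import logging
from datetime import datetime, timezone

# Python names these levels WARNING and CRITICAL; map them onto the four
# level names listed above.
LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "ERROR"}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to standard error."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```

One JSON object per line keeps the output easy for a log aggregator to parse without multi-line handling.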
- System Overview
  - System health
  - Resource utilization
  - Error rates
  - Performance metrics
- Security Dashboard
  - Authentication metrics
  - Authorization decisions
  - Security events
  - Policy evaluations
- Federation Dashboard
  - Trust relationships
  - Federation health
  - Cross-domain operations
  - Trust bundle status
- Critical Alerts
  - System failures
  - Security violations
  - Trust chain issues
  - Federation failures
- Warning Alerts
  - High latency
  - Error rate spikes
  - Resource constraints
  - Certificate expiration
- Info Alerts
  - Configuration changes
  - Policy updates
  - Trust bundle updates
  - Federation events
For detailed setup and configuration, see the Monitoring Guide.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- SPIFFE/SPIRE community
- Open source contributors
- Security researchers
- Early adopters and testers
Workload identity is the foundation of runtime trust for autonomous systems. As software shifts from static services to AI agents that reason, delegate, and act across trust domains, the primitives in this repository become more important, not less.
The Agentic Identity Guide covers how to extend this architecture to cover AI agents today. The PQC Migration Guide covers preparing this infrastructure for the post-quantum transition NIST formalized in 2024.
These are not future problems. They are current engineering requirements.
- core/identity/manager.go - Core identity management logic
- core/security/crypto.go - Cryptographic operations and key management
- core/integration/kubernetes.go - Kubernetes integration implementation
- services/identity-provider/main.go - Identity provider service implementation
- services/policy-engine/evaluator.go - Policy evaluation engine
- infrastructure/kubernetes/spire/server.yaml - SPIRE server configuration
- infrastructure/kubernetes/spire/agent.yaml - SPIRE agent configuration

- core/security/tls.go - TLS configuration and certificate management
- core/security/auth.go - Authentication and authorization logic
- core/security/audit.go - Audit logging implementation
- infrastructure/kubernetes/spire/secrets.yaml - Secret management
- infrastructure/kubernetes/spire/network-policy.yaml - Network security policies

- infrastructure/kubernetes/spire/configmap.yaml - Core configuration
- infrastructure/kubernetes/spire/trust-bundle.yaml - Trust domain configuration
- infrastructure/kubernetes/spire/federation.yaml - Federation settings
- infrastructure/kubernetes/monitoring/prometheus.yaml - Monitoring configuration

- Architecture Guide - System design and component interactions
- Security Best Practices - Security implementation details
- PKI Guide - Certificate management and PKI concepts
- API Reference - API design and implementation
- Deployment Guide - Deployment architecture and procedures
- Monitoring Guide - Observability and monitoring
- Disaster Recovery Guide - Recovery procedures
- Compliance Guide - Compliance implementation
- Zero Trust principles implementation
- Cryptographic implementation review
- Authentication and authorization flows
- Secret management approach
- Network security controls
- Audit logging implementation
- Horizontal scaling capabilities
- Performance bottlenecks
- Resource utilization
- Caching strategies
- Rate limiting implementation
- High availability design
- Fault tolerance mechanisms
- Disaster recovery procedures
- Backup and restore capabilities
- State management approach
- Kubernetes integration
- Service mesh compatibility
- Cloud provider integration
- External identity providers
- Monitoring and logging systems
- Deployment strategies
- Configuration management
- Monitoring and alerting
- Logging and tracing
- Maintenance procedures
- Review cryptographic implementations for best practices
- Verify secure key management and rotation
- Check authentication and authorization flows
- Validate audit logging and monitoring
- Assess network security controls
- Evaluate component interactions
- Review scalability considerations
- Check fault tolerance mechanisms
- Verify high availability design
- Assess integration patterns
- Review deployment procedures
- Check monitoring implementation
- Verify backup and recovery
- Assess maintenance procedures
- Validate configuration management
- Review security standards compliance
- Check audit requirements
- Verify documentation completeness
- Assess policy implementation
- Validate control effectiveness
- Update architecture documentation
- Enhance security guidelines
- Improve operational procedures
- Add troubleshooting guides
- Update compliance documentation
- Address security findings
- Enhance scalability features
- Improve monitoring capabilities
- Strengthen disaster recovery
- Optimize performance
- Improve deployment procedures
- Enhance monitoring setup
- Strengthen backup procedures
- Optimize maintenance tasks
- Update security controls